Configure the data load order
The data load order configuration file controls the load order of the Data Load utility. This file points to the environment settings file, the business object configuration file, and the input file. We can also define the mode that the Data Load utility uses to load data.
Task info
Sample load order configuration files are provided for loading different component objects. These files are included within the component-specific directories in the following directory:
- (Linux) utilities_root/samples/DataLoad
- (Developer) WCDE_installdir/samples/DataLoad
Procedure
- Create a copy of a sample wc-dataload-object.xml file and rename the file. For example, copy the wc-dataload-catalog-entry.xml file within the Catalog component directory.
- Open the new data load order configuration file for editing.
- Specify the data load environment configuration file.
- Find the <_config:DataLoadEnvironment> element.
- Change the value for the configFile attribute of this element to be the relative file path to the data load environment configuration file.
Note: This value is overridden if an environment configuration file is specified on the command line when you run the Data Load utility from the Utility server Docker container.
Although the environment configuration file that is specified on the command line takes precedence, the <_config:DataLoadEnvironment> element must still exist in the data load order configuration file.
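For example, the element in the load order configuration file might look like the following sketch (the file name is taken from the sample configuration later in this topic; substitute the relative path to your own environment file):

```xml
<!-- Relative path to the data load environment configuration file -->
<_config:DataLoadEnvironment configFile="wc-dataload-env.xml" />
```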
- Configure the load order for the data load process.
- Find the <_config:LoadOrder> element.
- Optional: Specify the value for the commitCount attribute. The commit count identifies the number of rows that are flushed in a single transaction before a commit is issued. The following values can be set for this attribute.
- 0
- Nothing is committed until the load item finishes processing all of its input data. If an exception occurs, any records that were successfully processed can still be committed. If the exception is an SQL exception, the utility rolls back all of the data that was processed after the last committed record. If a different type of exception occurs, the utility rolls back only the data that was not successfully processed and commits any successfully processed records.
- N
- Where N is a positive integer value. The commit count specifies how many lines of records are processed before the utility issues a database commit. The default is 1.
Note: If errors occur during the data load, an SQL exception causes the database to roll back to the last committed line. The data after the last committed line, up to and including the error line, is not committed to the database. Any other error does not cause a rollback; for other errors, only the data on the error line is not committed, and all other data is committed. For example, if the commitCount is N = 10 and an error occurs in line 25:
- If the error is an SQL exception, the data between line 1 - 20 is committed to the database. The data between line 21 - 25 is not added in the database.
- If the error is not an SQL exception, the data between line 1 - 24 is committed to the database. The data in line 25 is not added in the database.
We can include a configuration property "rollbackForAnyException" to control what data the utility rolls back when an exception occurs. When the property is set to true, the utility rolls back all data that is processed after the last committed record whenever an exception occurs. This roll back occurs regardless of the exception type. We can set this property within the configuration for each load item or within the parent <_config:LoadOrder> element to have the property setting apply to all load items.
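For example, the property can be declared within a single load item so that the stricter rollback behavior applies to only that item (the load item shown here mirrors the sample configuration later in this topic; the placement is illustrative):

```xml
<_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
  <!-- Roll back all data processed after the last commit for any exception type -->
  <_config:property name="rollbackForAnyException" value="true" />
  <_config:DataSourceLocation location="CatalogEntries.csv" />
</_config:LoadItem>
```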
- Optional: Specify the value for the batchSize attribute. Database transactions are loaded in batches. These batches are kept in Java memory until enough are stored for a flush. Then, the batch contents are sent to the database as a single packet of data. The batched changes are held by the database until the next commit occurs, when the database applies them. The following values can be set for this attribute.
- 0
- Uses the JDBC batch update. All of the input data for the entire load item is processed in batches.
- N
- Where N is a positive integer value that indicates how many lines of data are processed before the JDBC batch is executed. The JDBC batch is enabled only when the batch size is greater than 1. The batchSize value must be less than or equal to the commitCount value. The default batch size is 1, which means that the JDBC batch is not enabled and the SQL statements are executed one by one.
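As a sketch, a load order that commits every 100 records and executes JDBC batches of 20 statements might be declared as follows (the values are chosen for illustration only; batchSize must not exceed commitCount):

```xml
<_config:LoadOrder commitCount="100" batchSize="20" dataLoadMode="Replace">
  <!-- load items go here -->
</_config:LoadOrder>
```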
- Optional: Specify the value for the dataLoadMode attribute. The data load mode can be set to Insert, Replace, Update, or Delete.
- Insert
- All data is inserted into the database. The utility generates insert SQL statements. This mode is recommended for the initial loading of data. If there are any delete flags in your data file, the flags are ignored. In insert mode, we can specify a primary key range to use when an object does not exist in the database and requires a new generated key. Specify the value within the <_config:BusinessObjectMediator> element. For example:
startKey="100001" endKey="200001"
The data writers that are supported in insert mode:
- JDBC data writer
- (DB2) (Oracle) Native load data writer
Note: Running the Data Load utility in insert mode can improve performance when you initially load large amounts of data. However, in insert mode the utility does not check the database before it loads your input data. The utility does not determine whether the data objects that we are creating already exist in the database before it attempts to load and create them. This behavior can cause the load process to fail when the utility attempts to create data objects that already exist in the database. For example, suppose that your input file is a CSV file that contains a line that creates a product and a product description, and a second line that loads the same product with a description in a different language. Because the utility creates the product and description from the first line, when the utility encounters the second line, it attempts to create the product again. Since the product already exists, the load fails. To load this product and description data, we can use one of the following methods:
- Run the Data Load utility in replace mode.
- Run the Data Load utility in insert mode, but load your product and description information separately. Use one input file to load your product information, and a different input file to load all of your descriptions.
For more information about configuring the Data Load utility to run in insert mode, see Scenario: Initial load.
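To illustrate the key range that is mentioned above, the startKey and endKey values are specified as attributes on the <_config:BusinessObjectMediator> element in the business object configuration file. The mediator class name in this sketch is an example only and depends on the object type you are loading:

```xml
<!-- Example only: the className value varies by business object type -->
<_config:BusinessObjectMediator
    className="com.ibm.commerce.catalog.dataload.mediator.CatalogEntryMediator"
    startKey="100001" endKey="200001" />
```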
- Replace
- Default: All data is replaced in the database. The utility generates insert, update, or delete SQL statements based on the data. Replace mode replaces existing data that is contained in the database with the input data. That is, if some column information is not in the input data, the column value is updated to null or to the default value, if one is defined. For example:
- If one record (line) in your data file represents a new object, it is inserted.
- If the object is in the database already, it is replaced.
- If there is a flag in the data to indicate that this object is to be deleted, the object is deleted.
In replace mode, do not specify a primary keyrange value as it can result in key conflicts within the database. The data writers that are supported in the replace mode:
- JDBC data writer
- (DB2) Native load data writer
Note: We can prevent accidentally replacing information in the database with null data. Modify the original input data that was used in the initial load when we are replacing a subset of the original data. Do not include empty fields in your source file unless we want those fields to contain null data in the database.
For more information about configuring the Data Load utility to run in replace mode, see Scenario: Delta load.
- Update
- Specified catalog entry and catalog entry description data is updated in the database. The utility compares the catalog entry data in the input file with the corresponding data for the catalog entries in the database. The utility then replaces or adds the new or changed data for only the columns specified in the input file. All other columns remain unchanged.
The update mode supports only the JDBC data writer.
Note: The Data Load utility can run in update mode for loading only catalog entry or catalog entry description information.
For more information about configuring the Data Load utility to run in update mode, see Scenario: Catalog entry update load.
- Delete
- All data that is identified in the input file and that exists in the database is deleted from the database. In this mode, the utility ignores the value in the delete column or element of the input file. The utility generates the delete SQL statements that are needed to delete the objects in the input file. Only the JDBC data writer is supported in this mode.
Note: If the site uses WebSphere Commerce search, the delta search index might not rebuild correctly when you delete some catalog objects with the Data Load utility in delete mode. When you delete a child object of a catalog entry or category with the utility in delete mode, both the child and parent objects are removed from the delta search index rebuild. This removal can cause the parent catalog entry or category to no longer be indexed or display correctly in the storefront.
Use the utility in replace mode to delete catalog objects when the site uses WebSphere Commerce search. To delete objects with the utility in replace mode, include the value 1 for the Delete column of an object in your input file. If you do decide to delete catalog objects with the utility in delete mode, run a full index rebuild after the load operation completes.
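For example, a CSV input file that deletes a catalog entry while the utility runs in replace mode might include a Delete column that is set to 1 for that object. The column names and values in this sketch are illustrative and depend on your business object configuration:

```
PartNumber,Type,ParentGroupIdentifier,Delete
"Sample-Product-001","Product","Sample-Category",1
```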
- Optional: Specify the value for the maxError attribute. The maximum error tolerance can be set to ensure that the Data Load utility runs, regardless of any errors that occur.
If we are using the XML data reader class to load XML input files, do not set the maxError attribute. If you do set this attribute, the value must be 1; other values can cause unexpected behavior. If we are using a CSV data reader class, the following values can be set for this attribute.
- 0
- Continue to run the Data Load utility, regardless of how many errors occur.
- N
- Where N is a positive integer value. The max error count specifies the error tolerance level during the data load process for a load item. Default is 1.
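For example, to allow a CSV load to tolerate errors before the utility stops, set maxError on the <_config:LoadOrder> element. The value 5 in this sketch is illustrative:

```xml
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace" maxError="5">
  <!-- load items go here -->
</_config:LoadOrder>
```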
- Specify the load item configuration elements for each load item that is required. Specify the load item information within the <_config:LoadItem> element. Within this element, we must include the name attribute to identify the name of the load item. We must also include the businessObjectConfigFile attribute to identify the business object configuration file for the load item. For example,
<_config:LoadItem name="CatalogGroup" businessObjectConfigFile="wc-loader-catalog-group.xml">
- Within the <_config:LoadItem> element, specify the data source location for the input file containing the load item information. Specify this information with the location attribute in the <_config:DataSourceLocation> element. For example,
<_config:DataSourceLocation location="CatalogGroups.xml"/>
- Optional: Include any other configuration properties that you require. Include these properties within a <_config:property> element within either the <_config:LoadOrder> or <_config:LoadItem> elements. If this property is specified within the <_config:LoadOrder> element, the property applies to all load items. If the property is specified within a <_config:LoadItem> element, the property applies to only that specific load item. For example, to build the search index, include the buildSearchIndex configuration property in the load order configuration file.
For more information about the available configuration properties, see Data Load utility configuration properties.
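For example, the buildSearchIndex property mentioned above can be declared on the load order so that it applies to all load items. The placement and attribute values in this sketch are illustrative:

```xml
<_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
  <!-- Declared on the LoadOrder element, so it applies to every load item -->
  <_config:property name="buildSearchIndex" value="true" />
  <!-- load items go here -->
</_config:LoadOrder>
```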
- Save and close the file. The following sample shows a configuration file for catalog data.
<?xml version="1.0" encoding="UTF-8" ?>
<_config:DataLoadConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../../xml/config/xsd/wc-dataload.xsd"
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
  <_config:DataLoadEnvironment configFile="wc-dataload-env.xml" />
  <_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Replace">
    <_config:property name="firstTwoLinesAreHeader" value="true" />
    <_config:LoadItem name="CatalogGroup" businessObjectConfigFile="wc-loader-catalog-group.xml">
      <_config:property name="validateCatalogGroup" value="true" />
      <_config:DataSourceLocation location="CatalogGroups.csv" />
    </_config:LoadItem>
    <_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
      <_config:DataSourceLocation location="CatalogEntries.csv" />
    </_config:LoadItem>
  </_config:LoadOrder>
</_config:DataLoadConfiguration>
- Optional: If we are configuring the data load order for routine data load processes, we can configure a file difference preprocessor to run.
We can run a file difference preprocess for routine data loads to improve Data Load utility performance. This preprocessor compares two input files, such as a previously loaded file and a new version of that file, and generates a difference file that contains only the records in the new file that are not in the old file or that are changed from the records in the old file. The Data Load utility can then load this difference file. If your routinely loaded files contain many previously loaded records, running this file difference can result in shorter load times. Running a file difference can reduce the time that is required to load your routine updates into the WebSphere Commerce database, reduce server usage time, and improve server performance.
We can configure the Data Load utility file difference preprocessor to compare files by the values in each column, instead of entire records, to identify the changed records. We can also configure the file difference preprocessor to ignore specific columns when the process is comparing files.
For more information about this preprocessor, see Data Load file difference preprocessing.
- Save and close the file.
- Scenario: Initial load
After a WebSphere Commerce instance is created and configured, we can load the initial catalog, price, and inventory data into the WebSphere Commerce database.
- Scenario: Delta load
When your Transaction Server is running, we can load minor changes in data into the database. The delta load process checks whether data exists in the database, and then either adds, replaces, or removes the data where necessary.
- Scenario: Catalog entry update load
We can configure the Data Load utility to run in an update mode to update catalog entry information. The update mode compares the catalog entry data in the input file with the corresponding data for the catalog entries in the database. The update mode then replaces or adds data for only the columns specified in the input file. All other columns remain unchanged.
Previous topic: Configure the business object configuration file
Next topic: Configure a column exclusion list