Scenario: Delta load
When your Transaction Server is running, we can load minor changes in data into the database. The delta load process checks whether data exists in the database, and then either adds, replaces, or removes the data where necessary.When you run a delta load, the Data Load utility does compare the file that we are loading with the data in the database. If the utility finds existing data for an object in the database, the utility replaces that data with the data included in the input file. If the input file includes data for only specific columns, the load operation still replaces the values for all columns for the object in the database table. The utility replaces the data in the database with the data in the input file and replaces all other column values for the object with null. If you do not want the utility to replace data for a column when you run a delta load, configure the utility to exclude the column from being updated during a load operation. See Configure a column exclusion list.
If we are loading changes for catalog entries or catalog entry descriptions, we can run the Data Load utility in an update mode. This update mode compares the data in the database with the data in the file and updates only the columns that include changes. If no values are set for database columns in the input file, the load operation does not change the value within the database. See Scenario: Catalog entry update load
Configuration recommendations
During a delta load, data can be loaded into a staging environment or a production environment. However, some data such as inventory data must be loaded into a production environment. Ensure that you configure your data load process to reduce the impact to the shopping flow when loading data into a production environment. The following configuration considerations are recommended for a delta load:
- Specify the data load mode as Replace in the wc-dataload.xml data load order configuration file unless we are loading new data into the database. But we can still use the Insert mode to improve the performance. If the data load mode is set to Insert, the data from the input CSV file is directly loaded into the database without checking for existing data. If data exists, a duplicate key exception is thrown.
- Specify a commit count of 1 and batch size of 1 to minimize the impact to our production site. If you specify a large commit count and batch size, more rows are locked in database tables, and the lock time is longer. When the rows are locked, the information might not be available online.
- Do not specify a keyrange with a start key and an end key unless we are sure it would not cause key conflicts.
- Specify the ID resolver cache size to 0 if the database is large. Loading the ID resolver cache for a large table might take a long time and also a significant amount of memory. The ID resolver cache might not be loaded for a large table because of the JVM heap size limitation. So for a large database, the ID resolver cache size is set to 0. Set the ID resolver cache size to a large number only to load data into small tables.
For more information about running the Data Load utility after you set the data load mode for a delta load, see Configure and running the Data Load utility
Sample
The following code snippet is part of a sample wc-dataload.xml data load order configuration file. This snippet specifies the Replace data load mode, a small commit count, and batch size, and without a fixed keyrange:<_config:LoadOrder commitCount="1" batchSize="1" dataLoadMode="Replace" > <_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml" > <_config:DataSourceLocation location="CatalogEntry.csv" /> </_config:LoadItem> </_config:LoadOrder>The following code snippet is part of a sample wc-dataload-env.xml data load environment configuration file. This snippet specifies the ID resolver cache size to 0:
<_config:IDResolver className="com.ibm.commerce.foundation.dataload.idresolve.IDResolverImpl" cacheSize="0" />