General data load best practices

The following best practices are recommended when using the Data Load utility to load data.


Configuration for the initial loads

For more information about recommended configurations during initial loads, see Scenario: Initial load.


Configuration for the delta loads

For more information about recommended configuration during delta loads, see Scenario: Delta load.


Running the data load script file

If you run the Data Load utility from the Utilities container, consider mounting a directory of the host system into a directory within the container. Place all your Data Load files in a subdirectory of the mounted directory. This approach ensures that the files are not lost when the container is overwritten by another Docker image, and it can also make it easier to edit the files directly on the host system with your preferred editor. When you run the utility, consider the following command-line options:


File difference preprocessing

We can run a file difference preprocess on routine data loads to improve Data Load utility performance when loading these files. This preprocessor compares two input files, such as a previously loaded file and a new version of that file. The preprocessor generates a difference file that contains only the records in the new file that are not in the old file, or that changed from the records in the old file. The Data Load utility can then load this difference file. If your routinely loaded files contain many previously loaded records, running this file difference can result in shorter load times. It can reduce the time that is required to load your routine updates into the WebSphere Commerce database, reduce server usage time, and improve server performance.

We can configure the Data Load utility file difference preprocessor to compare files by the values in each column, instead of by entire records, to identify the changed records. We can also configure the file difference preprocessor to ignore specific columns when it compares files.
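For example, assume a hypothetical pair of catalog entry CSV files; the file names, column names, and values here are illustrative only, and the exact handling of the header row depends on your configuration. Given the following previously loaded file and new file, the preprocessor writes only the new and changed records to the difference file:

  catalog-entries-old.csv (previously loaded file):
  PartNumber,Name,Price
  SKU-001,Coffee mug,5.99
  SKU-002,Tea pot,24.99

  catalog-entries-new.csv (new version of the file):
  PartNumber,Name,Price
  SKU-001,Coffee mug,5.99
  SKU-002,Tea pot,22.99
  SKU-003,Espresso cup,3.99

  catalog-entries-diff.csv (generated difference file):
  PartNumber,Name,Price
  SKU-002,Tea pot,22.99
  SKU-003,Espresso cup,3.99

The Data Load utility then loads only the two records in the difference file, rather than reprocessing the unchanged record for SKU-001.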

For more information about this preprocessor, see Data Load file difference preprocessing.


Data Load utility configuration files

There are three types of data load configuration files: the data load order file (wc-dataload.xml), the data load environment configuration file (wc-dataload-env.xml), and the data load business object configuration file.

Keep all your data load configuration files in a location that is relative to the wc-dataload.xml file, and ensure that the configuration files that are specified in the wc-dataload.xml file are referenced by relative paths. Relative paths make it easy to move the configuration files from one workstation to another.
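For example, a minimal wc-dataload.xml can reference its companion files by relative paths, as in the following sketch. The overall element structure reflects a typical generated data load order file, but the file names, load item name, and attribute values are placeholders; verify the structure against your own generated configuration files:

  <_config:DataLoadConfiguration
      xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
    <!-- Environment configuration file, referenced by a relative path -->
    <_config:DataLoadEnvironment configFile="wc-dataload-env.xml" />
    <_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Insert">
      <!-- Business object configuration file and CSV input file, also relative paths -->
      <_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
        <_config:DataSourceLocation location="inputs/catalog-entries.csv" />
      </_config:LoadItem>
    </_config:LoadOrder>
  </_config:DataLoadConfiguration>

Because every reference is relative to the directory that contains wc-dataload.xml, the whole set of files can be copied to another workstation without editing any paths.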


Configure the data load order file (wc-dataload.xml)

Consider the following configurations:


Configure the data load environment configuration file (wc-dataload-env.xml)

Consider the following configurations:
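As an illustration, a generated wc-dataload-env.xml typically defines the business context, the database connection, the ID resolver, and the data writer, as in the following sketch. The identifiers, database values, and class names are placeholders based on common samples; confirm the element names and attributes against the environment configuration file that is generated for your environment:

  <_config:DataLoadEnvConfiguration
      xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
    <_config:BusinessContext storeIdentifier="AuroraESite" catalogIdentifier="AuroraESite"
        languageId="-1" currency="USD" />
    <_config:Database type="db2" name="mall" user="wcs_user" password="********"
        server="db_host" port="50000" schema="wcs_schema" />
    <!-- A larger cacheSize can shorten ID resolution time during large loads -->
    <_config:IDResolver className="com.ibm.commerce.foundation.dataload.idresolve.IDResolverImpl"
        cacheSize="1000000" />
    <_config:DataWriter className="com.ibm.commerce.foundation.dataload.datawriter.JDBCDataWriter" />
  </_config:DataLoadEnvConfiguration>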


Configure the data load business object configuration file

Consider the following configurations:
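As an illustration only, a CSV-based business object configuration file wires together a data reader, a business object builder with column-to-business-object mappings, and a mediator. Every class name, object type, and mapping in the following sketch is a placeholder and differs depending on the kind of data that you load, so compare it with the sample configuration files that are provided for your data type rather than copying it as is:

  <_config:DataloadBusinessObjectConfiguration
      xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
    <_config:DataLoader className="com.ibm.commerce.foundation.dataload.BusinessObjectLoader">
      <!-- Reads the CSV input file; column names are taken from the CSV header -->
      <_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.CSVReader"
          firstLineIsHeader="true" />
      <!-- Maps CSV columns to the logical business object -->
      <_config:BusinessObjectBuilder
          className="com.ibm.commerce.foundation.dataload.businessobjectbuilder.BaseBusinessObjectBuilder"
          packageName="com.ibm.commerce.catalog.facade.datatypes.CatalogPackage"
          dataObjectType="CatalogEntryType">
        <_config:DataMapping>
          <_config:mapping xpath="CatalogEntryIdentifier/ExternalIdentifier/PartNumber" value="PartNumber" />
          <_config:mapping xpath="Description[0]/Name" value="Name" />
        </_config:DataMapping>
      </_config:BusinessObjectBuilder>
      <!-- Persists the business object to the database -->
      <_config:BusinessObjectMediator
          className="com.ibm.commerce.catalog.dataload.mediator.CatalogEntryMediator" />
    </_config:DataLoader>
  </_config:DataloadBusinessObjectConfiguration>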


CSV input files

Consider the following tips when you edit or maintain your CSV files:
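For example, a simple catalog entry CSV input file with a header row might look like the following lines. The column names and values are illustrative; the columns that the utility expects are defined by the mappings in your business object configuration file:

  PartNumber,Name,ShortDescription,Price
  SKU-001,Coffee mug,Ceramic 350 ml mug,5.99
  SKU-002,Tea pot,Cast iron 1 l tea pot,22.99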


Loading by unique ID

Specifying the unique ID is optional when you use the Data Load utility. However, if you specify the unique ID, you save the processing time that is required to resolve the ID, and performance improves.
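For example, a CSV file that carries the internal unique ID in addition to the part number lets the utility skip the lookup that resolves the part number to that key. The column names and values here are illustrative and must match the mappings in your business object configuration file:

  CatalogEntryUniqueId,PartNumber,Name,Price
  10001,SKU-001,Coffee mug,5.99
  10002,SKU-002,Tea pot,22.99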


Reversing a data load

To reverse a load, we can run the same data load again with dataLoadMode="Delete" specified in the <_config:LoadOrder> element of your wc-dataload.xml data load configuration file. When reversing a load, specify the following configuration row within the <_config:LoadOrder> element:

This configuration row ensures that the Data Load utility continues processing when a soft delete error occurs. Such errors can occur because dependent child records no longer exist as a result of cascade deletes.
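For example, if the original load ran with dataLoadMode="Insert" or dataLoadMode="Replace" on the <_config:LoadOrder> element, the reversal reruns the same load item with only the dataLoadMode attribute changed to Delete. The load item and file names in this sketch are placeholders:

  <_config:LoadOrder commitCount="100" batchSize="1" dataLoadMode="Delete">
    <_config:LoadItem name="CatalogEntry" businessObjectConfigFile="wc-loader-catalog-entry.xml">
      <_config:DataSourceLocation location="inputs/catalog-entries.csv" />
    </_config:LoadItem>
  </_config:LoadOrder>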


Tuning the Data Load utility

To reduce any performance impact from running the Data Load utility, we can adjust the ID resolver cache size and other parameters that are related to the utility. See Data Load utility performance tuning.