Data Load file difference preprocessing

We can run a file difference preprocess for routine data loads to improve Data Load utility performance when loading these files. Running a file difference can reduce the time required to load routine updates into the WebSphere Commerce database, reduce server usage time, and improve server performance.

The file difference preprocessor is available only for the Data Load utility. By using this preprocessor, we can compare two input files, such as a previously loaded file and the newest version of that file. The preprocessor generates a difference file that contains only the records in the new file that are not in the old file or that are changed from the records in the old file. The Data Load utility can then load this difference file. If the routinely loaded files contain many previously loaded records, running this file difference can result in shorter load times. This preprocess can be scaled to compare files with millions of records.

The file difference preprocessor is not a general-purpose file difference tool. If the contents of the old file we are comparing exist in the WebSphere Commerce database, loading the generated difference file into the database is equivalent to loading the entire new file. If the generated difference file is smaller than the new file, loading the difference file can reduce the overall time required to update the database to match the contents of the new file.
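
The comparison and the equivalence described above can be illustrated with a small Python sketch. This is not the preprocessor's implementation, only a model of the behavior the text describes: records are matched on key columns, and only new or changed records are kept. The column names (PartNumber, Name, Price), the sample data, and the load function (a toy insert-or-replace into a dictionary that stands in for the database) are all assumptions made for illustration.

```python
import csv
import io


def read_records(text, key_columns):
    """Index CSV rows by a tuple of their key-column values."""
    reader = csv.DictReader(io.StringIO(text))
    return {tuple(row[k] for k in key_columns): row for row in reader}


def file_difference(old_text, new_text, key_columns):
    """Return rows of the new file that are absent from, or changed in, the old file."""
    old = read_records(old_text, key_columns)
    new = read_records(new_text, key_columns)
    return [row for key, row in new.items() if old.get(key) != row]


def load(table, rows, key_columns):
    """Toy stand-in for a load (assumption): insert or replace each row by key."""
    for row in rows:
        table[tuple(row[k] for k in key_columns)] = row
    return table


# Hypothetical sample data: A2 changes price, A4 is new, A3 is missing from NEW.
OLD = "PartNumber,Name,Price\nA1,Lamp,10\nA2,Desk,90\nA3,Chair,45\n"
NEW = "PartNumber,Name,Price\nA1,Lamp,10\nA2,Desk,95\nA4,Shelf,30\n"
KEYS = ["PartNumber"]

diff = file_difference(OLD, NEW, KEYS)
changed_keys = [row["PartNumber"] for row in diff]

# Both tables start from the old file's contents, as the text above assumes.
via_diff = load(read_records(OLD, KEYS), diff, KEYS)
via_full = load(read_records(OLD, KEYS), list(read_records(NEW, KEYS).values()), KEYS)
same_result = via_diff == via_full
```

In this sketch the difference holds two of the three records in the new file, and loading it leaves the toy table in the same state as loading the whole new file. The record that appears only in the old file (A3) is in neither load, so it stays in the table in both cases. That is consistent with the difference file containing only new and changed records.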

We can also run the file difference preprocessor as a separate process from the actual loading of data into the database, generating a difference file without loading it. Because the process stops before any data is loaded, the preprocessor does not affect the database or WebSphere Commerce system performance. We can then load the difference file later with the Data Load utility, when the loading process has the least impact on the database and WebSphere Commerce system performance.

The file difference is implemented as a data reader preprocessor. It runs at the beginning of the data reader initialization when we run the Data Load utility. By default, two data reader preprocessors are provided for running a file difference: one for comparing CSV files (CSVFileDiffPreprocessor) and one for comparing XML files (XmlFileDiffPreprocessor). The data reader preprocessor is specified as a DataReaderPreprocessor subelement within the DataReader element of the Data Load business object configuration file. For example:

We do not need to explicitly specify this preprocessor to run a file difference. To run the file difference, we must specify only two properties: the key column property values that uniquely identify records in the input files, and the file location of the older file we want to compare. If the configuration files include these two required file difference properties, the file difference preprocessor runs automatically when we run the Data Load utility. For more information about configuring the Data Load utility to run a file difference, see Configure the Data Load utility to run a file difference preprocess.


Best Practices

When you run the file difference preprocessor, consider the following tips and recommendations:


Limitations

When we run the file difference preprocessor to help improve data load performance, we must understand the file difference behavior and limitations: