Administer > Transforming, loading, and extracting data > Overview of the data load utility
Data load best practices
- Configuration for the initial loads
- Configuration for the delta loads
- Run the data load script file
- Data load configuration files
- Configure wc-dataload.xml
- Configure wc-dataload-env.xml
- Configure data load business object configuration
- CSV files
- Load by unique ID
- Load data into a workspace
Configuration for the initial loads
See Initial load scenario for more information about recommended configurations during initial loads.
Configuration for the delta loads
See Delta load scenario for more information about recommended configuration during delta loads.
Run the data load script
Optionally turn off XML validation if you are using some variable substitution for integer attributes...
-DXmlValidation=falseTurn on more tracing in...
WC_INSTALL/logs/wc-dataload.log...by specifying the following option to turn on the FINEST tracing for all packages:
-D.level=FINESTFor large loads, specifying FINEST trace level, causes too much tracing in the log file; you can turn on tracing for one package. If there is an SQL error, turn on the database-related trace specifying the FINE trace level:
-Dcom.ibm.commerce.foundation.dataload.database.level=FINECustomize the Java logging configuration file...
WC_INSTALL/wc.ear/xml/config/dataload/logging.propertiesYou can...
- change the log file path
- change the maximum log file size
- how many log files to cycle through
By default, you only have 1 log file, and the log file is overwritten every time you run the data load utility.
Data load configuration files
There are three types of data load configuration files:
Load order configuration file wc-dataload.xml You can have multiple load order configuration files, or one file with all items. To load just a few load items, run the data load utility with options...
-DLoadOrder="loadItemName1, loadItemName2, loadItemName3"Environment configuration file wc-dataload-env.xml You only need one copy of this configuration file. Business object configuration files wc-loader-<business object>.xml One business object configuration file generally corresponds to one type of input data, which loads one type of business object. Defines...
- DataReader
- BusinessObjectBuilder
- BusinessObjectMediator
- Customization configuration options
...used for the data load.
Keep all the data load configuration files relative to wc-dataload.xml.
Ensure that the configuration files specified in wc-dataload.xml use relative paths to make it easy to move the configuration files from one workstation to another.
Configure wc-dataload.xml
Specify commitCount, batchSize, dataLoaderMode at the LoadOrder level, so you do not need to specify them at each LoadItem level.
Specify the commitCount to be greater than or equal to the batchSize. The commitCount is a multiple of the batchSize.
To minimize the impact to the production server, specify the commitCount and batchSize to 1. Specifying a large commitCount and batchSize improves the data load performance, but might have impact on the database; that is more database tables and rows are locked for longer time.
For easy debugging of some SQL errors, specify the batchSize to 1 and turn on the database tracing. This allows you to find out which SQL statement or input line caused the SQL error. If the batchSize is greater than 1, the JDBC batch update is enabled; it is hard to relate the SQL error to the input line or SQL statement that caused the error.
Configure wc-dataload-env.xml
Consider the following configurations:
- Do not specify the <_config:FilePath /> element if all the configuration files are relative to wc-dataload.xml.
- Encrypt the database password in the configuration file.
You can run the wcs-encrypt utility in...
WC_INSTALL/bin...to find out the encrypted ASCII string...
wcs-encryptsh <plain password>
- You can also leave the database password setting empty.
When the password is empty, you are prompted to enter the password when you run the data load utility. You must enter the plain unencrypted password; the password is not echoed on the console for security purposes.
- The IDResolver configuration is optional.
You can specify the ID resolver cache size. For initial load, specify a large cache size, such as 1 million. For delta load, specify a cache size of 0. If you do not specify the ID resolver configuration, the default cache size is 0.
- Instead of hard coding the data writer in each of the data load business object configuration file, specify the default data writer class in this configuration file.
Configure the data load business object configuration
Consider the following configurations:
- Instead of hard coding the CSV file column names in the <_config:Data> node of the <_config:DataReader> element, you can place all the column names in the first line of the CSV file.
Then specify firstLineIsHeader="true" in the <_config:DataReader> element.
Only applies when you use the CSVReader interface as the data reader.
- When defining the mapping or table configuration in the <_config:BusinessObjectBuilder> element, if the value is from the input data, you do not need to specify the valueFrom attribute. The default is valueFrom="InputData".
CSV files
Consider the following tips when editing or maintaining the CSV files:
- For clarity and readability, use the first line of the CSV file as a header.
<_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.CSVReader" firstLineIsHeader="true" >You can use a spreadsheet software to open the CSV file to see if your data matches the column heading.
- For flexibility of using and omitting optional columns or rearranging the columns, use the header as column name instead of hard coding the columns.
<_config:DataReader className="com.ibm.commerce.foundation.dataload.datareader.CSVReader" firstLineIsHeader="true" useHeaderAsColumnName="true" >
...instead of...
<_config:Data> <_config:column number="1" name="Identifier" /> <_config:column number="2" name="Name" /> <_config:column number="3" name="ShortDescription" /> <_config:column number="4" name="LongDescription" /> <_config:column number="5" name="Thumbnail" /> <_config:column number="6" name="FullImage" /> <_config:column number="7" name="Delete" /> </_config:Data>
- Do not use the spreadsheet software to save the CSV file.
Because the spreadsheet software might reformat the CSV file and invalidate your data for the CSVReader. For example, if you have a timestamp column in the CSV file, the spreadsheet software can reformat the timestamp data according to the locale setting which might not be valid for the data load.
- By default, data load only support the Java standard timestamp format "yyyy-MM-dd hh:mm:ss.nnnnnn".
The data load utility can also support a customized timestamp format. If you have a customized timestamp format in the CSV file, specify it in the load item configuration in wc-dataload.xml. For example, you can have the following property specify in the <_config:LoadItem> element:
<_config:property name="timestampPattern" value="yyyy-MM-dd HH.mm.ss" />
Load by unique ID
Specify the unique ID is optional when using the data load utility. However, if you specify the unique ID, you save the overhead of resolving the ID, and performance is improved.
Load data into a workspace
You are recommended to load the data into a workspace when using the data load utility. Benefits of loading data into the workspace:
- You can preview changes to managed assets without affecting what is currently running on the site. If you find any errors, you can correct them before the data is committed to the production-ready data on the authoring server.
- You can add an approval process. So before any data changes are committed to the production-ready data, it must be approved.
See
- Catalog: data load best practices
- Inventory: data load best practices
- Price: data load best practices
Related concepts
Workspaces
Workspaces locking policies
Related tasks
Substitute attribute values with variables in data load configuration files
Load data into workspaces using the Data load utility
Configure the data load order
Related reference
Examples: Mapping data
Initial load scenario
Delta load scenario
Data Load utility