Index Load configuration files for indexing from CSV files
We can load index information from a CSV file. Index Load requires configuration files before it can be run from a web browser.
Loading the index from a CSV file
Follow these steps to load index information from a CSV file.
- Edit the wc-dataload-profile.xml configuration file to add the CSV file location and the target core name.
- In wc-businessObject-profile.xml, specify CSVReader as the reader and SolrIndexLoadMapObjectBuilder as the business object builder.
If we are using a CSV file to load index data, Index Load requires three configuration files. These files are based on the XML schema definitions of the Data Load framework:
Index Load configuration file | Schema definition file
Environment configuration file (wc-indexload-env.xml) | wc-dataload-env.xsd
Profile configuration file (wc-indexload-profileName.xml) | wc-indexload.xsd
Profile item configuration file (wc-indexload-businessobject.xml) | wc-indexload-item.xsd
Environment configuration file (wc-indexload-env.xml)
The wc-indexload-env.xml file contains environment control information and global properties required by Index Load, including a common data writer and data source to be used to persist the data.
The wc-indexload-env.xml file does not typically require customization. We can use the default sample file as-is.
Profile configuration file (wc-indexload-profileName.xml)
The wc-indexload-profileName.xml file contains configurable performance attributes and load item configurations.
The profile names that we define in these configuration files are passed as a URL parameter when we call Index Load from a web browser.
The load item configurations are listed under the load order section of this file. They are processed in the same order as they are specified.
The file can contain one or more LoadItem definitions. Each LoadItem specifies its own configuration and its coreName target. Multiple LoadItems are run in parallel, without sequence. Example: wc-indexload-price.xml
<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
  <_config:property name="groupName" value="1" />
</_config:LoadItem>
The following configurable performance attributes apply to profile configuration files:
- batchSize
  - The threshold at which documents are soft committed in memory.
  - Default is 1. If a value of 0 is specified, no commit occurs until the load item finishes.
- commitCount
  - The threshold at which documents are hard committed from memory to disk.
  - We can use a commitCount of 0 if we use a memory-based commit. See Tuning Index Load.
- ThreadLaunchTimeDelay
  - The amount of time, in milliseconds, to wait before starting another thread. The delay avoids overloading the system at startup.
  - Default is 1000.
- OptimizeAfterIndexing
  - Indicates whether Index Load performs index optimization after commit.
  - Note: Performing optimization after full indexing improves runtime performance, but it increases the overall indexing time.
- StatusRefreshInterval
  - The maximum amount of time, in seconds, to wait before refreshing the current Index Load status and displaying it in the administrative log.
  - Default is 300. Use a value of -1 to disable the service.
- DocumentSizeSamplingInterval
  - The time interval, in seconds, at which the size of the indexed document is calculated.
  - Default is 300. Use a value of -1 to disable the service.
- IndexHeightCacheHint
  - A number that the system uses as a hint to size the applicable caches for index height during indexing.
- IndexWidthCacheHint
  - A number that the system uses as a hint to size the applicable caches for index width during indexing.
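The relationship between the soft-commit and hard-commit thresholds can be illustrated with a small simulation. This is only a sketch and not Index Load code: the function simulate_commits is hypothetical, and it assumes that each threshold counts documents, that a value of 0 defers the commit until the load item finishes, and that a final commit of each kind happens when the load item finishes. Only the batchSize behavior for 0 is stated in the attribute descriptions; the rest is an assumption made for illustration.

```python
def simulate_commits(total_docs, batch_size=1, commit_count=0):
    """Return (soft, hard): the document counts at which commits would fire.

    Assumed semantics (illustration only, not taken from Index Load itself):
    - batch_size > 0: soft commit after every batch_size documents.
    - commit_count > 0: hard commit after every commit_count documents.
    - A value of 0 means no periodic commit; commit once at the end.
    """
    soft, hard = [], []
    for n in range(1, total_docs + 1):
        if batch_size > 0 and n % batch_size == 0:
            soft.append(n)
        if commit_count > 0 and n % commit_count == 0:
            hard.append(n)

    # Assumed final commit when the load item finishes, unless a
    # periodic commit already fired on the last document.
    if total_docs > 0:
        if not soft or soft[-1] != total_docs:
            soft.append(total_docs)
        if not hard or hard[-1] != total_docs:
            hard.append(total_docs)
    return soft, hard


if __name__ == "__main__":
    soft, hard = simulate_commits(total_docs=10, batch_size=4, commit_count=0)
    print("soft commits after documents:", soft)
    print("hard commits after documents:", hard)
```

Under these assumptions, a larger batchSize trades fewer in-memory commits for more uncommitted documents at any moment, which is the kind of trade-off that the Tuning Index Load topic addresses.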
Profile item configuration file (wc-indexload-external-price.xml)
<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-external-price.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
  <_config:DataSourceLocation location="C:\Patches\delta.csv" />
</_config:LoadItem>
Where:
- coreName
  - The name of the extension core.
- DataSourceLocation
  - The location of the CSV data file.
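Before running Index Load, it can help to confirm which core and which CSV file a LoadItem actually points to. The following Python sketch reads a LoadItem fragment and previews the first rows of a CSV file. It is not part of Index Load: the functions read_load_item and preview_csv are hypothetical helpers, and the namespace URI bound to the _config prefix is a placeholder, because the fragment does not show how that prefix is declared.

```python
import csv
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the real declaration of the
# _config prefix is not shown in the fragment.
CONFIG_NS = "urn:example:config"


def read_load_item(xml_text):
    """Extract the name, config file, coreName, and CSV location of one LoadItem."""
    item = ET.fromstring(xml_text.strip())
    props = {
        p.get("name"): p.get("value")
        for p in item.findall(f"{{{CONFIG_NS}}}property")
    }
    source = item.find(f"{{{CONFIG_NS}}}DataSourceLocation")
    return {
        "name": item.get("name"),
        "businessObjectConfigFile": item.get("businessObjectConfigFile"),
        "coreName": props.get("coreName"),
        "location": source.get("location") if source is not None else None,
    }


def preview_csv(path, limit=3):
    """Return up to `limit` rows from the CSV file at `path`."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [row for _, row in zip(range(limit), csv.reader(handle))]


# The LoadItem from the example, with the placeholder namespace declared.
FRAGMENT = f"""
<_config:LoadItem xmlns:_config="{CONFIG_NS}" name="ExternalPrice-1"
    businessObjectConfigFile="wc-indexload-external-price.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
  <_config:DataSourceLocation location="C:\\Patches\\delta.csv" />
</_config:LoadItem>
"""

if __name__ == "__main__":
    info = read_load_item(FRAGMENT)
    print(info["coreName"], "<-", info["location"])
```

Calling preview_csv(info["location"]) on the machine that holds the file then shows whether the CSV is readable before the load is started.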
Sample configuration files
Download and extract the following sample code: IndexLoadSampleCode.zip. For reference, the sample includes the configuration files used by Index Load and the manual updates that are performed in the "Indexing contract prices using Index Load" task.