Index Load configuration files for indexing from database
Index Load requires configuration files before it can be run from a web browser. Index Load requires three types of configuration files, based on the XML schema definitions of the Data Load framework:
Each Index Load configuration file and its schema definition file:
- Environment configuration file (wc-indexload-env.xml): wc-dataload-env.xsd
- Profile configuration file (wc-indexload-profileName.xml): wc-indexload.xsd
- Profile item configuration file (wc-indexload-businessobject.xml): wc-indexload-item.xsd
Environment configuration file (wc-indexload-env.xml)
The wc-indexload-env.xml file contains environment control information and global properties required by Index Load, including a common data writer and data source to be used to persist the data.
The wc-indexload-env.xml file does not typically require customization. You can use the default sample file as-is.
Example: wc-indexload-env.xml
<_config:DataLoadEnvConfiguration
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.ibm.com/xmlns/prod/commerce/foundation/config ../../../xml/config/xsd/wc-dataload-env.xsd"
    xmlns:_config="http://www.ibm.com/xmlns/prod/commerce/foundation/config">
  <_config:DataSource reference="com.ibm.commerce.foundation.server.services.search.datasource" />
  <_config:DataWriter className="com.ibm.commerce.foundation.internal.server.services.indexload.writer.SolrIndexLoadWriter" >
    <_config:DataLoadBatchService className="com.ibm.commerce.foundation.server.services.indexload.writer.solr.SolrIndexLoadBatchService" />
  </_config:DataWriter>
</_config:DataLoadEnvConfiguration>
Profile configuration file (wc-indexload-profileName.xml)
The wc-indexload-profileName.xml file contains configurable performance attributes and load item configurations.
The profile name that you define in the configuration file name is passed as a URL parameter when you call Index Load in a web browser.
The load item configurations are listed under the load order section of this file. They are processed in the same order as they are specified.
The file can contain one or more LoadItem definitions, each specifying its business object configuration file and coreName target. Multiple LoadItems are run in parallel, without sequence.
Example: wc-indexload-price.xml
<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
  <_config:property name="groupName" value="1" />
</_config:LoadItem>
The following configurable performance attributes apply to profile configuration files:
- batchSize
- The threshold when documents are soft committed in memory.
- Default is 1. If a value of 0 is specified, it does not commit until the load item finishes.
- commitCount
- The threshold when documents are hard committed to disk from memory.
- You can use a commitCount of 0 if you use a memory-based commit. See Tuning Index Load.
- ThreadLaunchTimeDelay
- The amount of time in milliseconds to wait before starting another new thread to avoid overloading the system at startup.
- Default is 1000.
- OptimizeAfterIndexing
- Indicates whether Index Load performs index optimization after commit.
Note: Performing optimization after a full indexing improves runtime performance; however, it increases the overall indexing time.
- StatusRefreshInterval
- The maximum amount of time in seconds to wait before refreshing the current Index Load status and displaying it in the administrative log.
- Default is 300. Use a value of -1 to disable the service.
- DocumentSizeSamplingInterval
- The time interval in seconds to calculate the size of the indexed document. Use -1 to disable the service. Default is 300.
- IndexHeightCacheHint
- A numeric hint that the system uses to size the applicable caches for index height during indexing.
- IndexWidthCacheHint
- A numeric hint that the system uses to size the applicable caches for index width during indexing.
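The interaction between the batchSize and commitCount thresholds described in the list above can be pictured with a small simulation. This is only an illustrative sketch, not Index Load code: the function name plan_commits is hypothetical, and it assumes that a commitCount of 0 defers the hard commit until the load item finishes and that any uncommitted documents are hard committed at that point.

```python
def plan_commits(total_docs, batch_size=1, commit_count=0):
    """Return (document_number, kind) commit events for one load item.

    Assumed model (hypothetical, not the product implementation):
    - a soft commit fires every batch_size documents; 0 disables it
      until the load item finishes
    - a hard commit fires every commit_count documents; 0 is assumed
      to defer it until the load item finishes
    """
    if total_docs <= 0:
        return []
    events = []
    for n in range(1, total_docs + 1):
        if commit_count and n % commit_count == 0:
            # A hard commit also covers anything a soft commit would.
            events.append((n, "hard"))
        elif batch_size and n % batch_size == 0:
            events.append((n, "soft"))
    # Assumption: remaining documents are flushed to disk at the end.
    if not events or events[-1] != (total_docs, "hard"):
        events.append((total_docs, "hard"))
    return events


if __name__ == "__main__":
    for event in plan_commits(10, batch_size=3, commit_count=5):
        print(event)
```

In this model, a batchSize of 0 combined with a commitCount of 0 yields a single commit event at the end of the load item, which matches the description of the value 0 for batchSize.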
Profile item configuration file (wc-indexload-businessobject.xml)
The wc-indexload-businessobject.xml file contains detailed DataLoader configurations, which include the dataload className, DataReader, and BusinessObjectBuilder. The SolrIndexLoadQueryLoader is used to load objects from the database.
Example: wc-indexload-price-sql.xml
<_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader" >
The following configurable performance attributes apply to profile item configuration files:
- ParallelThreads
- Specifies the maximum number of loader threads that the search work manager can dispatch. The loader threads read data in parallel and share the data writer.
- An empty value or 1 indicates no parallel indexing.
- ParallelLowerRangeSQL
- SQL queries that get the first keys.
- SQL queries can be used to specify that Index Load loads only part of the objects from the database.
- ParallelUpperRangeSQL
- SQL queries that get the end keys.
- ParallelNextRangeSQL
- An SQL statement that determines the next available identifier when an empty range ID is detected from the parallel range. Typically, the nextStartKey value is the firstKey, and the nextEndKey is the firstKey+prefetchSize-1.
- ParallelLowerRange
- A hardcoded value that tracks the lower range keys. If defined, it is an absolute number for the lower range and overrides the value of ParallelLowerRangeSQL.
- ParallelUpperRange
- A hardcoded value that tracks the upper range keys. If defined, it is an absolute number for the upper range and overrides the value of ParallelUpperRangeSQL.
- ParallelPrefetchSize
- Determines how much data to read in one run when the reader queries the database. If defined, the run time breaks up the entire data range into fragments to avoid overloading the database sort heap with too large a query result set.
- Default is 10000.
- ParallelDeltaUpdate
- Determines whether the SQL result set is merged into an existing indexed document that contains a matching primary key. This delta update operation is equivalent to the Atomic Update feature provided by Solr.
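The way ParallelPrefetchSize breaks a key range into fragments, and the nextStartKey/nextEndKey arithmetic given for ParallelNextRangeSQL, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Index Load implementation: the function names split_key_range and next_range are hypothetical, and the key range is assumed to be a contiguous, inclusive range of integers.

```python
def split_key_range(lower, upper, prefetch_size=10000):
    """Split the inclusive key range [lower, upper] into fragments
    of at most prefetch_size keys each (hypothetical helper)."""
    if prefetch_size <= 0:
        raise ValueError("prefetch_size must be positive")
    fragments = []
    start = lower
    while start <= upper:
        # The last fragment is clipped to the upper range key.
        end = min(start + prefetch_size - 1, upper)
        fragments.append((start, end))
        start = end + 1
    return fragments


def next_range(first_key, prefetch_size=10000):
    """Apply the rule from the text: nextStartKey is firstKey and
    nextEndKey is firstKey + prefetchSize - 1."""
    return first_key, first_key + prefetch_size - 1


if __name__ == "__main__":
    print(split_key_range(1, 25000))
    print(next_range(40001))
```

With a lower range of 1, an upper range of 25000, and the default prefetch size of 10000, the sketch produces three fragments, each of which would correspond to one bounded database query.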
The profile item configuration file contains a data reader section that defines how data can be read and inserted into the index. Two data readers are provided by default:
- com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryReader
- A simple SQL loader that reads the original physical data from the data source in parallel as specified by the configuration files.
- com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader
- Requires the index entity to define the KeyFieldName property and to have only one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry.
- It is used in the following way:
- The KeyFieldName property is the index field name for the primary key.
- The query tag is the database SQL query to be used, and must be ordered by the primary key field.
- Multiple ColumnMapping tags can be used, with each one mapping a database table column (name) to an index field name (value).
- The DynamicFields section allows a list of dynamic fields to be defined. Multiplexing is applied to this field with the column name as the resolved value from dynamicFieldName and the value in this column as the resolved value from dynamicFieldValue. In addition, dynamicFieldName and dynamicFieldValue can be used as a template where other field variable names can be declared. An optional parameter, indexingMode, with its default value as replace, is used to define the behavior for handling multiple values in this dynamic column. Other supported operations are append and sum, where append is for handling multi-value index fields, and sum is for adding up all the values.
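The multiplexing behavior and the three indexingMode values described above can be illustrated with a short sketch. It is an assumption-laden model rather than the reader's actual code: the function multiplex and the column names catentry_id, price_field, and price are hypothetical, the field name is taken directly from one column instead of being resolved from a template, and the rows are assumed to arrive ordered by the primary key, as the query requirement states.

```python
from itertools import groupby
from operator import itemgetter


def multiplex(rows, key_field, name_col, value_col, indexing_mode="replace"):
    """Fold consecutive rows that share a primary key into one document.

    Each row contributes a dynamic field whose name comes from name_col
    and whose value comes from value_col (hypothetical model).
    """
    if indexing_mode not in ("replace", "append", "sum"):
        raise ValueError("unsupported indexingMode: " + indexing_mode)
    docs = []
    # groupby only merges adjacent rows, so rows must be ordered by key.
    for key, group in groupby(rows, key=itemgetter(key_field)):
        doc = {key_field: key}
        for row in group:
            field, value = row[name_col], row[value_col]
            if indexing_mode == "append":
                doc.setdefault(field, []).append(value)  # multi-value field
            elif indexing_mode == "sum":
                doc[field] = doc.get(field, 0) + value   # add up all values
            else:
                doc[field] = value                       # last value wins
        docs.append(doc)
    return docs


if __name__ == "__main__":
    rows = [
        {"catentry_id": 1, "price_field": "price_USD", "price": 10.0},
        {"catentry_id": 1, "price_field": "price_USD", "price": 12.0},
        {"catentry_id": 1, "price_field": "price_EUR", "price": 9.0},
        {"catentry_id": 2, "price_field": "price_USD", "price": 5.0},
    ]
    for mode in ("replace", "append", "sum"):
        print(mode, multiplex(rows, "catentry_id", "price_field", "price", mode))
```

Grouping only adjacent rows is a deliberate choice here: it shows why the query must be ordered by the primary key field, since unordered rows would be split across several documents with the same key.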