Use the Solr atomic update feature with Search
Atomic update, also known as partial update, lets you update specified stored fields in an existing document. This approach is especially useful when a core has many fields and only a few of them change between index builds. Solr supports several atomic update modifiers:
- set
- Set or replace a particular value, or remove the value if null is specified as the new value.
- add
- Adds an additional value to a list.
- remove
- Removes a value (or a list of values) from a list.
- removeregex
- Removes from a list the values that match the given Java regular expression.
- inc
- Increments a numeric value by a specific amount (use a negative value to decrement).
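The modifier semantics can be illustrated with a small Java sketch that applies each modifier to an in-memory document modeled as a map. This is a hypothetical illustration of the behavior described above, not Solr's implementation: the class and method names, the tags list field, and treating a missing field as 0 for inc are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical in-memory model of Solr atomic update modifiers (not Solr's code).
class AtomicModifiers {

    // set: replace the value, or remove the field when the new value is null
    static void set(Map<String, Object> doc, String field, Object value) {
        if (value == null) {
            doc.remove(field);
        } else {
            doc.put(field, value);
        }
    }

    // add: append a value to a multi-valued (list) field
    @SuppressWarnings("unchecked")
    static void add(Map<String, Object> doc, String field, Object value) {
        List<Object> values = (List<Object>) doc.computeIfAbsent(field, k -> new ArrayList<>());
        values.add(value);
    }

    // remove: drop every occurrence of the given values from the list
    @SuppressWarnings("unchecked")
    static void remove(Map<String, Object> doc, String field, List<?> toRemove) {
        List<Object> values = (List<Object>) doc.get(field);
        if (values != null) {
            values.removeIf(toRemove::contains);
        }
    }

    // removeregex: drop list values whose whole string form matches the Java regex
    @SuppressWarnings("unchecked")
    static void removeRegex(Map<String, Object> doc, String field, String regex) {
        List<Object> values = (List<Object>) doc.get(field);
        if (values != null) {
            Pattern pattern = Pattern.compile(regex);
            values.removeIf(v -> pattern.matcher(String.valueOf(v)).matches());
        }
    }

    // inc: add a (possibly negative) amount; a missing field counts as 0 (sketch assumption)
    static void inc(Map<String, Object> doc, String field, long amount) {
        long current = ((Number) doc.getOrDefault(field, 0L)).longValue();
        doc.put(field, current + amount);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("catentry_id", "10044");
        doc.put("inv_strlocqty_1", 100L);
        doc.put("tags", new ArrayList<>(List.of("sale", "new", "clearance-2018"))); // hypothetical list field

        inc(doc, "inv_strlocqty_1", -30);            // decrement: 100 -> 70
        add(doc, "tags", "featured");               // [sale, new, clearance-2018, featured]
        remove(doc, "tags", List.of("new"));        // [sale, clearance-2018, featured]
        removeRegex(doc, "tags", "clearance-\\d+"); // [sale, featured]

        System.out.println(doc.get("inv_strlocqty_1") + " " + doc.get("tags")); // 70 [sale, featured]
    }
}
```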
All original source fields must be stored for field modifiers to work correctly; this is the default in Solr. IndexLoad supports only the set modifier: it fetches data from a CSV file or a database and uses each value to replace the corresponding stored Solr field value.
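Because IndexLoad uses only the set modifier, each changed field is sent as a set operation. As a rough sketch, the Java code below builds the JSON body of a set-only Solr atomic update; in Solr's JSON update syntax, the new value is wrapped in an object such as {"set": 400}. It assumes catentry_id is the core's unique key field, the class and method names are hypothetical, and IndexLoad's actual request format is not shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: JSON body for a Solr atomic update that uses only "set".
// The key field name (catentry_id) is an assumption.
class SetOnlyUpdate {

    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Numbers and booleans are emitted bare; null becomes {"set":null}, which removes the field
    static String jsonValue(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    static String build(String keyField, String keyValue, Map<String, Object> changes) {
        StringJoiner fields = new StringJoiner(",", "{", "}");
        fields.add(quote(keyField) + ":" + quote(keyValue));
        for (Map.Entry<String, Object> e : changes.entrySet()) {
            fields.add(quote(e.getKey()) + ":{\"set\":" + jsonValue(e.getValue()) + "}");
        }
        return "[" + fields + "]";
    }

    public static void main(String[] args) {
        Map<String, Object> changes = new LinkedHashMap<>();
        changes.put("inv_strlocqty_1", 400);
        changes.put("inv_strlocqty_2", 500);
        System.out.println(build("catentry_id", "10044", changes));
        // [{"catentry_id":"10044","inv_strlocqty_1":{"set":400},"inv_strlocqty_2":{"set":500}}]
    }
}
```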
Example
Assume there are three inventory records for the same product, one from each of three stores, stored in the index document as follows:
catentry_id      "10044"
inv_strlocqty_1  100
inv_strlocqty_2  200
inv_strlocqty_3  300
indexedTime      "2018-11-28T14:51:58.042Z"
An inventory update occurs, updating the available inventory in Store 1 to 400 and the available inventory in Store 2 to 500:
catentry_id  store_id  availquantity
10044        1         400
10044        2         500
To capture this change, run IndexLoad, loading data from the following source CSV file. The CSV file needs to provide only the final updated quantities for the stores that changed.
catentry_id,inv_strlocqty_1,inv_strlocqty_2
10044,400,500
After IndexLoad runs, the document in Solr looks like this (inv_strlocqty_3 is unchanged):
catentry_id      "10044"
inv_strlocqty_1  400
inv_strlocqty_2  500
inv_strlocqty_3  300
indexedTime      "2018-11-28T15:51:38.033Z"
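The example can be checked with a minimal sketch that merges the set-only changes from the CSV row into the stored document. Fields that are absent from the update, such as inv_strlocqty_3, keep their stored values. The class and method names are hypothetical, and indexedTime is left out because this sketch does not model how it is refreshed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: merge set-only changes into a copy of the stored document.
class ExampleMerge {

    static Map<String, Object> applySet(Map<String, Object> stored, Map<String, Object> changes) {
        Map<String, Object> merged = new LinkedHashMap<>(stored); // leave the input untouched
        changes.forEach((field, value) -> {
            if (value == null) {
                merged.remove(field);     // set to null removes the field
            } else {
                merged.put(field, value); // set replaces the stored value
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("catentry_id", "10044");
        stored.put("inv_strlocqty_1", 100);
        stored.put("inv_strlocqty_2", 200);
        stored.put("inv_strlocqty_3", 300);

        // Values from the CSV row: only the changed stores are present
        Map<String, Object> changes = new LinkedHashMap<>();
        changes.put("inv_strlocqty_1", 400);
        changes.put("inv_strlocqty_2", 500);

        System.out.println(applySet(stored, changes));
        // {catentry_id=10044, inv_strlocqty_1=400, inv_strlocqty_2=500, inv_strlocqty_3=300}
    }
}
```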
Procedure to use atomic update with a CSV file
- Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-csv.xml, where profileName is the URL parameter you use when you call IndexLoad in a web browser. In the following scenarios, price-delta is used as the profileName for the CSV scenario, and inventory-delta as the profileName for the SQL scenario. The wc-indexload-profileName-csv.xml file contains environment control information and global properties required by IndexLoad. For example, it includes the data mapping between each CSV field and the corresponding Solr field. (You can omit the data mapping for a column whose name in the CSV file matches a Solr field name; the sample below leaves DataMapping empty.) This file also specifies the DataReader and mediator. To load from a CSV file, specify com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader as the DataReader, and com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator as the BusinessObjectMediator. The wc-indexload-profileName-csv.xml file does not typically require customization; you can use the following sample file as-is.
<_config:DataLoader className="com.ibm.commerce.search.indexload.loader.SearchIndexLoadCSVLoader" >
  <_config:property name="FirstLineIsHeader" value="true" />
  <_config:property name="Charset" value="UTF-8" />
  <_config:property name="TokenDelimiter" value="," />
  <_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadCSVReader" />
  <_config:BusinessObjectBuilder>
    <_config:DataMapping>
    </_config:DataMapping>
    <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator"/>
    <_config:BusinessObjectMediator className="com.ibm.commerce.search.indexload.mediator.SearchIndexLoadCSVMediator" />
  </_config:BusinessObjectBuilder>
</_config:DataLoader>
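The sample configuration reads a CSV file whose first line is a header (FirstLineIsHeader) and whose tokens are separated by TokenDelimiter. The following Java sketch shows that reading step under simple assumptions: header names are used directly as Solr field names (as when DataMapping is left empty), whitespace around tokens is trimmed, and quoted tokens are not supported. It is a hypothetical illustration, not the SearchIndexLoadCSVReader implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of header-driven CSV reading (FirstLineIsHeader=true).
// Quoted tokens and embedded delimiters are NOT handled.
class DeltaCsv {

    static List<Map<String, String>> read(List<String> lines, String delimiter) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) return rows;
        String splitter = Pattern.quote(delimiter);       // treat the delimiter literally
        String[] header = lines.get(0).split(splitter);   // header names = Solr field names (assumption)
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue;                 // skip empty lines
            String[] tokens = line.split(splitter, -1);   // -1 keeps trailing empty tokens
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < tokens.length; i++) {
                row.put(header[i].trim(), tokens[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv = List.of(
            "catentry_id,inv_strlocqty_1,inv_strlocqty_2",
            "10044,400, 500"); // the space before 500 is trimmed
        System.out.println(read(csv, ","));
        // [{catentry_id=10044, inv_strlocqty_1=400, inv_strlocqty_2=500}]
    }
}
```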
- Create the profile configuration file wc-indexload-profileName.xml.
The wc-indexload-profileName.xml file contains configurable performance attributes and one or more load item definitions. It also contains the CSV file location and the target core name. The profile name defined by this file is the value you pass as a URL parameter when you call IndexLoad in a web browser. The load item configurations are listed under the load order section of this file. Each LoadItem definition specifies a particular load item configuration, such as coreName or location. Multiple load items run in parallel. Within every load item configuration section, the environment configuration file wc-indexload-profileName-csv.xml must be specified. The profile configuration file also contains the DataWriter configuration; keep the original com.ibm.commerce.search.indexload.writer.SearchIndexLoadBatchService as the writer. The CSV file needs to contain only the changed field values; IndexLoad uses the Solr atomic update API to update the specified stored fields. Example: wc-indexload-price-delta.xml
<_config:LoadItem name="ExternalPrice-1" fileName="wc-indexload-externalprice-csv.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price1_generic" />
  <_config:property name="groupName" value="1" />
  <_config:DataSourceLocation location="resources/search/index/indexload/contract-price-example1.csv" />
</_config:LoadItem>
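Each LoadItem pairs an environment configuration file with a target core and, for CSV loads, a data source location, and multiple load items run in parallel. The sketch below models that with a Java record and a thread pool. The LoadItem record, the load placeholder method, and the one-thread-per-item pool are hypothetical; IndexLoad's own scheduling and its groupName handling are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of a profile file: load items executed in parallel.
class ProfileRunner {

    record LoadItem(String name, String envFile, String coreName, String location) {}

    // Placeholder: a real loader would read item.location() and send set-only updates to item.coreName()
    static String load(LoadItem item) {
        return item.name() + " -> " + item.coreName();
    }

    static List<String> runAll(List<LoadItem> items) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, items.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (LoadItem item : items) {
                futures.add(pool.submit(() -> load(item))); // each item runs on its own task
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // wait for completion, in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A load item failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<LoadItem> items = List.of(
            new LoadItem("ExternalPrice-1", "wc-indexload-externalprice-csv.xml",
                "MC_10001_CatalogEntry_Price1_generic",
                "resources/search/index/indexload/contract-price-example1.csv"));
        runAll(items).forEach(System.out::println);
    }
}
```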
- Run IndexLoad in POST mode with the profileName defined in step 2. For example, if the profile configuration file is named wc-indexload-price-delta.xml, run IndexLoad with the following URL:
https://searchMaster:3738/search/admin/resources/indexload/profile/price-delta/start?catalogId=#MASTER_CATALOG_ID
- After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. See Packaging customized code for deployment.
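If you prefer to start IndexLoad from code rather than a browser, the start URL can be sent as a POST request with the JDK's java.net.http.HttpClient (Java 11 or later). This is a minimal sketch: the catalog ID 10001 stands in for #MASTER_CATALOG_ID as an assumed value, and any authentication or TLS trust configuration your search server requires is not shown.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: POST to the IndexLoad start URL. Host, port, and catalog ID are placeholders.
class StartIndexLoad {

    static URI startUri(String host, int port, String profileName, String catalogId) {
        return URI.create("https://" + host + ":" + port
            + "/search/admin/resources/indexload/profile/" + profileName
            + "/start?catalogId=" + catalogId);
    }

    static int start(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
            .POST(HttpRequest.BodyPublishers.noBody()) // IndexLoad is started in POST mode
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // 10001 is an assumed master catalog ID used only for illustration
        URI uri = startUri("searchMaster", 3738, "price-delta", "10001");
        System.out.println("POST " + uri);
        if (args.length > 0 && args[0].equals("send")) {
            System.out.println("HTTP " + start(uri)); // only contacts the server when asked to
        }
    }
}
```

Run it with the argument send to issue the request; without arguments it only prints the URL. For the SQL scenario, pass inventory-delta as the profile name.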
Procedure to use atomic update via SQL
- Create the environment configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName-sql.xml.
The SQL version of the environment configuration file specifies the parallel indexing configuration, which is used to split the dataset evenly across multiple threads when running with the SolrIndexLoadQueryLoader, and the SQL that captures the data from the specified data source. This configuration file also specifies the data reader. There are two DataReader classes:
- com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryReader
- Reads unique records from the database so that they can later be saved into the index.
- com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader
- Transforms multiple rows from a database table into a single index document with numerous dynamic index fields.
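The parallel indexing configuration splits the dataset evenly across threads. As an illustration of the idea only, the following sketch divides an inclusive catentry_id range into contiguous sub-ranges of nearly equal size. The splitting strategy, the Range record, and the bounds used are assumptions; this document does not show how IndexLoad actually partitions the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split an inclusive key range into near-equal contiguous ranges.
class RangeSplitter {

    record Range(long from, long to) {} // inclusive bounds

    static List<Range> split(long min, long max, int threads) {
        if (threads < 1 || max < min) throw new IllegalArgumentException("bad input");
        long total = max - min + 1;
        long base = total / threads;
        long extra = total % threads; // the first `extra` ranges get one more key
        List<Range> ranges = new ArrayList<>();
        long start = min;
        for (int i = 0; i < threads && start <= max; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new Range(start, start + size - 1));
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Hypothetical bounds, for example from MIN/MAX of CATENTRY_ID
        for (Range r : split(10001, 10100, 4)) {
            System.out.println(r.from() + "-" + r.to());
        }
    }
}
```

Splitting by contiguous key ranges keeps all rows for one catentry_id in the same range, which matters when a multiplex reader folds those rows into a single document. The ranges are even in key space, not necessarily in row count.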
Following is a sample DataReader entry that gets the inventory updated since a specific time. Because there can be multiple records for each catentry_id, the example uses com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader to accumulate multiple rows into one document.
<_config:DataReader className="com.ibm.commerce.search.indexload.reader.SearchIndexLoadQueryMultiplexReader">
  <_config:DynamicFields>
    <_config:DynamicField dynamicFieldName="inv_strlocqty_%storeId%" dynamicFieldValue="%quantity%" indexingMode="replace" />
  </_config:DynamicFields>
  <_config:property name="KeyFieldName" value="catentry_id" />
  <_config:property name="ExcludeFieldNames" value="storeId,quantity" />
  <_config:property name="minDelta" value="5"/>
  <_config:Query>
    <_config:SQL>
      SELECT invavl.catentry_id, invavl.STORE_ID, INVAVL.AVAILQUANTITY
      FROM INVAVL, CATGPENREL
      WHERE CATGPENREL.CATALOG_ID = 10001
        AND INVAVL.CATENTRY_ID = CATGPENREL.CATENTRY_ID
        AND INVAVL.QUANTITYMEASURE = 'C62'
        AND INVAVL.LASTUPDATE BETWEEN '2018-11-25 16:45:24.000' AND current timestamp
      ORDER BY INVAVL.CATENTRY_ID
      WITH UR
    </_config:SQL>
    <_config:ColumnMapping columnName="CATENTRY_ID" indexFieldName="catentry_id" />
    <_config:ColumnMapping columnName="STORE_ID" indexFieldName="storeId" />
    <_config:ColumnMapping columnName="AVAILQUANTITY" indexFieldName="quantity" />
  </_config:Query>
</_config:DataReader>
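The multiplex reader configuration above can be read as a fold over rows ordered by catentry_id (KeyFieldName): each group of consecutive rows with the same key becomes one document, each row contributes a dynamic field whose name expands inv_strlocqty_%storeId% and whose value expands %quantity%, and the ExcludeFieldNames columns are left out of the document. The Java sketch below is a hypothetical model of that behavior, not the SearchIndexLoadQueryMultiplexReader implementation; it assumes the rows already use the index field names from the ColumnMapping entries, and it does not model minDelta or indexingMode.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of multiplexing rows into one document per key.
class MultiplexSketch {

    // Replace each %column% placeholder in the template with the row's value
    static String expand(String template, Map<String, String> row) {
        String out = template;
        for (Map.Entry<String, String> e : row.entrySet()) {
            out = out.replace("%" + e.getKey() + "%", e.getValue());
        }
        return out;
    }

    static List<Map<String, String>> multiplex(List<Map<String, String>> rows, String keyField,
            String nameTemplate, String valueTemplate, List<String> excluded) {
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> current = null;
        for (Map<String, String> row : rows) { // rows must be ordered by keyField (ORDER BY in the SQL)
            String key = row.get(keyField);
            if (current == null || !Objects.equals(current.get(keyField), key)) {
                current = new LinkedHashMap<>(); // new key: start a new document
                for (Map.Entry<String, String> e : row.entrySet()) {
                    if (!excluded.contains(e.getKey())) current.put(e.getKey(), e.getValue());
                }
                docs.add(current);
            }
            current.put(expand(nameTemplate, row), expand(valueTemplate, row)); // dynamic field
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
            Map.of("catentry_id", "10044", "storeId", "1", "quantity", "400"),
            Map.of("catentry_id", "10044", "storeId", "2", "quantity", "500"));
        System.out.println(multiplex(rows, "catentry_id", "inv_strlocqty_%storeId%",
            "%quantity%", List.of("storeId", "quantity")));
        // [{catentry_id=10044, inv_strlocqty_1=400, inv_strlocqty_2=500}]
    }
}
```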
- Create the profile configuration file workspace_dir\workspace\search-config-ext\src\index\indexload\wc-indexload-profileName.xml. As with the CSV file approach, specify the SQL configuration file within the load item section:
<_config:LoadItem name="Inventory-Delta" fileName="wc-indexload-dom-delta-inventory-sql.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Inventory_generic" />
  <_config:property name="groupName" value="I" />
</_config:LoadItem>
- Run IndexLoad with the defined profileName. For example, if the profile configuration file from step 2 is named wc-indexload-inventory-delta.xml, run:
https://searchMaster:3738/search/admin/resources/indexload/profile/inventory-delta/start?catalogId=#MASTER_CATALOG_ID
- After IndexLoad has run successfully, run WCB to build the package and deploy the package into the Search Docker container. See Packaging customized code for deployment.