Customize Index Load
We can customize Index Load to suit your business needs. For example, we can customize Index Load to read from multiple sources, or change how the source input is transformed.
Procedure
- Customizing Index Load by using SQL statements:
The wc-indexload-profileName.xml file contains the business object and load item definitions. Use the following sample files as a reference.
- wc-indexload-price.xml
- wc-indexload-price-sql.xml
The following sample snippet shows how to define the business object:
<_config:LoadItem name="ExternalPrice-1" businessObjectConfigFile="wc-indexload-price-sql.xml">
  <_config:property name="coreName" value="MC_10001_CatalogEntry_Price_generic" />
</_config:LoadItem>
The following sample snippet shows how to define the load items by using SQL statements:
<_config:DataLoader className="com.ibm.commerce.foundation.server.services.indexload.loader.solr.SolrIndexLoadQueryLoader">
  <_config:property name="ParallelThreads" value="2" />
  <_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
  <_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
  <_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID > ?" />
  <_config:property name="ParallelLowerRange" value="" />
  <_config:property name="ParallelUpperRange" value="" />
  <_config:property name="ParallelPrefetchSize" value="100" />
  <_config:DataReader className="com.ibm.commerce.foundation.server.services.indexload.reader.solr.SolrIndexLoadQueryMultiplexReader">
    <_config:DynamicFields>
    </_config:DynamicFields>
    <_config:Query>
      <_config:SQL>
        SELECT TI.CATENTRY_ID, TI.PRICE
        FROM TI_CNTRPRICE_0 TI
        WHERE TI.CATENTRY_ID >= %ParallelLowerRange%
          AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
        ORDER BY TI.CATENTRY_ID
      </_config:SQL>
      <_config:ColumnMapping name="CATENTRY_ID" value="catentry_id" />
      <_config:ColumnMapping name="PRICE" value="price" />
    </_config:Query>
  </_config:DataReader>
  <_config:BusinessObjectBuilder className="com.ibm.commerce.foundation.internal.server.services.indexload.builder.SolrIndexLoadMapObjectBuilder">
    <_config:BusinessObjectMediator className="com.ibm.commerce.foundation.internal.server.services.indexload.mediator.SolrIndexLoadBusinessObjectMediator">
      <_config:extension className="com.ibm.commerce.foundation.server.services.indexload.mediator.solr.SolrIndexLoadExternalPriceMediator" />
    </_config:BusinessObjectMediator>
  </_config:BusinessObjectBuilder>
</_config:DataLoader>
- Customizing Index Load by using ranges to read the database in parallel:
We can configure parallel load items that split the data evenly across the data set. The SolrIndexLoadQueryLoader then uses multiple threads to support automatic range-based parallel indexing. Use the following sample files as a reference:
- wc-indexload-price-adv.xml
- wc-indexload-external-price-adv1.xml
- wc-indexload-external-price-adv2.xml
The following sample snippet shows how to define ParallelLowerRangeSQL and ParallelUpperRangeSQL SQL ranges:
<_config:property name="ParallelThreads" value="2" />
<_config:property name="ParallelLowerRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
<_config:property name="ParallelUpperRangeSQL" value="SELECT MAX(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE" />
<_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID > ?" />
<_config:SQL>
  SELECT TI.CATENTRY_ID, TI.PRICE
  FROM TI_CNTRPRICE_0 TI
  WHERE TI.CATENTRY_ID >= %ParallelLowerRange%
    AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
  ORDER BY TI.CATENTRY_ID
</_config:SQL>
The following sample snippet shows how to define hardcoded ranges with ParallelLowerRange and ParallelUpperRange:
<_config:property name="ParallelThreads" value="2" />
<_config:property name="ParallelLowerRange" value="1000" />
<_config:property name="ParallelUpperRange" value="2000" />
<_config:property name="ParallelNextRangeSQL" value="SELECT MIN(CE.CATENTRY_ID) FROM TI_CNTRPRICE_0 CE WHERE CE.CATENTRY_ID > ?" />
<_config:SQL>
  SELECT TI.CATENTRY_ID, TI.PRICE
  FROM TI_CNTRPRICE_0 TI
  WHERE TI.CATENTRY_ID >= %ParallelLowerRange%
    AND TI.CATENTRY_ID &lt;= %ParallelUpperRange%
  ORDER BY TI.CATENTRY_ID
</_config:SQL>
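To illustrate the idea behind range-based parallel loading, the following minimal Java sketch splits an inclusive key range into contiguous sub-ranges, one per thread. It is not the SolrIndexLoadQueryLoader implementation: the class name RangeSplitter and its split method are hypothetical, and the sketch assumes that keys are spread roughly evenly between the lower and upper bounds. Each resulting pair plays the role of the %ParallelLowerRange% and %ParallelUpperRange% values in the SQL above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Index Load API. */
public class RangeSplitter {

    /**
     * Splits the inclusive range [lower, upper] into at most {@code threads}
     * contiguous, non-overlapping inclusive sub-ranges of near-equal size.
     */
    public static List<long[]> split(long lower, long upper, int threads) {
        if (threads < 1 || lower > upper) {
            throw new IllegalArgumentException("need threads >= 1 and lower <= upper");
        }
        long total = upper - lower + 1;
        long parts = Math.min(threads, total); // never create empty sub-ranges
        long base = total / parts;
        long extra = total % parts;            // the first 'extra' sub-ranges get one more key
        List<long[]> ranges = new ArrayList<>();
        long start = lower;
        for (long i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] { start, end });
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Same bounds and thread count as the hardcoded sample above.
        for (long[] r : split(1000, 2000, 2)) {
            System.out.println("CATENTRY_ID >= " + r[0] + " AND CATENTRY_ID <= " + r[1]);
        }
    }
}
```

When the key values are sparse or clustered, equal-width ranges give uneven work per thread, which is one reason a query such as ParallelNextRangeSQL that finds the next existing key can be useful.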
- Customizing the Business Object Mediator:
We can customize the Business Object Mediator to change how the source input is transformed. Create a custom business object mediator class that extends SolrIndexLoadBusinessObjectMediator:
protected void transform(Object dataObjects, boolean deleteFlag) throws DataLoadException {
    final String METHODNAME = "transform(Object, boolean)";
    if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
        LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { dataObjects, deleteFlag });
    }
    // Add the custom transformation logic here.
    if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
        LOGGER.exiting(CLASSNAME, METHODNAME);
    }
}
The subclass must implement the abstract method transform(). This method transforms the input logical business object into a physical document object that is saved to the Solr index. After transform() finishes, the superclass passes the list of physical objects to the persistence layer, which persists them in the Solr index. The subclass is responsible for populating all data in the physical objects. For example, to transform the following input into multiple columns in the price index:
catentry_id  price
10001        price_USD_10001:100.00||price_EUR_10001:78.29||price_JPY_10001:11274||price_KRW_10001:95048||price_BRL_10001:232.15||price_CNY_10001:802.25
The following snippet is the default implementation that performs the transform:
public void transform(Map<String, Object> document) throws SolrIndexLoadException {
    final String METHODNAME = "transform(Map<String, Object>)";
    if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
        LOGGER.entering(CLASSNAME, METHODNAME, new Object[] { document });
    }
    if (document != null && !document.isEmpty()) {
        // The combined value, for example "price_USD_10001:100.00||price_EUR_10001:78.29"
        Object fieldValue = document.get("price");
        StringTokenizer st = new StringTokenizer(fieldValue.toString(), "||");
        while (st.hasMoreTokens()) {
            String priceElement = (String) st.nextElement();
            // Split each element at the last colon into field name and price
            int i = priceElement.lastIndexOf(":");
            if (i < 0) {
                LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME,
                        "ignoring invalid data format: " + priceElement
                        + "(" + String.valueOf(document.get(0)) + ")");
                continue;
            }
            String priceFieldName = priceElement.substring(0, i);
            //String currency = value.substring(6,i);
            String price = priceElement.substring(i + 1);
            Float newprice = Float.valueOf(price);
            document.put(priceFieldName, newprice);
        }
        // The combined field is no longer needed once it has been split
        document.remove("price");
    } else {
        if (LoggingHelper.isTraceEnabled(LOGGER)) {
            LOGGER.logp(Level.WARNING, CLASSNAME, METHODNAME, "nothing to process");
        }
    }
    if (LoggingHelper.isEntryExitTraceEnabled(LOGGER)) {
        LOGGER.exiting(CLASSNAME, METHODNAME);
    }
}
The following values are formed as a result:
<float name="price_USD_10001">100.00</float>
<float name="price_EUR_10001">78.29</float>
<float name="price_JPY_10001">11274</float>
<float name="price_KRW_10001">95048</float>
<float name="price_BRL_10001">232.15</float>
<float name="price_CNY_10001">802.25</float>
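The splitting step can also be tried outside the Index Load framework. The following standalone sketch applies the same parsing idea to a plain Map. The class PriceFieldSplitter and the method splitPriceField are hypothetical names chosen for this example, and the framework logging and exception types are left out. Note that StringTokenizer treats every character of its delimiter argument as a separate delimiter, so the sketch passes a single "|" and relies on empty tokens being skipped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

/** Hypothetical standalone demo; not part of the Index Load API. */
public class PriceFieldSplitter {

    /** Replaces the combined "price" entry with one Float entry per name:value element. */
    public static void splitPriceField(Map<String, Object> document) {
        Object fieldValue = document.get("price");
        if (fieldValue == null) {
            return; // nothing to split
        }
        // Each '|' is a delimiter; consecutive delimiters produce no empty tokens.
        StringTokenizer st = new StringTokenizer(fieldValue.toString(), "|");
        while (st.hasMoreTokens()) {
            String element = st.nextToken().trim(); // tolerate stray spaces around an element
            int i = element.lastIndexOf(':');
            if (i < 0) {
                continue; // skip elements that are not in name:value form
            }
            String fieldName = element.substring(0, i);
            Float price = Float.valueOf(element.substring(i + 1));
            document.put(fieldName, price);
        }
        document.remove("price");
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("catentry_id", 10001L);
        doc.put("price", "price_USD_10001:100.00||price_EUR_10001:78.29|| price_JPY_10001:11274");
        splitPriceField(doc);
        System.out.println(doc);
    }
}
```

A non-numeric price still raises NumberFormatException in this sketch; a production mediator would need to decide whether to skip or reject such rows.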
- Customizing the SolrIndexLoadQueryReader to transform multiple data entries from a database table into a single index row:
The SolrIndexLoadQueryMultiplexReader can be used to transform multiple data entries from a database table into a single index row that contains multiple dynamic index fields.
Define the KeyFieldName property using one primary key field. The database column that maps to this primary key index field is used as the identifier for the index entry.
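To make the multiplexing idea concrete, the following sketch groups several source rows that share a key into one map per key, with one dynamic field per row. It only illustrates the concept and is not the SolrIndexLoadQueryMultiplexReader implementation: the class RowMultiplexer, the row layout (key, currency, price), and the field naming pattern price_<currency> are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of row multiplexing; not part of the Index Load API. */
public class RowMultiplexer {

    /** One source row: the key column plus one currency/price pair (assumed layout). */
    public static final class Row {
        final long catentryId;
        final String currency;
        final float price;

        public Row(long catentryId, String currency, float price) {
            this.catentryId = catentryId;
            this.currency = currency;
            this.price = price;
        }
    }

    /**
     * Merges all rows with the same key into a single document map.
     * The key is stored under keyFieldName; each row adds one dynamic field.
     */
    public static Map<Long, Map<String, Object>> multiplex(List<Row> rows, String keyFieldName) {
        Map<Long, Map<String, Object>> documents = new LinkedHashMap<>();
        for (Row row : rows) {
            Map<String, Object> doc = documents.computeIfAbsent(row.catentryId, id -> {
                Map<String, Object> created = new LinkedHashMap<>();
                created.put(keyFieldName, id); // identifier for the index entry
                return created;
            });
            doc.put("price_" + row.currency, row.price); // one dynamic field per source row
        }
        return documents;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(10001L, "USD", 100.00f),
                new Row(10001L, "EUR", 78.29f),
                new Row(10002L, "USD", 5.50f));
        // Three source rows collapse into two documents, keyed by catentry_id.
        System.out.println(multiplex(rows, "catentry_id"));
    }
}
```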