WebSphere Commerce Search index schema definition
The WebSphere Commerce Search index schema definition contains information for processing data in WebSphere Commerce.For more detailed information, check the solrhome/MC_masterCatalogId/locale/CatalogEntry/conf/schema.xml file for the search index schema definition that matches your version of WebSphere Commerce search.
Catalog Entry extended sites override
When an extended site overrides any of the following catalog entry properties: Name, short description, and keywords, both the asset store, and the extended site properties are being indexed. If a hit matches either the asset store, or the extended site store, the product is returned and updated by the run time with the extended site override display text.
Category Name
The category name is being indexed under the CatalogEntry index. This is a searchable field, which is used to enhance the relevancy of the product that is being returned. Runtime components boost this field. See Search relevancy and merchandising.
Master catalogs and multiple languages
A WebSphere Commerce instance might have multiple master catalogs, with each master catalog that belong to one store, or in extended sites, might belong to a cluster of stores. The master catalog is not reflected in the index schema. That is, it is reflected at the level of index cores. In addition, each master catalog contains its supported languages through its attached stores. A separate index core is created for each language and master catalog. The following are the main benefits of maintaining separate index cores for each language:
- Clean schema design
- The text search schema is language-dependent. For example, different languages contain different analyzers, where some languages contain spell checking or stemming, while others do not.
- Runtime query performance
- Shoppers typically only search using one language. The query then runs against only one index.
- Easier maintenance for language dictionaries
- Each language contains different synonyms, stop words, and rules.
Where the WebSphere Commerce instance contains two master catalogs, with each master catalog possibly containing different languages. Separate catalog entry index cores are created in each language, under each master catalog. When new WebSphere Commerce objects are indexed, new cores are created using the same structure. Each index then provides its own set of search functionality in the storefront, such as its own automatic keyword suggestions, spelling corrections, and dictionaries.
WebSphere Commerce store models and catalog entry types
Search indexing supports various store models by indexing the top-level Catentry store_id. This information is stored in the STORECENT table. In extended site, the store path information is built in query conditions, including both the shopper's extended site store_id, and parent asset store_id.
WebSphere Commerce catentry types such as Product, Item, Bundle, and Package are supported by default for WebSphere Commerce Search indexing. They are in the same index, with their data extracted separately from the WebSphere Commerce database. For example, the query to extract attribute values for products differs for items. The CATENTTYPE_ID is indexed as a field in the search index to distinguish each catentry's type.
Prices in WebSphere Commerce Search
WebSphere Commerce Search indexes the default offer price. The default offer price is typically considered as the offer price under the catentry's owning store's master catalog's default trading position container. The default offer price can also be complex by considering the following variables:
- The store's default currency
- Each store contains only one default currency. This default currency price is indexed as a searchable field. The default index contains USD as the default currency, following the WebSphere Commerce sample stores. If the store's default currency is different, we can index it by adding new searchable fields in the index schema and adding new SQL in the Data Import Handler (DIH).
- Multiple stores, each with a different default currency
- This requires separate index fields for each currency, For example, price_USD, price_CAD, price_EUR, price_CNY. The following currency codes contain a predefined default index field:
- USD
- CAD
- EUR
- CNY
- TWD
- JPY
- GBP
- KRW
- BRL
- PLN
- RON
- RUB
- EGP
- ILS
- TRY
Where the default indexing indexes the price for different currencies, if there is a value for the currency in the OFFERPRICE table.
- Stores with multiple currencies (currency conversion)
- There are two different approaches in WebSphere Commerce to support multiple currencies and currency conversion. The first approach is populating the OFFERPRICE table with the price in each currency. The second approach is calculating the currency exchange based on the default currency price during run time, with the exchange rate in the CURCONVERT table. Since the WebSphere Commerce sample stores use the second approach and contain only the price for USD, the default indexing only contains the price_USD field as populated.
- Miscellaneous attributes of offer
- The effective date (OFFER.STARTDATE and OFFER.ENDDATE) is not considered by the default index. Instead, the offer precedence is considered in indexing, with only the offer price with the highest OFFER.PRECEDENCE indexed. The offer quantity is also considered in indexing, where only the offer with (OFFER.MINIMUMQUANTITY IN (1, 0) OR OFFER.MINIMUMQUANTITY IS NULL) is indexed.
- Price override in extended site stores
- Not indexed by default.
- Price rule
- Not indexed by default.
- Catalog Entry List price
- Indexed by default.
Contract entitlement
The runtime contract entitlement filtering is performed through the product set. Based on the shopper's eligible contract in the session, where the contracts are represented as a list of included and excluded product sets, the PRSETCEREL table is checked against each catentry to ensure that the shopper is eligible to access the catentry. Therefore, the PRSETCEREL.PRODUCTSET_ID is indexed and queried with the shopper's list of included and excluded product sets. When the includeEntireCatalog flag is true the entitlement logic ignores productset checking. That is, the entire catalog is selected and productset_id is not evaluated against the search index.
Spell check
A dedicated indexed multivalue field is being used to maintain the source and scope of the spell check suggestions. The content of the spellCheck field is populated by copying other column's content. The following are all the fields that are copied into the spellCheck field:
- name
- shortDescription
- keyword
- nameOverride
- shortDescriptionOverride
- keywordOverride
The spell check index field is the source of the keyword suggestions under the Keyword auto-suggest section, and also the source of the suggestions of misspelled keywords in the storefront.
Parent catalog groups and catentries
There are two parent catalog group fields in the index:
- parentCatgroup_id_search
- parentCatgroup_id_facet
The parentCatgroup_id_search field is used for search filtering. For example, searching for a catentry under a specific parent catalog group that can be in master catalog or sales catalog, where the parent catalog group is not necessarily the direct parent of the catentries. To improve the runtime search performance, parentCatgroup_id_search includes both the catentry's direct parent group and all its parents in the catalog hierarchical graph. In addition, since a catentry can contain different parent groups under different catalogs, the parent catgroup must be prefixed by the catalog_id: CatalogId_ParentCatgroupId. This multivalued field improves the search performance, but reduces the catgroup faceting performance. This is because the search engine calculates the faceting count for every parent catgroup, even if most of the calculation results are not useful for faceting.
Therefore, the parentCatgroup_id_facet field is used for search faceting, and only indexes the catentry's direct parent catalog groups in one specific catalog. That is, catgroup faceting needs only the faceting numbers on leaf catalog groups to build the navigation tree. The format of the field is parentCatgroupId.
For parent catentries in WebSphere Commerce, an item can belong to one product, many packages, many bundles, and many kits. These parent catentry's IDs are indexes in a field in the index, so that the parent catentry can be easily identified.
Attribute Dictionary
Attributes in the Attribute Dictionary are indexed as catentry properties. Not only are the attributes indexed, but business users can dynamically add attributes from the index.
The mapping is stored in the ATTRDICTSRCHCONF table. ADS_Fn columns are for string type of attributes, ADI_Fn columns are for integer type of attributes, and ADF_Fn columns are for float type of attributes.
Attribute Dictionary attributes
Some of the catalog entry's associated Attribute Dictionary Attributes properties are being index under the multivalue ad_attribute catalogEntry index field. The properties are tokenized into a string, which is separated by a delimiter. The following snippet is an example of the raw Attribute Dictionary attributes data as it appears in the catalog entry index:
<arr name="ad_attribute"> <str>7000000000000000003/_/construction/_/Construction/_/1.00000/_/7000000000000000025/_/Knit/_/Knit/_/19.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> <str>7000000000000000004/_/length/_/Length/_/1.00000/_/7000000000000000027/_/Short/_/Short/_/20.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> <str>7000000000000000005/_/material/_/Material/_/1.00000/_/7000000000000000030/_/Synthetic/_/Synthetic/_/21.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> <str>7000000000000000002/_/swatchcolor/_/Color/_/1.00000/_/0/_/0/_/0/_/.00000/_/1/_/1/_/1/_/0/_/1/_/1/_/0</str> <str>7000000000000000006/_/occasion/_/Occasion/_/1.00000/_/7000000000000000037/_/Cocktail & Evening/_/Cocktail & Evening/_/25.00000/_/1/_/0/_/1/_/0/_/0/_/2/_/0</str> <str>7000000000000000001/_/swatchSize/_/Available Sizes/_/2.00000/_/0/_/0/_/0/_/.00000/_/1/_/1/_/1/_/0/_/1/_/1/_/0</str> </arr>The following properties are being indexed and mapped to the following internal index field name:
Attributes properties
Index Database source table/column name Internal field name as defined in the wc-component.xml Description 0 CATENTRYATTR.ATTR_ID attr_id The attribute unique ID. 1 ATTR.IDENTIFIER identifier The attribute identifier. 2 ATTRDESC.NAME name The attribute display name. 3 CATENTRYATTR.SEQUENCE sequence The attribute sequence with its product. 4 CATENTRYATTR.ATTRVAL_ID attrval_id The attribute value unique ID. 5 ATTRVAL.IDENTIFIER identifier The attribute value identifier. 6 ATTRVALDESC.STRINGVALUE,ATTRVALDESC.VALUE value The actual attribute value display value (if string type, uses the STRINGVALUE field; otherwise, uses the VALUE field. 7 ATTRVALDESC.SEQUENCE sequence The attribute value sequence with its attribute. 8 ATTR.DISPLAYABLE displayable The attribute is marked searchable. 9 ATTR.SEARCHABLE searchable The attribute is marked displayable. 10 ATTR.COMPARABLE comparable The attribute is marked comparable. 11 ATTR.STOREDISPLAY storeDisplay The attribute is being used for ribbon ad. 12 ATTR.FACETABLE facetable The attribute is marked facetable. 13 CATENTRYATTR.USAGE usage The attribute is assigned to a product as either descriptive or defining attribute. 14 ATTRVALDESC.IMAGE1 image1 The attribute value image 1.
Bundles and Packages components
Some of the catalog entry's components properties are indexed under the CatalogEntry component's multivalue index field. The properties are tokenized into a string, which is separated by a delimiter, and sequenced according to their sequence value in the CATENTREL.SEQUENCE, DKPDCCOMPLIST.SEQUENCE database columns.
The following snippet is an example of the raw components data as it appears in the catalog entry index: Bundle, and Package component index field:
<arr name="components"> <str>11412/_/1.00000</str> <str>11420/_/1.00000</str> </arr>The following Bundle and Package properties are being indexed and mapped to the following internal index field name:
Bundle and Package properties
Index Database source table/column name Internal field name as defined in the wc-component.xml Description 0 CATENTREL.CATENTRY_ID_CHILD, and DKPDCCOMPLIST.CATENTRY_ID catentry_id The component unique ID. 1 CATENTREL.QUANTITY, DKPDCCOMPLIST.QUANTITY quantity The component quantity.
Related concepts
Workspaces in WebSphere Commerce Search