Customizable components of the final Solr query
We can customize components of the final Solr query by using query parsers. A query parser is a component responsible for parsing the textual query and converting it into corresponding Lucene Query objects.
The Disjunction Max (DisMax) query parser is named for how it scores. "Disjunction" refers to searching the query terms across multiple fields, and "max" refers to using the maximum score. DisMax searches each field and term pair separately. It then combines the results, so that the highest-scoring field determines the score of each document.
ExtendedDisMax adds the following features on top of Dismax:
Minimum match
Minimum match (mm) provides flexibility when the ANY search type is used: it specifies how many of the search keywords must match the indexed documents. A plain number specifies how many query keywords must match. A percentage specifies what proportion of the query keywords must match. For example:
- 1 denotes that at least one query keyword must match.
- 2<80% 6<50% denotes that when there are 2 or fewer keywords, all of the keywords must be found in the document. When there are 3 to 6 keywords, 80% of the keywords must be found in the document. When there are more than 6 keywords, 50% of the keywords must be found in the document.
For example, if a shopper searches for 3 keywords, 80% of the 3 keywords equals 2.4. Rounded down, results that match at least 2 of the 3 entered keywords are returned.
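To make these rules concrete, the following Python sketch evaluates an mm specification for a given number of query keywords. It follows the rules described above (plain numbers, percentages rounded down, and space-separated conditions such as 2<80%), and also handles negative values, which Solr accepts to mean "all but N". The function names are hypothetical; this is an illustration of the rules, not Solr's own code.

```python
import re


def _evaluate_simple(keyword_count: int, spec: str) -> int:
    """Evaluate a non-conditional spec such as '2', '-1', '80%' or '-25%'."""
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The computed share of keywords is always rounded down.
        portion = (keyword_count * abs(percent)) // 100
        # A negative percentage describes how many keywords may be missing.
        return keyword_count - portion if percent < 0 else portion
    value = int(spec)
    # A negative number means "all but this many keywords".
    return keyword_count + value if value < 0 else value


def min_should_match(keyword_count: int, spec: str) -> int:
    """Return how many of keyword_count query keywords must match under spec."""
    spec = spec.strip()
    if "<" not in spec:
        required = _evaluate_simple(keyword_count, spec)
    else:
        # Normalize "2 < 80%" to "2<80%", then walk the conditions in order.
        spec = re.sub(r"\s*<\s*", "<", spec)
        required = keyword_count  # at or below the first bound, all must match
        for condition in spec.split():
            bound, rule = condition.split("<", 1)
            if keyword_count <= int(bound):
                break
            required = _evaluate_simple(keyword_count, rule)
    # Never require fewer than zero or more than all of the keywords.
    return max(0, min(required, keyword_count))


for count in (1, 2, 3, 6, 7, 10):
    print(count, min_should_match(count, "2<80% 6<50%"))
```

With 3 keywords and 2<80% 6<50%, the function returns 2, which matches the worked example above.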
Important: We must use the correct character encoding when we enter percentage values in a file. For example:
- In a JSP fragment file, such as SearchSetup.jspf, the preceding percentage value is entered as is: 2<80% 6<50%.
- In an XML file, such as wc-component.xml, the < character must be escaped, so the preceding percentage value is entered as: 2&lt;80% 6&lt;50%.
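The XML escaping rule can be checked with Python's standard library. This sketch uses a hypothetical property element and attribute names, not the real wc-component.xml schema. It shows that only the < character needs escaping (the % character is not special in XML), and that an XML parser gives back the original value.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

MM_SPEC = "2<80% 6<50%"

# Escaping by hand: '<' becomes '&lt;', while '%' is left unchanged.
escaped = escape(MM_SPEC)
print(escaped)

# ElementTree escapes attribute values automatically when serializing.
# "property", "name", and "value" are hypothetical placeholders.
prop = ET.Element("property", {"name": "minimumMatch", "value": MM_SPEC})
xml_text = ET.tostring(prop, encoding="unicode")
print(xml_text)

# Parsing the XML again returns the original, unescaped spec.
parsed = ET.fromstring(xml_text).get("value")
print(parsed)
```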
See Tuning multiple-word search result relevancy by using minimum match and phrase slop. Be aware of the following behavior of the ANY search type, compared with the ALL and EXACT search types:
- ANY: A search for red dress returns the following results: red dress, red potato, red fish, dress shoes, dress shirt, dress belt.
- ALL or EXACT: Products with indexed searchable fields that contain both red and dress are returned, but the blue summer dress is not. The red floral dress is returned for the ALL search type, but not for the EXACT search type, because it is not an exact match.
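The difference between the search types can be approximated with a toy matcher over product names. This is a simplified sketch that assumes whitespace tokenization and a hypothetical product list; it is not how WebSphere Commerce or Solr implements the search types.

```python
# Hypothetical catalog of product names.
PRODUCTS = ["red dress", "red potato", "dress shoes", "blue summer dress", "red floral dress"]


def matches(name: str, query: str, search_type: str) -> bool:
    """Decide whether a product name matches the query for a search type."""
    words = name.lower().split()
    terms = query.lower().split()
    if search_type == "ANY":
        # At least one query term appears in the name.
        return any(term in words for term in terms)
    if search_type == "ALL":
        # Every query term appears somewhere in the name, in any order.
        return all(term in words for term in terms)
    if search_type == "EXACT":
        # The query terms appear next to each other, in order.
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    raise ValueError(f"unknown search type: {search_type}")


def search(query: str, search_type: str) -> list[str]:
    return [p for p in PRODUCTS if matches(p, query, search_type)]


for search_type in ("ANY", "ALL", "EXACT"):
    print(search_type, search("red dress", search_type))
```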
Note: If you define your synonyms by using search term associations in the Management Center, the user's query is expanded with its synonyms before it is passed to Solr. Solr then applies minimum match to this expanded list of terms instead of the original list, which can produce undesirable results. Consider the following workarounds or business scenarios to customize the site to use minimum match and synonyms:
- If you do not require minimum match, but do require synonyms: Set up your synonyms by using search term associations in the Management Center. This flow allows each store or extended site to use its own synonym list.
- If you require both minimum match and synonyms, you have only one store per master catalog or all of the stores within the same master catalog share a synonym list, and you do not require multiword synonyms or replacement terms: Complete the following task: Combining minimum match with search term associations (using the Solr expansion algorithm).
Phrase fields
Phrase fields (pf) can boost the score of documents when all the terms in the q parameter appear in close proximity. For example:
pf=name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 partNumber_ntk^15.0
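The pf value is a space-separated list of field^boost entries. A small parser, written here as a hypothetical helper, makes that structure explicit. A field listed without a ^ suffix gets the default boost of 1.0.

```python
def parse_field_boosts(spec: str) -> dict[str, float]:
    """Turn 'name^10.0 defaultSearch' into {'name': 10.0, 'defaultSearch': 1.0}."""
    boosts = {}
    for entry in spec.split():
        field, sep, boost = entry.partition("^")
        # No '^' means the field keeps the default boost of 1.0.
        boosts[field] = float(boost) if sep else 1.0
    return boosts


pf = "name^10.0 defaultSearch^1.0 categoryname^100.0 shortDescription^5.0 partNumber_ntk^15.0"
boosts = parse_field_boosts(pf)

# List the phrase fields from most to least influential.
for field, boost in sorted(boosts.items(), key=lambda item: item[1], reverse=True):
    print(field, boost)
```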
Phrase slop
Phrase slop (ps) specifies how far apart the search terms can be in a document and still influence relevancy as a phrase. For example, a search for sports movie with ps=1 ranks sports movie as more relevant than sports is the type of a movie. Phrase slop defines the amount of slop on the phrase queries that are built for phrase fields (pf). See Tuning multiple-word search result relevancy by using minimum match and phrase slop. WebSphere Commerce checks for minimum match and phrase slop in the following order:
- Defined in the URL.
- Defined in the search profile.
- Defined in the catalog component configuration file (wc-component.xml) on the Search EAR.
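Two ideas from this section can be sketched in Python under simplifying assumptions. within_slop treats slop as the number of extra word positions allowed between two terms that appear in order; this is a simplification of Lucene's sloppy phrase matching, which also handles reordered terms and longer phrases. resolve_setting shows the lookup order above, using plain dictionaries as hypothetical stand-ins for the URL parameters, the search profile, and wc-component.xml.

```python
def term_positions(text: str, term: str) -> list[int]:
    """Return the word positions at which term occurs in text."""
    return [i for i, word in enumerate(text.lower().split()) if word == term]


def within_slop(text: str, first: str, second: str, slop: int) -> bool:
    """True if `second` follows `first` with at most `slop` words in between.

    Simplified: only in-order occurrences of a two-word phrase are considered.
    """
    return any(
        0 < b - a <= slop + 1
        for a in term_positions(text, first)
        for b in term_positions(text, second)
    )


def resolve_setting(name, url_params, search_profile, component_config):
    """Return the first value found, checking the URL, then the search
    profile, then the component configuration (hypothetical dictionaries)."""
    for source in (url_params, search_profile, component_config):
        if name in source:
            return source[name]
    return None


print(within_slop("sports movie", "sports", "movie", 1))
print(within_slop("sports is the type of a movie", "sports", "movie", 1))
print(resolve_setting("ps", {}, {"ps": "2"}, {"ps": "3"}))
```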
Phrase bigram fields
Phrase bigram fields (pf2) break down the input into bigrams. For example, the brown fox jumped is queried as the brown, brown fox, and fox jumped. Therefore, if our search phrase is red hat black jacket, we can use ps=0 with pf2 so that products that contain red hat are boosted above products that contain black hat.
Phrase trigram fields
Phrase trigram fields (pf3) break down the input into trigrams. For example, the brown fox jumped is queried as the brown fox and brown fox jumped.
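The word groups that pf2 and pf3 build from the input can be reproduced with a short helper. This is a hypothetical sketch of the splitting step only; Solr additionally turns each group into a phrase query against the configured fields.

```python
def shingles(phrase: str, size: int) -> list[str]:
    """Split phrase into overlapping groups of `size` consecutive words."""
    words = phrase.split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(shingles("the brown fox jumped", 2))  # ['the brown', 'brown fox', 'fox jumped']
print(shingles("the brown fox jumped", 3))  # ['the brown fox', 'brown fox jumped']
print(shingles("red hat black jacket", 2))  # ['red hat', 'hat black', 'black jacket']
```

Because red hat is one of the bigrams and black hat is not, only products that contain red hat receive the pf2 phrase boost.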
Tie breaker
The tie breaker applies when a term from the user's input is tested against multiple fields and more than one field matches. Each field generates a different score based on how common the word is in that field (for each document, relative to all other documents). Without a tie parameter, only the score from the field with the maximum score is used. When a tie parameter is specified, the scores from the other matching fields are also added to the score of the maximum-scoring field, which can break ties between documents that have the same maximum field score:
(score of matching clause with the highest score) + ((tie parameter) * (scores of any other matching clauses))
The tie parameter configures how the final score of the query is influenced by the scores of the lower scoring fields, which are compared to the highest scoring field.
The tie value is set to 0.1 by default. A value of 0.0 makes the query a pure disjunction max query, where only the maximum scoring subquery contributes to the final score. A value of 1.0 makes the query a pure disjunction sum query, where the maximum scored subquery is irrelevant, and the final score is the sum of the sub scores. A low value, for example, 0.1, is typically useful. For more information about changing the tie value, see Adding native Solr query parameters to search expressions.
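The tie formula can be written directly as a function of the per-field scores. The scores below are made-up numbers for illustration; real scores come from Lucene's similarity model.

```python
def dismax_score(field_scores: list[float], tie: float) -> float:
    """Best field score plus tie times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for two documents with the same best field.
doc_a = [2.0, 1.0]  # matches in two fields
doc_b = [2.0]       # matches in one field only

for tie in (0.0, 0.1, 1.0):
    print(tie, dismax_score(doc_a, tie), dismax_score(doc_b, tie))
```

With tie=0.0 both documents score the same (pure disjunction max). With tie=0.1, the second matching field lifts doc_a slightly above doc_b, and with tie=1.0 the score becomes the plain sum of the field scores.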
Filter query
Filter query (fq) specifies a query that can be used to restrict the documents that can be returned, without influencing the score. This can be useful for speeding up complex queries, since the queries specified with fq are cached independently from the main query.
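As a sketch of how fq sits alongside the scoring parameters, the following Python code builds a query string for an eDisMax request with urllib. The filter field values and the store ID are hypothetical. Each fq value is sent as its own parameter, which is what allows Solr to cache each filter separately.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "defType": "edismax",
    "q": "red dress",
    "mm": "2<80% 6<50%",
    "pf": "name^10.0 shortDescription^5.0",
    "ps": "1",
    "tie": "0.1",
    # Hypothetical filters: each one restricts the result set without
    # changing scores, and Solr caches each fq value independently.
    "fq": ["buyable:1", "storeent_id:10001"],
}

# doseq=True emits one fq=... pair per list item instead of one combined value.
query_string = urlencode(params, doseq=True)
print(query_string)

# Round-trip check: the two filters come back as two separate values.
print(parse_qs(query_string)["fq"])
```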
Query boost
Boosts can be applied either at index time or at query time. Index-time boosts are applied when documents are added, and apply to the entire document or to specific fields. Query-time boosts are applied when a search query is constructed, and apply to specific fields. Query boosts are applied by appending the caret character ^ followed by a positive number to query clauses. For example:
title:sail OR (title:sail AND title:boat)^2.0 OR title:"sail boat"^10
Lucene allows negative boosts; however, Solr does not. The only way to meaningfully perform a negative boost is to apply a positive boost to a negative query. For example:
(*:* -title:foo)^2.0
Here, all documents that do not contain foo in the title are boosted by 2.0, which effectively demotes the documents that do contain it.
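The positive-boost-on-a-negative-query pattern can be wrapped in a pair of small helpers that build the query strings. The helper names are hypothetical; they only produce Solr query syntax as text and do not call Solr.

```python
def boost(clause: str, factor: float) -> str:
    """Wrap a query clause in parentheses and apply a ^factor boost."""
    if factor <= 0:
        # Solr accepts only positive query boosts.
        raise ValueError("boost factor must be positive")
    return f"({clause})^{factor}"


def demote(field: str, term: str, factor: float) -> str:
    """Emulate a negative boost on field:term by boosting every document
    (*:*) that does NOT contain the term."""
    return boost(f"*:* -{field}:{term}", factor)


print(demote("title", "foo", 2.0))  # (*:* -title:foo)^2.0
```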
Solr provides another way of boosting documents by using function queries. FunctionQuery allows you to use the actual value of a field and functions of those fields in a relevancy score. See Solr Wiki: FunctionQuery.
Examples
- We can change the sequence of returned results by specifying a different sort sequence. For example: sort=score desc, price asc.
- We can add a field list (fl) parameter to the Solr query. The fl parameter restricts the fields that are returned with the result set; it does not change which fields are searched. For example: fl=id,title,text returns only the id, title, and text fields.
- We can modify the list of fields included in the <_config:result> section of the search profile within the wc-search.xml file. For example, for the IBM_findProductsBySearchTerm search profile:
<_config:result>
  <_config:field name="catentry_id"/>
  <_config:field name="storeent_id"/>
  <_config:field name="buyable"/>
  <_config:field name="partNumber_ntk"/>
  <_config:field name="name"/>
  <_config:field name="shortDescription"/>
  <_config:field name="thumbnail"/>
  <_config:field name="keyword"/>
  <_config:field name="mfName_ntk"/>
  <_config:field name="catenttype_id_ntk_cs"/>
  <_config:field name="price_*"/>
  <_config:field name="listprice_*"/>
  <_config:field name="parentCatgroup_id_facet"/>
  <_config:field name="childCatentry_id"/>
  <_config:field name="mfPartNumber_ntk"/>
  <_config:field name="ad_attribute"/>
  <_config:field name="isDKPreConfigured"/>
  <_config:field name="dkModelReference"/>
  <_config:field name="dkURL"/>
  <_config:field name="dkDefaultConfiguration"/>
  <_config:field name="parentDKModelRef"/>
  <_config:field name="dkConfigurable"/>
  <_config:field name="parentDKConfigurable"/>
  <_config:field name="startdate"/>
  <_config:field name="enddate"/>
</_config:result>
We can change the existing values of these parameters in the wc-search.xml file. We can also create a new preprocessor to add a component to be used in the final Solr query. See Creating a custom query preprocessor.
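As a rough illustration of how a result field list relates to the fl parameter, the following sketch reads the field names from a <_config:result> section and joins them into a comma-separated fl value. It uses a shortened field list and a hypothetical namespace URI, because the fragment above does not declare the _config prefix. It is not how WebSphere Commerce builds its queries internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; a real wc-search.xml declares its own.
NS = "urn:example:hypothetical-config"

RESULT_XML = f"""
<_config:result xmlns:_config="{NS}">
  <_config:field name="catentry_id"/>
  <_config:field name="name"/>
  <_config:field name="shortDescription"/>
  <_config:field name="price_*"/>
</_config:result>
""".strip()


def result_fields(xml_text: str) -> list[str]:
    """Collect the name attribute of every _config:field element."""
    root = ET.fromstring(xml_text)
    return [field.get("name") for field in root.iter(f"{{{NS}}}field")]


def to_fl(fields: list[str]) -> str:
    """Join field names into a Solr fl value; Solr accepts globs like price_*."""
    return ",".join(fields)


print(to_fl(result_fields(RESULT_XML)))
```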
- Analyzing Solr queries
We can analyze the parts of a Solr query to help understand the actions that Solr takes when a query is submitted.
- Adding query or filter query parameters to the final Solr query
We can add query or filter query parameters to the final Solr query to restrict the documents that can be returned.