Customizable components of the final Solr query

We can customize components of the final Solr query by using query parsers. A query parser is a component responsible for parsing the textual query and converting it into corresponding Lucene Query objects.

The Disjunction Max Query (DisMax) parser, named for Disjunction (text spread across multiple fields) and Maximum (score), searches for each field and term pair separately. It then combines the results, using the maximum score among the matching fields to calculate the final score.

ExtendedDisMax adds the following features on top of Dismax:


Minimum match

Minimum match (mm) provides flexibility when the ANY search type is used, by specifying how many of the search keywords must match the indexed documents. An integer value denotes the number of query keywords that must match. A percentage value denotes the percentage of the query keywords that must match. For example, mm=2 requires at least two keywords to match, and mm=75% requires 75% of the keywords, rounded down, to match.
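As a rough illustration of how integer and percentage values translate into a required number of matching keywords, the following Python sketch follows the rounding rules that the Solr documentation describes for mm: percentages are rounded down, and negative values state how many keywords may be missing. The function name required_matches is hypothetical, and the sketch deliberately ignores conditional expressions such as 3<90%.

```python
def required_matches(mm: str, keyword_count: int) -> int:
    """Return how many of keyword_count keywords must match for a simple mm value.

    Simplified sketch: supports only a plain integer ("2", "-1") or a plain
    percentage ("75%", "-25%"), not Solr's conditional mm expressions.
    """
    spec = mm.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The number computed from the percentage is rounded down.
        amount = abs(keyword_count * percent) // 100
        # A negative percentage is the share of keywords that may be missing.
        needed = keyword_count - amount if percent < 0 else amount
    else:
        value = int(spec)
        # A negative integer is the number of keywords that may be missing.
        needed = keyword_count + value if value < 0 else value
    # Never require fewer than zero or more than all of the keywords.
    return max(0, min(needed, keyword_count))


for mm_value in ("2", "75%", "-1", "-25%"):
    print(mm_value, required_matches(mm_value, 4))
```

With four keywords, both 75% and -25% work out to three required matches. With three keywords, 75% rounds down to two.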

See Tuning multiple-word search result relevancy by using minimum match and phrase slop. Be aware of the following behavior when the ANY search type is used:

Search type     Search results
ANY             Searches for red dress return the following results: red dress, red potato, red fish, dress shoes, dress shirt, dress belt.
ALL or EXACT    Products with indexed searchable fields that contain red and dress are returned, but not the blue summer dress. The red floral dress is returned for the ALL search type, but not for the EXACT search type, as it is not an exact match.

Note: If you define your synonyms by using the search term associations in the Management Center, the user's query is expanded with its synonyms before it is passed into Solr. Solr then applies the minimum match on this expanded list, instead of the original list. This behavior produces undesirable results. Consider the following workarounds or business scenarios to customize the site to use minimum match and synonyms:


Phrase fields

Phrase fields (pf) can boost the score of documents when all the terms in the q parameter appear in close proximity. For example:
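A minimal sketch of how a phrase-field boost might be attached to a request is shown here. The parameter names defType, q, qf, and pf are standard Solr eDisMax parameters. The helper phrase_boost_params and the field names name and shortDescription are hypothetical and stand in for whatever fields the index actually defines.

```python
from urllib.parse import urlencode


def phrase_boost_params(user_query, query_fields, phrase_fields):
    """Build eDisMax request parameters that include a phrase-field boost.

    phrase_fields maps a (hypothetical) field name to the boost applied when
    the query terms appear close together in that field.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": " ".join(query_fields),
        "pf": " ".join(f"{field}^{boost}" for field, boost in phrase_fields.items()),
    }


params = phrase_boost_params("red dress", ["name", "shortDescription"], {"name": 10})
query_string = urlencode(params)
print(query_string)
```

Here a document whose name field contains red dress as a phrase would score higher than one that merely contains both words somewhere in the searched fields.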


Phrase slop

Phrase slop (ps) specifies how far apart the search terms can be in the indexed document and still influence relevancy. For example, a search for sports movie with ps=1 results in sports movie being more relevant than sports is the type of a movie. Phrase slop defines the amount of slop on phrase queries that are built for phrase fields (pf). See Tuning multiple-word search result relevancy by using minimum match and phrase slop.

WebSphere Commerce checks for minimum match and phrase slop settings in the following order:

  1. Defined in the URL.

  2. Defined in the search profile.

  3. Defined in the catalog component configuration file (wc-component.xml) on the Search EAR.
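The lookup order above can be pictured as a first-match-wins search across three sources. The following Python sketch is only an illustration of that precedence. The function resolve_setting and the dictionaries that represent the URL, the search profile, and wc-component.xml are hypothetical and do not reflect how WebSphere Commerce stores these values internally.

```python
def resolve_setting(name, url_params, search_profile, component_config):
    """Return the first defined value for name, checking sources in priority order."""
    # Order matters: URL first, then search profile, then wc-component.xml.
    for source in (url_params, search_profile, component_config):
        value = source.get(name)
        if value is not None:
            return value
    return None


# Hypothetical values for illustration only.
url_params = {"ps": "1"}
search_profile = {"mm": "75%", "ps": "2"}
component_config = {"mm": "1", "ps": "3"}

print(resolve_setting("ps", url_params, search_profile, component_config))
print(resolve_setting("mm", url_params, search_profile, component_config))
```

In this sketch, ps is taken from the URL because it is defined there, while mm falls through to the search profile.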


Phrase bigram fields

Phrase bigram fields (pf2) break down the input into bigrams (pairs of adjacent words). For example, the brown fox jumped is queried as "the brown", "brown fox", and "fox jumped". Therefore, if the search phrase is red hat black jacket, we can use ps=0 with pf2 to ensure that products that contain red hat are boosted above products that contain black hat.


Phrase trigram fields

Phrase trigram fields (pf3) break down the input into trigrams (runs of three adjacent words). For example, the brown fox jumped is queried as "the brown fox" and "brown fox jumped".
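The bigram and trigram breakdowns used by pf2 and pf3 can be sketched with a small helper that splits the input on whitespace and slides a window across the words. The function word_ngrams is hypothetical. Solr builds these phrase queries internally after its own analysis of the query text, so this is an approximation for illustration.

```python
def word_ngrams(text, n):
    """Return the runs of n adjacent words in text, in order."""
    words = text.split()
    # Inputs shorter than n words yield an empty list.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("the brown fox jumped", 2))
print(word_ngrams("the brown fox jumped", 3))
print(word_ngrams("red hat black jacket", 2))
```

For red hat black jacket, the bigrams include red hat and black jacket but not black hat, which is why a document that contains red hat receives the pf2 boost and a document that contains only black hat does not.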


Tie breaker

The tie breaker applies when a term from the user's input is tested against multiple fields and more than one field matches. Each field generates a different score based on how common the word is in the field (for each document relative to all other documents). The score from the field with the maximum score is used by default. If two documents would otherwise receive the same score, the tie parameter lets the lower scoring fields break the tie. When a tie parameter is specified, the scores from the other matching fields are multiplied by the tie value and added to the score of the maximum scoring field.

The tie parameter configures how the final score of the query is influenced by the scores of the lower scoring fields, which are compared to the highest scoring field.

The tie value is set to 0.1 by default. A value of 0.0 makes the query a pure disjunction max query, where only the maximum scoring subquery contributes to the final score. A value of 1.0 makes the query a pure disjunction sum query, where the maximum scored subquery is irrelevant, and the final score is the sum of the sub scores. A low value, for example, 0.1, is typically useful. For more information about changing the tie value, see Adding native Solr query parameters to search expressions.
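The effect of the tie value can be sketched as the maximum field score plus the tie value multiplied by the sum of the remaining field scores, which is how a disjunction max query with a tie breaker combines its subquery scores. The function dismax_score and the sample scores are hypothetical and serve only to show the two extremes and a typical low value.

```python
def dismax_score(field_scores, tie=0.1):
    """Combine per-field scores: best score plus tie times the sum of the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


scores = [0.8, 0.4, 0.2]  # hypothetical scores for one term in three fields
print(dismax_score(scores, tie=0.0))  # pure disjunction max: only the best field counts
print(dismax_score(scores, tie=1.0))  # pure disjunction sum: all fields count fully
print(dismax_score(scores, tie=0.1))  # low tie: other fields nudge the score slightly
```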


Filter query

Filter query (fq) specifies a query that can be used to restrict the documents that can be returned, without influencing the score. This can be useful for speeding up complex queries, since the queries specified with fq are cached independently from the main query.
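As a small sketch of how filter queries sit alongside the main query, the following Python code builds a request query string in which each filter is passed as its own fq parameter, so that each one can be cached separately. The helper build_filtered_query and the field names catalog_id and published are hypothetical.

```python
from urllib.parse import urlencode


def build_filtered_query(user_query, filters):
    """Return a query string with one q parameter and one fq parameter per filter."""
    pairs = [("q", user_query)]
    # Separate fq parameters restrict the result set without affecting the score.
    pairs.extend(("fq", filter_query) for filter_query in filters)
    return urlencode(pairs)


query_string = build_filtered_query("dress", ["catalog_id:10001", "published:1"])
print(query_string)
```

Keeping each restriction in its own fq parameter, instead of combining them into one clause, allows a filter that is reused across many searches to be served from the cache.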


Query boost

Boosts can be performed at either index time or query time. Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields. Query-time boosts are applied when constructing a search query, and apply to specific fields. Query boosts are applied by appending the caret character ^ followed by a positive number to query clauses. For example, title:foo^2.0 boosts matches for foo in the title field by 2.0.

Lucene allows negative boosts; however, Solr does not. The only way to meaningfully perform a negative boost is to apply a positive boost to a negative query. For example, applying a boost of 2.0 to a query that excludes foo in the title boosts all documents that do not contain foo in the title by 2.0.
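The boosting syntax can be sketched with two small helpers that build query clauses as strings. The function names are hypothetical. The (*:* -field:term) form used for the negative case is a common Solr idiom for matching every document except those that contain the term, and is an assumption here, not a form taken from this topic.

```python
def boost_clause(clause, factor):
    """Append a query-time boost to a clause, for example title:foo^2.0."""
    if factor <= 0:
        # Solr does not accept negative boosts; zero is rejected here as pointless.
        raise ValueError("boost factor must be a positive number")
    return f"{clause}^{factor}"


def boost_non_matching(field, term, factor):
    """Boost every document that does NOT contain term in field."""
    # Match all documents, exclude the unwanted ones, then boost the remainder.
    return boost_clause(f"(*:* -{field}:{term})", factor)


print(boost_clause("title:foo", 2.0))
print(boost_non_matching("title", "foo", 2.0))
```

The second helper produces the effect of a negative boost on foo in the title: documents without the term rise by the boost factor, so documents with the term rank relatively lower.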

Solr provides another way of boosting documents by using function queries. FunctionQuery allows you to use the actual value of a field and functions of those fields in a relevancy score. See Solr Wiki: FunctionQuery.


Examples