Limiting search terms and characters from the search query
We can limit the terms and characters that are processed from a search query by removing unimportant words, preventing stemming, or disabling wildcards and other characters.
Procedure
- Removing unimportant words from the search query: Stop words remove common parts of speech that are typically unimportant, such as "the", "and", "or", and "for". They are defined in the stopwords.txt file. For example, if a shopper searches for "the shirt" in the storefront, "the" is skipped by Solr.
To activate the stop words feature:
- Copy the solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt file to a location that will be accessible within the Search server's container.
- Add the value stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the relative path to the file discoverable in the container. For example, for English the command to update the data is update srchconfext set config='stopwords_en=stopwords_file_path' where srchconfext_id=x; where stopwords_file_path is the path to the stopwords.txt file and x is the ID of the record to update. Normally, this is the record for the "Structured" index subtype with a certain language.
- Restart the WebSphere Commerce Search server.
To create a language-specific stop words list, add the language code to the stopwords parameter of the database entry. This version of the value uses the form stopwords_lang=stopwords_lang_file_path, where stopwords_lang_file_path is the path to the language-specific stop words file.
For example, to add our own French stop words, add stopwords_fr=stopwords_fr_file_path to the SRCHCONFEXT table's CONFIG column. Stop words are considered at both indexing and querying time.
If we are using the AND search type, no search results are returned, since "the" is defined in the stopwords.txt file. See StopFilterFactory.
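The effect of a stop words list on a query can be pictured with a minimal Python sketch. It only imitates the behavior described above and is not Solr's StopFilterFactory; the function name remove_stop_words and the four sample entries are assumptions for illustration, since the real list is read from stopwords.txt.

```python
from typing import List, Set

# Sample entries only (assumption); the real list is defined in stopwords.txt.
STOP_WORDS: Set[str] = {"the", "and", "or", "for"}


def remove_stop_words(query: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Lowercase the query, split it on whitespace, and drop any stop words."""
    return [term for term in query.lower().split() if term not in stop_words]


print(remove_stop_words("the shirt"))  # ['shirt']
```

Only "shirt" is left to be matched against the index, which is why "the" has no influence on the results.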
- Preventing stemming: To protect certain words from being stemmed, we can add them to the protwords.txt file.
- Disabling wildcard and other character searches: Wildcard searching is enabled by default, but if necessary, we can disable it for runtime performance or security reasons:
- Performance might be impacted, as a wildcard search that uses a common term might return many documents from the search index.
- Security might be a consideration, as Solr does not analyze and apply filters to wildcard searches.
A prohibited words list stops the search request from further searching, and is configurable in the wc-component.xml file.
For example, by default, a search for * is routed to the Prohibited Characters store page. The default configuration is:
<_config:property name="StopPatterns" value="\*,~,\?,'',"",.*\\.*,.*/.*,.*\|.*" />
We can update the configuration to disable wildcard (*) searches or other characters using the regular expression format.
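To try out candidate patterns before editing wc-component.xml, we can check them with a short Python sketch. It assumes that the StopPatterns value is split on commas and that each pattern must match the entire search term; the actual matching rules of the search server might differ. The function name is_prohibited and the extra pattern .*\*.* are hypothetical and are not part of the product configuration.

```python
import re

# Default StopPatterns value, split on commas and copied literally from the
# configuration shown above.
DEFAULT_PATTERNS = [r"\*", r"~", r"\?", "''", '""', r".*\\.*", r".*/.*", r".*\|.*"]


def is_prohibited(term, patterns=DEFAULT_PATTERNS):
    """Return True if the whole term matches any pattern.

    Full-match semantics are an assumption made for this sketch.
    """
    return any(re.fullmatch(pattern, term) for pattern in patterns)


print(is_prohibited("*"))       # True: a bare wildcard matches \*
print(is_prohibited("shirt"))   # False: no pattern matches
print(is_prohibited("a/b"))     # True: contains a forward slash

# Hypothetical extra pattern that rejects a wildcard anywhere in the term.
stricter = DEFAULT_PATTERNS + [r".*\*.*"]
print(is_prohibited("shirt*", stricter))  # True
```

Under these assumptions, the default list stops a bare * but not a term such as shirt*, so a broader pattern would be needed to disable wildcard searches entirely.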
Related concepts
Wildcard searching