
Customize the stopwords.txt file

In this lesson, you edit configuration files to influence the behavior of the Solr 7.3.1 search engine. The particular example is customization of the stopwords.txt file.

The stopwords.txt file is a configuration file that lists the words used by the Solr stop filter. In WebSphere Commerce Version 9, you can change the behavior of the stop filter by pointing the engine at your own stopwords.txt file.

In the following tutorial, you customize the English stopwords.txt file and verify that you have successfully changed the behavior of the Solr search engine.


Before beginning

  1. Ensure that you are working on the correct version of the stopwords.txt file. The default file is solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt, but it might have been extended, as described in Limiting search terms and characters from the search query. Locate the extended file, or create a new one to work on. If you are uncertain whether your system refers to the default stopwords.txt file or its extended counterpart, you can investigate as follows:

    1. Determine what the content of the name field is in either the default solrhome/v3-index/CatalogGroup/conf/schema.xml or extended solrhome/v3-index-ext/CatalogGroup/x-schema.xml file. Look for a definition similar to the following:

        <field name="name" type="wc_text_${lang:en}" indexed="true" stored="true" multiValued="false"/>

    2. In this example, the en language code has been assigned to name. This language code is used as part of the reference to the stopwords.txt file, making its name stopwords_en. You can find the path that this name is associated with by looking in the solrhome/v3/common/schema-field-types.xml file. Look for the target of the solr.StopFilterFactory filter. It will resemble the following:

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="${stopwords_en:../../common/stopwords.txt}"/>

      In this case, the stopwords_en name has been associated with ../../common/stopwords.txt. If stopwords_en is not otherwise specified in SCHCONFIG, this will be the default file.

  2. Add the parameter stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the path to your customized stopwords.txt file. In the container environment, you would use an SQL command similar to the following:

      update SRCHCONFEXT set CONFIG='stopwords=/opt/WebSphere/Liberty/usr/servers/default/resources/search/index/managed-solr/config/v3-index-ext/common/stopwords.txt, original_config' 
      where indextype='CatalogEntry' and indexscope=masterCatalogId and indexsubtype='Structured';

    Where original_config is the original CONFIG value for the record, and masterCatalogId is replaced with your own master catalog ID.

  3. You can add stop words for specific languages. To make a stopwords.txt file language-specific, add the parameter stopwords_lang=stopwords_lang_file_path to the CONFIG column of the SRCHCONFEXT table, where lang is the language code. For example, to add your own French stop words, add stopwords_fr=stopwords_fr_file_path to the SRCHCONFEXT table CONFIG column, where stopwords_fr_file_path is the path to the French stop words file.
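The words attribute shown in step 1 uses the ${name:default} placeholder form: the value configured for the name is used if one exists, otherwise the text after the colon. The following Python sketch only illustrates that lookup rule. The function name resolve_placeholder, the overrides dictionary, and the custom path are hypothetical; this is not the parser that Solr or WebSphere Commerce actually uses.

```python
import re

# Matches ${name} or ${name:default}; illustrative only, not Solr's own parser.
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve_placeholder(value, overrides):
    """Substitute each ${name:default} with overrides[name], else the default."""
    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in overrides:
            return overrides[name]
        if default is None:
            raise KeyError(f"no value or default for property {name!r}")
        return default
    return _PLACEHOLDER.sub(_sub, value)


words_attr = "${stopwords_en:../../common/stopwords.txt}"

# No override configured: the default path after the colon is used.
default_path = resolve_placeholder(words_attr, {})

# Override configured (hypothetical path): the override wins.
custom_path = resolve_placeholder(
    words_attr, {"stopwords_en": "/path/to/custom/stopwords.txt"}
)

print(default_path)
print(custom_path)
```

The same rule explains the field type in the schema sample: wc_text_${lang:en} resolves to wc_text_en when no lang value is supplied.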


Procedure

  1. In the storefront, search for the string "can". Before the customization, the search should return results for this term.

  2. Copy the solrhome/MC_masterCatalogID/locale/CatalogEntry/conf/stopwords.txt file to the directory workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common. Open the file in an editor.

  3. The file contains words such as "will" and "was" that help filter out unhelpful clauses in search queries. As an example that is easy to test, add the word "can" at the bottom of the file. If you copied the default file, the result should look something like the following:

      # Licensed to the Apache Software Foundation (ASF) under one or more
      # contributor license agreements.  See the NOTICE file distributed with
      # this work for additional information regarding copyright ownership.
      # The ASF licenses this file to You under the Apache License, Version 2.0
      # (the "License"); you may not use this file except in compliance with
      # the License.  You may obtain a copy of the License at
      #
      #     http://www.apache.org/licenses/LICENSE-2.0
      #
      # Unless required by applicable law or agreed to in writing, software
      # distributed under the License is distributed on an "AS IS" BASIS,
      # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      # See the License for the specific language governing permissions and
      # limitations under the License.

      # a couple of test stopwords to test that the words are really being
      # configured from this file:
      stopworda
      stopwordb

      # Standard english stop words taken from Lucene's StopAnalyzer
      a
      an
      and
      are
      as
      at
      be
      but
      by
      for
      if
      in
      into
      is
      it
      no
      not
      of
      on
      or
      such
      that
      the
      their
      then
      there
      these
      they
      this
      to
      was
      will
      with
      can

  4. Add the value stopwords=stopwords_file_path to the CONFIG column of the SRCHCONFEXT database table, where stopwords_file_path is the relative path at which the file can be found in the container. The following command updates the record:

      update srchconfext set config='stopwords=workspace_dir\search-config-ext\src\index\managed-solr\config\v3\common\stopwords_example.txt'
      where srchconfext_id=1;

  5. Restart the WebSphere Commerce Search server.
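When the CONFIG column already holds other settings, the stopwords parameter has to be combined with them rather than overwrite them. The following Python sketch shows one way to assemble the new value before running the SQL update. It assumes that CONFIG is a comma-separated list of key=value entries, which is what the SQL example in this lesson suggests but is not confirmed here. The function name add_config_param, the sample CONFIG value, and the path are hypothetical.

```python
def add_config_param(original_config, key, value):
    """Return a CONFIG string with key=value placed ahead of the existing entries.

    Assumes CONFIG is a comma-separated list of key=value entries. An existing
    entry for the same key is dropped so the parameter is not defined twice.
    Values that themselves contain commas are not handled by this sketch.
    """
    entries = [e.strip() for e in original_config.split(",") if e.strip()]
    kept = [e for e in entries if e.split("=", 1)[0].strip() != key]
    return ", ".join([f"{key}={value}"] + kept)


# Hypothetical existing CONFIG value and file path, for illustration only.
original = "someParam=true"
new_config = add_config_param(original, "stopwords", "/path/to/stopwords.txt")
print(new_config)
```

The same helper can be called with a key such as stopwords_fr to build a language-specific entry.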


Results

Search again for the string "can" in the storefront. The search should return no results.
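The reason the search now returns nothing is that the stop filter removes "can" from the query before matching, which leaves no terms to search for. The following Python sketch is a simplified model of that behavior, not the Solr StopFilterFactory implementation. It assumes one stop word per line, lines starting with # treated as comments, whitespace tokenization, and case-insensitive matching as in the ignoreCase="true" setting shown earlier. The function names are hypothetical.

```python
def load_stopwords(text, ignore_case=True):
    """Parse stopwords file content: one word per line, '#' starts a comment line."""
    words = set()
    for line in text.splitlines():
        word = line.strip()
        if not word or word.startswith("#"):
            continue
        words.add(word.lower() if ignore_case else word)
    return words


def remove_stopwords(query, stopwords, ignore_case=True):
    """Drop whitespace-separated query terms that appear in the stop list."""
    kept = []
    for term in query.split():
        key = term.lower() if ignore_case else term
        if key not in stopwords:
            kept.append(term)
    return kept


# Shortened stand-in for the customized stopwords.txt file.
sample_file = """\
# Standard english stop words
will
with
can
"""

stopwords = load_stopwords(sample_file)
print(remove_stopwords("can", stopwords))         # []
print(remove_stopwords("Can opener", stopwords))  # ['opener']
```

A query made up only of stop words ends up empty, while a query that mixes stop words with other terms is still searched on the remaining terms.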
