Setting up a thesaurus dictionary for the Webserver search engine on HTTP Server

 

In the IBM HTTP Server for i5/OS, you can use the IBM Web Administration for i5/OS interface to set up a thesaurus dictionary file for the Webserver search engine.

Information for this topic supports the latest PTF levels for HTTP Server for iSeries. IBM recommends that you install the latest PTFs to upgrade to the latest level of the HTTP Server for i5/OS. Some of the topics documented here are not available prior to this update. See IBM Service for more information.

Thesaurus support automatically expands a search query with the terms listed in a thesaurus. To help users find information in your indexed documents, you can create your own thesaurus that lists common terms and associates them with the terms that exist in your documents. For example, if a person typically searches for PC but your documents only refer to a personal computer, add PC to your thesaurus as a synonym of personal computer. You must first create a thesaurus definition file that contains the related terms. Then you build the thesaurus dictionary to be used by the Webserver search engine. See Build the thesaurus dictionary for more information.

The thesaurus definition file can be created in IFS or QSYS.LIB.
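To make the idea of query expansion concrete, here is a minimal conceptual sketch in Python. It is not how the Webserver search engine is implemented; the `SYNONYMS` table and the `expand_query` function are hypothetical names used only to illustrate how a search for PC could also match documents that say personal computer.

```python
# Illustrative only: a toy model of synonym-based query expansion.
# SYNONYMS and expand_query are hypothetical, not part of HTTP Server.
SYNONYMS = {
    "pc": {"personal computer"},
    "personal computer": {"pc"},
}


def expand_query(term: str) -> list[str]:
    """Return the term plus its listed synonyms, ignoring case and outer blanks."""
    key = term.strip().lower()
    return sorted({key} | SYNONYMS.get(key, set()))


print(expand_query("PC"))  # ['pc', 'personal computer']
```

A term with no thesaurus entry is returned unchanged, which mirrors the case where the search runs without any expansion.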

 

 

Create a thesaurus definition file

To create a thesaurus definition file, do the following:

    1. Open a text editor on the iSeries, such as edtf.

      Use edtf or some other iSeries editor rather than a PC editor. It is important that the file is tagged with the correct CCSID since the words will be matched with words in the documents themselves.

    2. Create the content of the thesaurus definition file using the following file format:

      A thesaurus definition file consists of blocks containing elements. Each element of the block is defined by a capitalized keyword. The block also contains terms that are single or multiple words. For example "cake" and "chocolate cake" are terms.

      Each block starts with :WORDS followed on the same line by one of the following:

      :RELATED 
         Where :RELATED indicates related terms that are not synonyms. 
      :SYNONYM 
         Where :SYNONYM indicates terms that are synonyms. 

      Member terms are listed in the block starting on the second line of the block, one term per line. For example:

      :WORDS:SYNONYM
      PC
      personal computer

      The following relationships can also be specified within the block:

      .LOWER_THAN

      Where the block member terms are more specific in meaning than the term following .LOWER_THAN.

      .HIGHER_THAN

      Where the block member terms are less specific in meaning than the term following .HIGHER_THAN.

      .RELATED_TO

      Block member terms are related to this term.

      .SYNONYM_OF

      Block member terms are synonyms of this term.

      A related term is specified on the same line as the relationship. A term can be a single word or multiple words. The relationships can be specified in any order within the block. For example, the following two blocks are interpreted identically:

      :WORDS
      rain
      snow
      hail
      .LOWER_THAN precipitation
      .RELATED_TO weather

      :WORDS
      .LOWER_THAN precipitation
      .RELATED_TO weather
      rain
      snow
      hail

      When creating a thesaurus definition file, keep the following in mind:

      • Preceding and trailing blanks are removed.

      • Preceding and trailing control characters are removed.

      • Terms beginning with a period (.) or a colon (:) are not allowed.

      • Uppercase and lowercase forms of a letter are treated as the same character.

      • Keep the uppercase keywords exactly as shown.

      • Terms in the file may be in any language.

      • The maximum length of a term is 64 characters or 64 bytes.

      A sample thesaurus definition file is stored in /QIBM/ProdData/HTTP/Public/HTTPSVR/sample_thesaurus.txt.

    3. Once you have created the thesaurus definition file, save it as a text file (txt).

      Terms can be added in any supported language; however, the keywords (:RELATED, :SYNONYM, .LOWER_THAN, .HIGHER_THAN, .RELATED_TO, .SYNONYM_OF, and :WORDS) cannot be changed, or the definition file will not work.
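The format rules above can be sketched as a small Python helper that renders a block and rejects terms that break the listed constraints. This is an illustrative sketch, not an IBM tool: `validate_term` and `render_block` are hypothetical names, the 64-byte check assumes UTF-8 encoding (the real limit depends on the CCSID of the file), and joining :WORDS directly to :SYNONYM follows the example shown earlier. The generated text would still need to be placed in a file that is tagged with the correct CCSID, as described in step 1.

```python
# Sketch only: helper names are hypothetical, not part of HTTP Server.
MAX_TERM_BYTES = 64  # assumption: the documented limit measured in encoded bytes
BLOCK_KINDS = ("", ":RELATED", ":SYNONYM")
RELATIONS = (".LOWER_THAN", ".HIGHER_THAN", ".RELATED_TO", ".SYNONYM_OF")


def validate_term(term: str, encoding: str = "utf-8") -> str:
    """Trim outer whitespace and enforce the term rules listed above."""
    cleaned = term.strip()
    if not cleaned:
        raise ValueError("empty term")
    if cleaned[0] in ".:":
        raise ValueError(f"term may not begin with '.' or ':': {cleaned!r}")
    if len(cleaned.encode(encoding)) > MAX_TERM_BYTES:
        raise ValueError(f"term exceeds {MAX_TERM_BYTES} bytes: {cleaned!r}")
    return cleaned


def render_block(members, kind="", relations=()):
    """Return one :WORDS block as text, one member term per line.

    relations is a sequence of (keyword, term) pairs, for example
    (".LOWER_THAN", "precipitation").
    """
    if kind not in BLOCK_KINDS:
        raise ValueError(f"unknown block kind: {kind!r}")
    lines = [":WORDS" + kind]
    lines += [validate_term(member) for member in members]
    for keyword, term in relations:
        if keyword not in RELATIONS:
            raise ValueError(f"unknown relationship keyword: {keyword!r}")
        # The related term goes on the same line as its relationship keyword.
        lines.append(f"{keyword} {validate_term(term)}")
    return "\n".join(lines) + "\n"


print(render_block(["PC", "personal computer"], kind=":SYNONYM"))
print(render_block(
    ["rain", "snow", "hail"],
    relations=[(".LOWER_THAN", "precipitation"), (".RELATED_TO", "weather")],
))
```

Checking terms before building the dictionary catches problems such as a term that begins with a period, which the format does not allow.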

After you have created a thesaurus dictionary, you can manage it. See Managing a thesaurus dictionary for the Webserver search engine on HTTP Server for more information. To use the dictionary in a search, select your index and the search option. After you choose a simple or advanced search, a form opens where you can add a thesaurus dictionary to your search.

 

Build the thesaurus dictionary

To build the thesaurus dictionary so that the Webserver search engine can use it, do the following:

  1. Click the Advanced tab.

  2. Click the Search Setup subtab.

  3. Expand Search Engine Setup.

  4. Click Build thesaurus dictionary.

  5. In the Thesaurus definition file field, enter the directory and name of the thesaurus definition file that contains the relationship data for generating the thesaurus dictionary. A definition file is a simple text file with formatting tags that indicate word relationships.

  6. Enter a name for the thesaurus dictionary in the Thesaurus dictionary name field. For example, mydict.

  7. In the Thesaurus dictionary directory field, enter the directory that will hold the generated thesaurus dictionary files. Possible values include /QIBM/UserData/HTTPSVR/search (the default setting) or any valid directory path.

  8. Click Apply.