Configure file attachment indexing settings 

Edit settings in the search-config.xml file to configure Search for file attachments.


Before you begin

To edit configuration files, use the IBM WAS wsadmin client. See Starting the wsadmin client for details.


About this task

IBM Connections supports the indexing of file attachment content from the Files and Wikis applications. Search also provides a dedicated document conversion service. When a files indexing task is run, the document conversion service downloads files, converts them to plain text, and then indexes the content. During this process, content from different MIME types is indexed. For a list of the MIME types supported by Search, see Supported MIME types.

The behavior of the document conversion service can be altered by modifying various settings, allowing administrators to control the file content indexing process.

Attention: When file indexing is enabled, file content is not indexed the first time that the indexing task runs. The first run starts the process of retrieving the file content, but the content is indexed only when the task runs for the second time.


Procedure

To configure file attachment indexing settings, complete the following steps.

  1. On the deployment manager (dmgr) host, start the wsadmin client and load the Search administration script:

      cd $DMGR_PROFILE/bin
      ./wsadmin.sh -jython
      execfile("searchAdmin.py")

      If prompted to specify a service to connect to, type 1 to pick the first node in the list. Most commands can run on any node. If the command writes or reads information to or from a file using a local file path, pick the node where the file is stored.

  2. Check out the Search cell-level configuration file using the following command:

      SearchCellConfig.checkOutConfig("<working_dir>", "<cellName>")

      where:

      • <working_dir> is the temporary directory to which you want to check out the cell level configuration file. This directory must exist on the server where you are running the wsadmin client. Use forward slashes to separate directories in the file path, even if you are using the Microsoft Windows operating system.

          Note: AIX and Linux only: The directory must grant write permissions or the command will not run successfully.

      • <cellName> is the name of the cell that the Search node belongs to. This argument is required. It is also case-sensitive, so type it with care. If you do not know the cell name, you can determine it by typing the following command in the wsadmin command processor:

          print AdminControl.getCell()

      For example:

      SearchCellConfig.checkOutConfig("c:/search_temp", "SearchServerNode01Cell")
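
The requirements on the working directory (forward slashes, an existing directory, write permission) can be checked before the configuration file is checked out. The following is a minimal sketch in plain Python; the helper name `prepare_working_dir` is hypothetical and is not part of the Search administration scripts.

```python
import os

def prepare_working_dir(path):
    """Return the path with forward slashes, or raise ValueError if it is unusable."""
    # Convert Windows-style backslashes to forward slashes.
    normalized = path.replace("\\", "/")
    # The directory must already exist on the machine running wsadmin.
    if not os.path.isdir(normalized):
        raise ValueError("Working directory does not exist: " + normalized)
    # The directory must be writable, or the checkout cannot succeed.
    if not os.access(normalized, os.W_OK):
        raise ValueError("Working directory is not writable: " + normalized)
    return normalized
```

The returned string is in the form that the `<working_dir>` argument expects.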

  3. Use the following commands to control the file content indexing process.

      SearchCellConfig.enableAttachmentHandling()

        Enables the indexing of file attachments in the Files and Wikis applications.

        Note: If attachment handling was disabled during the last indexing run, you must rebuild the index after you enable attachment handling. Otherwise, this command does not take effect.

        This command does not take any input parameters.

      SearchCellConfig.disableAttachmentHandling()

        Disables the indexing of file attachments in the Files and Wikis applications.

        This command does not take any input parameters.

      SearchCellConfig.setMaximumAttachmentSize(int maxAttachmentSize)

        Sets the upper limit on the size of files that can be downloaded for indexing. Files above this limit are not downloaded for content indexing.

        Files are downloaded to a temporary directory, which is located in the index directory. The temporary directory size available must be greater than the maximum file size allowed for content indexing.

        This command accepts one argument:

        • maxAttachmentSize. The maximum file size in bytes of any file attachment eligible for indexing. This is an integer value.

        For example:

        SearchCellConfig.setMaximumAttachmentSize("204800")
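
Because the limit is expressed in bytes, it can be easier to derive the value from a more readable unit. The sketch below assumes binary units (1 KB = 1024 bytes), which this topic does not state explicitly, and uses a hypothetical helper name `to_bytes`.

```python
# Assumption: binary multiples; the documentation does not say which convention applies.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def to_bytes(amount, unit):
    """Convert an amount in the given unit to a whole number of bytes."""
    if amount <= 0:
        raise ValueError("Size must be greater than zero")
    return int(amount * _UNITS[unit.upper()])

# 200 KB, which matches the example value shown above.
max_attachment_size = to_bytes(200, "KB")  # 204800
```

The result can then be passed to the command as a string, in the same style as the example, with `str(max_attachment_size)`.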

      SearchCellConfig.setCacheExpiryTime(int numberOfDays)

        Sets the number of days for which a downloaded file's indexable content is cached in the database. This information is cached for potential reuse at indexing time. If a file is not reused in the number of days specified, its entry in the database cache is deleted. If the file content has changed, the file is downloaded again and the cache is updated with the revised content.

        This command allows you to ensure that the database cache used for indexing files is kept up-to-date.

        The expiry time is measured in days. Specify an integer greater than zero.

        For example:

        SearchCellConfig.setCacheExpiryTime("30")

      SearchCellConfig.setCacheFileSize(int cacheFileSize)

        Specifies the maximum amount of indexable text per file.

        The size is specified in bytes. Specify an integer greater than zero.

        The cache file size is set to 200 KB by default. This is also the maximum amount of content that can be indexed per file. If you specify a value greater than 200 KB, the Search application still caches only 200 KB of data per file.

        This command accepts one argument:

        • cacheFileSize. The number of bytes of indexable and searchable file content stored in the database cache. This is an integer value.

        For example:

        SearchCellConfig.setCacheFileSize("8435456")
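
The 200 KB ceiling described for this command means that the amount actually cached can be smaller than the configured value. A minimal sketch of that rule follows, assuming that 200 KB means 200 * 1024 bytes; the function name `effective_cache_size` is hypothetical.

```python
# Assumption: "200 KB" is interpreted as 200 * 1024 bytes.
CACHE_LIMIT_BYTES = 200 * 1024

def effective_cache_size(requested_bytes):
    """Return the number of bytes per file that would actually be cached."""
    if requested_bytes <= 0:
        raise ValueError("cacheFileSize must be greater than zero")
    # Values above the ceiling are capped at the ceiling.
    return min(requested_bytes, CACHE_LIMIT_BYTES)

effective_cache_size(8435456)  # 204800: the example value exceeds the ceiling
effective_cache_size(100000)   # 100000: values below the ceiling are kept
```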

      SearchCellConfig.setMaxCacheEntries(int maxCacheEntries)

        Sets the maximum number of cached file entries allowed in the database cache.

        This command takes a single argument:

        • maxCacheEntries. The number of cached file entries. This argument must be an integer greater than zero.

        For example:

        SearchCellConfig.setMaxCacheEntries("1000")

      SearchCellConfig.setMaximumConcurrentDownloads(int maxConcurrentDownloads)

        Sets the maximum number of threads that perform file downloading on a Search server.

        This command takes a single argument that specifies the maximum number of threads. The argument must be an integer greater than zero. The default value is 3. The value of the maxConcurrentDownloads argument must not exceed the maximum number of threads set for the DefaultWorkManager Work Manager resources at the Search server scope.

        CAUTION: Increasing this value increases the load on the Files server.

        For example:

        SearchCellConfig.setMaximumConcurrentDownloads("10")

      SearchCellConfig.setMaximumTempDirSize(int maxTempDirSize)

        Sets the maximum size of a temporary directory used by a Search server for the file conversion process.

        This command takes a single argument that specifies the maximum size in bytes. The argument must be an integer greater than zero. The default value is 100 MB.

        Files are downloaded to a temporary directory, which is located in the index directory. The temporary directory size available must be greater than the maximum file size allowed for content indexing.

        For example:

        SearchCellConfig.setMaximumTempDirSize("51200")
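
The download settings constrain one another: the temporary directory must be larger than the maximum attachment size, and the number of download threads must not exceed the Work Manager thread limit. The following sketch checks a planned set of values for consistency. The function name and the `work_manager_max_threads` parameter are hypothetical; the thread limit must be looked up separately in the DefaultWorkManager settings.

```python
def check_download_settings(max_attachment_size, max_temp_dir_size,
                            max_concurrent_downloads, work_manager_max_threads):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    values = (("max_attachment_size", max_attachment_size),
              ("max_temp_dir_size", max_temp_dir_size),
              ("max_concurrent_downloads", max_concurrent_downloads))
    # Every setting must be an integer greater than zero.
    for name, value in values:
        if value <= 0:
            problems.append(name + " must be greater than zero")
    # The temporary directory must be able to hold the largest eligible file.
    if max_temp_dir_size <= max_attachment_size:
        problems.append("max_temp_dir_size must be greater than max_attachment_size")
    # Download threads come from the Work Manager thread pool.
    if max_concurrent_downloads > work_manager_max_threads:
        problems.append("max_concurrent_downloads exceeds the Work Manager thread limit")
    return problems
```

Note that the standalone example values shown for setMaximumAttachmentSize (204800) and setMaximumTempDirSize (51200) would fail this check if they were applied together, because the temporary directory would be smaller than the largest eligible file.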

      SearchCellConfig.setDownloadThrottle(long downloadThrottle)

        Sets the duration of a rest period between successive file downloads in a single file-download thread.

        This command takes a single argument that specifies the length of the rest period in milliseconds. The default value is 500 milliseconds.

        CAUTION: Decreasing this value increases the load on the Files server, because each thread pauses for less time between downloads.

        For example:

        SearchCellConfig.setDownloadThrottle("500")
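
To see how the throttle and the number of download threads combine, the following sketch estimates an upper bound on the download rate. It assumes that each download completes instantly, so real rates are lower; the function name is hypothetical, and the estimate requires a throttle greater than zero.

```python
def max_downloads_per_minute(threads, throttle_ms):
    """Upper bound on downloads per minute, ignoring the time each download takes."""
    if threads <= 0 or throttle_ms <= 0:
        raise ValueError("threads and throttle_ms must be greater than zero")
    # Each thread can start at most one download per rest period.
    return threads * 60000.0 / throttle_ms

max_downloads_per_minute(3, 500)   # 360.0 with the default values
max_downloads_per_minute(10, 500)  # 1200.0 with 10 download threads
max_downloads_per_minute(3, 1000)  # 180.0 when the rest period is doubled
```

A shorter rest period or a larger number of threads raises this bound, and with it the potential load on the Files server.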

  4. Check in the changed configuration property keys using the following wsadmin client command:

      SearchCellConfig.checkInConfig()

  5. To exit the wsadmin client, type exit at the prompt.

  6. Stop the server or servers hosting the Search application, delete the index, and then restart the Search servers.

      The next time the scheduled task runs, it recreates the index.

      Supported MIME types
      Search supports the indexing of content from a number of MIME types.
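
The commands in step 3 can also be applied as a group. The sketch below is one possible approach, not a documented feature: `apply_attachment_settings` and `RecordingConfig` are hypothetical names, and the code assumes that each setter accepts its value as a string, as in the examples in this topic. `RecordingConfig` is a stand-in that only records calls, so that a set of values can be rehearsed outside wsadmin.

```python
def apply_attachment_settings(config, settings):
    """Call config.<setter>(str(value)) for each (setter, value) pair, in order."""
    # Validate everything first so that no setter runs if any value is invalid.
    for setter_name, value in settings:
        if int(value) <= 0:
            raise ValueError(setter_name + " requires a value greater than zero")
    applied = []
    for setter_name, value in settings:
        getattr(config, setter_name)(str(value))
        applied.append(setter_name)
    return applied

class RecordingConfig(object):
    """Stand-in for the real configuration object; records calls for a dry run."""
    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Invoked only for attributes that are not found, that is, the setters.
        def record(*args):
            self.calls.append((name, args))
        return record

dry_run = RecordingConfig()
apply_attachment_settings(dry_run, [
    ("setMaximumAttachmentSize", 204800),
    ("setDownloadThrottle", 500),
])
```

In a wsadmin session, after the configuration file is checked out, `SearchCellConfig` could be passed in place of `RecordingConfig()`, followed by `SearchCellConfig.checkInConfig()`. This assumes that the object allows its commands to be looked up by name, which has not been verified here.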


Parent topic

Manage the Search index

Related reference
SearchCellConfig commands


   

 
