Configure file attachment indexing settings
Overview
When a file indexing task runs, the document conversion service:
- downloads files
- converts them to plain text
- indexes the content
During this process, content from different MIME types is indexed.
Connections supports indexing of file attachment content from Files and Wikis applications. Content from file attachments in Activities, Blogs, and Forums is not searched.
When file indexing is enabled, the content of files is not indexed the first time the index is run. The first index starts the process of retrieving the file content, but the actual indexing of the content only takes place when the index is run for the second time.
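The two-run behavior described above can be pictured with a small toy model. This is only an illustration of the timing, not how the Search service is implemented; the class name ToyAttachmentIndex, its attributes, and the file name report.pdf are hypothetical.

```python
class ToyAttachmentIndex:
    """Toy model only; the real Search service is not implemented this way."""

    def __init__(self):
        self.retrieving = []  # files whose content retrieval has been started
        self.indexed = []     # files whose content is searchable

    def run(self, new_files):
        # Content whose retrieval started in the previous run is indexed now.
        self.indexed.extend(self.retrieving)
        # Files seen for the first time only start retrieval in this run.
        self.retrieving = list(new_files)
        return list(self.indexed)


index = ToyAttachmentIndex()
first = index.run(["report.pdf"])   # first run: retrieval starts, nothing indexed
second = index.run([])              # second run: content is indexed
print(first)    # []
print(second)   # ['report.pdf']
```

In this model, a search for text inside report.pdf would only succeed after the second run.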
To configure file attachment indexing settings
- Check out search-config.xml:
cd app_server_root/profiles/Dmgr01/bin
./wsadmin.sh -lang jython
execfile("searchAdmin.py")
SearchCellConfig.checkOutConfig("working_dir", "cellName")
To determine the cell name, run: print AdminControl.getCell()
- Control the file content indexing process.
SearchCellConfig.enableAttachmentHandling() Enable the indexing of file attachments in the Files and Wikis applications. If attachment handling was disabled during the last indexing run, rebuild the index after you re-enable it.
SearchCellConfig.disableAttachmentHandling() Disable the indexing of file content in the Files, Wikis, and Library (ECM Files) applications.
SearchCellConfig.setMaximumAttachmentSize(int maxAttachmentSize) Maximum size, in bytes, of a file whose content is indexed. The default is 50 MB. Files under the size limit are downloaded to a temporary directory located in the index directory, where they go through the text extraction process. The temporary directory size must be greater than the maximum file size allowed for content indexing. Example:
SearchCellConfig.setMaximumAttachmentSize("52428800")
SearchCellConfig.setMaximumConcurrentDownloads(int maxConcurrentDownloads) Maximum number of threads that download files on a Search server. The default is 3. The value must not exceed the maximum number of threads set for the DefaultWorkManager Work Manager resource at the Search server scope. Raising this value increases the load on the Files server. Example:
SearchCellConfig.setMaximumConcurrentDownloads("10")
SearchCellConfig.setMaximumTempDirSize(int maxTempDirSize) Maximum size, in bytes, of the temporary directory that a Search server uses for the file conversion process. The default is 100 MB. Files are downloaded to this temporary directory, which is located in the index directory. The temporary directory size must be greater than the maximum file size allowed for content indexing. Example:
SearchCellConfig.setMaximumTempDirSize("51200")
SearchCellConfig.setDownloadThrottle(long downloadThrottle) Download throttle, in milliseconds. This is the rest period between successive file downloads in a single file-download thread. The default is 500. Lowering this value increases the load on the Files server. Example:
SearchCellConfig.setDownloadThrottle("500")
- Check in search-config.xml:
SearchCellConfig.checkInConfig()
exit
- Stop the server or servers hosting the Search application, and then restart the Search servers.
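Before checking in, it can help to sanity-check the byte values against the two constraints stated in the steps above: the temporary directory must be larger than the maximum attachment size, and the number of concurrent downloads must not exceed the DefaultWorkManager thread limit. The following is a minimal sketch in plain Python; check_attachment_settings is a hypothetical helper and not part of searchAdmin.py, and the Work Manager limit of 10 threads is an assumed value that you would replace with the one configured in your environment.

```python
MB = 1024 * 1024  # bytes per megabyte


def check_attachment_settings(max_attachment_size, max_temp_dir_size,
                              max_concurrent_downloads,
                              work_manager_max_threads):
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    if max_temp_dir_size <= max_attachment_size:
        problems.append("temporary directory size must be greater than "
                        "the maximum attachment size")
    if max_concurrent_downloads > work_manager_max_threads:
        problems.append("concurrent downloads exceed the DefaultWorkManager "
                        "maximum number of threads")
    return problems


# Documented defaults: 50 MB attachment limit, 100 MB temporary directory,
# 3 download threads. The limit of 10 Work Manager threads is an assumption.
default_attachment = 50 * MB   # 52428800 bytes
default_temp_dir = 100 * MB    # 104857600 bytes
print(check_attachment_settings(default_attachment, default_temp_dir, 3, 10))
```

With the defaults, the list of problems is empty. Combining a temporary directory size of 51200 bytes with the default 50 MB attachment limit would be reported as a problem, so treat the sample values in the steps above as syntax examples rather than recommended settings.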
The next time the scheduled task runs, persisted seedlists are retained after indexing finishes.
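To get a rough feel for how the concurrent download and download throttle settings combine into load on the Files server, the following sketch computes an upper bound on downloads started per second. It assumes each download itself takes no time, so real rates are lower; max_downloads_per_second is a hypothetical helper, not a Connections command.

```python
def max_downloads_per_second(max_concurrent_downloads, download_throttle_ms):
    """Upper bound on downloads started per second across all threads.

    Assumes each thread only waits for the throttle period between files
    and that the download itself takes no time.
    """
    if download_throttle_ms <= 0:
        raise ValueError("download_throttle_ms must be positive")
    return max_concurrent_downloads * 1000.0 / download_throttle_ms


print(max_downloads_per_second(3, 500))    # defaults: 6.0
print(max_downloads_per_second(10, 500))   # 10 threads: 20.0
```

Under these assumptions, raising the thread count or lowering the throttle both raise the bound, which is why either change increases the load on the Files server.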
Parent topic:
Manage the Search index
Related:
Supported MIME types
Reload the Search application
Verify file content extraction