Use the search seedlist 1.0 format
As of version 6.1.5, Portal Search is configured to support the IBM Web Content Manager search seedlist 1.0 format by default. Versions before 6.1.5 use Web Content Manager search seedlist format 0.9.
Search seedlist 1.0 provides several features:
- We can use the web content page type to render content found in the search results on the corresponding web content page.
- We can include custom metadata fields from a web content item that appear in the search seedlist but not in the HTML source.
- We can search within a specific library or site area, across all web content libraries, or across a list of libraries.
- We can perform incremental crawling of libraries for faster seedlist processing. With incremental crawling, when a crawl requests new items, only items that have been added, changed, or deleted since the previous crawl are retrieved.
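The incremental crawling behavior in the last item can be illustrated with a small sketch. This is not the Portal Search crawler; it only shows the general idea of returning the items added, changed, or deleted since the previous crawl. The `Item` class, its `modified` field, and the `incremental_delta` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical stand-in for a crawled content item.
    item_id: str
    modified: int  # assumed change counter or timestamp


def incremental_delta(previous, current):
    """Return the IDs added, changed, or deleted since the previous crawl."""
    prev_by_id = {item.item_id: item for item in previous}
    curr_by_id = {item.item_id: item for item in current}
    added = sorted(i for i in curr_by_id if i not in prev_by_id)
    deleted = sorted(i for i in prev_by_id if i not in curr_by_id)
    changed = sorted(
        i for i in curr_by_id
        if i in prev_by_id and curr_by_id[i].modified != prev_by_id[i].modified
    )
    return {"added": added, "changed": changed, "deleted": deleted}


previous_crawl = [Item("a", 1), Item("b", 1), Item("c", 1)]
current_state = [Item("a", 1), Item("b", 2), Item("d", 1)]
delta = incremental_delta(previous_crawl, current_state)
print(delta)
```

Item `a` is unchanged, so it does not appear in the delta at all, which is why an incremental crawl has less to process than a full crawl.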
Important: The syntax of the seedlist URL has changed with seedlist format 1.0. Older search collections created using seedlist format 0.9 cannot be reused or migrated to the new format. Be sure that you index all the content again after updating the Web Content Manager seedlist format from 0.9 to 1.0.
Search seedlist 1.0 can make access control information available in a way that makes pre-filtering of content possible. Pre-filtering is the fastest filtering approach because it takes place at the search index level. An additional advantage of pre-filtering is that remote secured content sources can be searched from the portal. The filtering mode is defined as part of the search service configuration parameters.
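As a rough illustration of why pre-filtering is fast, the sketch below stores access control information with each index entry and evaluates it in the same pass as the search term, so entries the user cannot see never enter the result list. It is a conceptual model only, not how Portal Search is implemented; the `INDEX` structure, the group names, and `search_prefiltered` are assumptions made for this example.

```python
# Hypothetical in-memory index: each entry carries the groups allowed to read it.
INDEX = [
    {"id": "doc1", "text": "portal search seedlist", "acl": {"editors", "all_users"}},
    {"id": "doc2", "text": "seedlist migration notes", "acl": {"admins"}},
    {"id": "doc3", "text": "web content pages", "acl": {"all_users"}},
]


def search_prefiltered(term, user_groups):
    """Match on the term and the ACL in a single pass over the index."""
    groups = set(user_groups)
    return [
        entry["id"]
        for entry in INDEX
        # An entry qualifies only if it contains the term AND shares
        # at least one group with the user.
        if term in entry["text"] and entry["acl"] & groups
    ]


results = search_prefiltered("seedlist", ["all_users"])
print(results)
```

Because the access check uses only data held in the index, no call back to the content source is needed at query time, which is consistent with the point that remote secured sources can be searched.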
- Enable support for search seedlist 1.0
To use Portal Search to crawl web content and take advantage of features such as web content pages, enable seedlist 1.0 support for the Portal Search crawler.
- Use the custom metadata field search support
With search seedlist 1.0 support, custom metadata fields specified on content items are added to the search seedlist as metadata information, without requiring the metadata to appear in the HTML source for the content items.
- Seedlist 1.0 REST service API
The Web Content Manager API for retrieving application content through a seedlist is based on the REST architectural style. To obtain seedlist content, third-party crawlers or administrator applications need only construct and send HTTP requests to the application servlet.
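The following sketch shows what constructing such an HTTP request might look like from a client written in Python. The host, servlet path, and query parameter are placeholders, not the real seedlist URL syntax; consult the seedlist 1.0 REST service API topic for the actual path and parameters. The helper name `build_seedlist_request` is also hypothetical. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_seedlist_request(base_url, params):
    """Build (but do not send) a GET request with URL-encoded query parameters."""
    query = urlencode(params)
    return Request(f"{base_url}?{query}", method="GET")


# Placeholder values only: the host, path, and parameter name are NOT
# the documented seedlist URL syntax.
request = build_seedlist_request(
    "http://portal.example.invalid/placeholder-seedlist-servlet",
    {"placeholderParam": "placeholder value"},
)
print(request.get_method(), request.full_url)
```

To send the request, a client could pass the `Request` object to `urllib.request.urlopen`, typically with whatever authentication the server requires.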