Crawl web content with search seedlists
Portal Search supports seedlists, which make crawling websites and their metadata more efficient and give content owners fine-grained control over how content and metadata are crawled. We can configure the portal to use seedlist support when crawling content generated with Web Content Manager.
By default, Portal Search is configured to use seedlist format 1.0 when indexing content for search collections. When used with web content, seedlist format 1.0 makes it possible to use the web content page type to render content found in the search results on the corresponding web content page. We can also include custom metadata fields from a web content item so that they appear in the search seedlist but not in the HTML source.
Search seedlist 1.0 can expose access control information in a way that makes pre-filtering of content possible. Pre-filtering is the fastest filtering approach because it takes place at the search index level. An additional advantage of pre-filtering is that remote secured content sources can be searched from the portal. The filtering mode is defined as part of the search service configuration parameters.
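As an illustration of why filtering at the index level is fast, the following minimal Python sketch stores the permitted principals alongside each indexed document and applies the access check as part of the lookup itself. It is not Portal Search code: the names IndexedDoc, TinyIndex, and search_prefiltered, and the sample principals, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    """Hypothetical index entry: text plus the principals allowed to read it."""
    doc_id: str
    text: str
    allowed: frozenset  # users or groups taken from the crawled access control data


class TinyIndex:
    """Toy in-memory index, used only to illustrate pre-filtering."""

    def __init__(self):
        self._docs = []

    def add(self, doc):
        self._docs.append(doc)

    def search_prefiltered(self, term, principals):
        # The access check is part of the index lookup, so documents the
        # user cannot read never enter the result list and no per-result
        # callback to the content source is needed.
        principals = set(principals)
        return [
            d.doc_id
            for d in self._docs
            if term.lower() in d.text.lower() and d.allowed & principals
        ]


index = TinyIndex()
index.add(IndexedDoc("news-1", "Quarterly results for all staff", frozenset({"all-users"})))
index.add(IndexedDoc("hr-7", "Quarterly salary results", frozenset({"hr-group"})))

# A user who belongs only to "all-users" sees only the public document.
hits = index.search_prefiltered("quarterly", {"alice", "all-users"})
print(hits)
```

Because the access data travels with the index entry, the same lookup works for content whose source system is remote, which is the second advantage described above.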
Support for generic seedlist 1.0 crawling is only available with IBM OmniFind Enterprise Edition Version 9.1 and later.
- Use the search seedlist 1.0 format
As of version 6.1.5, Portal Search is configured to support the Web Content Manager search seedlist 1.0 format by default. Versions before 6.1.5 use Web Content Manager search seedlist format 0.9.
- Use the search seedlist 0.9 format
Although Portal Search is configured to support the search seedlist 1.0 format by default, we can reconfigure the portal to use the standard seedlist 0.9 format when searching for web content with the Search Center. For example, we might choose seedlist format 0.9 to make use of older search collections, or because we retrieve the seedlist 0.9 contents using the seedlist URL, which uses a different syntax from the URL used with the search seedlist 1.0 format.