Crawl web content with search seedlists
Portal Search supports seedlists, which make crawling websites and their metadata more efficient and give content owners fine-grained control over how content and metadata are crawled. You can configure the portal to use seedlist support when crawling content generated with IBM Web Content Manager.
By default, Portal Search is configured to use seedlist format 1.0 when indexing content for search collections. When used with web content, seedlist format 1.0 makes it possible to use the web content page type to render content found in the search results on the corresponding web content page. You can also include custom metadata fields from a web content item that appear in the search seedlist but not in the HTML source.
Search seedlist 1.0 can expose access control information in a way that makes pre-filtering of content possible. Pre-filtering is the fastest filtering approach because it takes place at the search index level. An additional advantage of pre-filtering is that remote secured content sources can be searched from the portal. The filtering mode is defined as part of the search service configuration parameters.
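The idea behind pre-filtering can be illustrated with a small, self-contained Python sketch. It is not Portal Search code and does not reflect the product's internal data structures. The class ToyIndex, the method search_prefiltered, and the document and group names are all hypothetical. The sketch only shows why filtering is cheap when access control information is stored alongside the index: the set of readable documents is intersected with the set of matching documents before any result is returned.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical in-memory index, used only to illustrate pre-filtering."""

    # term -> ids of documents that contain the term
    postings: dict = field(default_factory=dict)
    # principal (user or group name) -> ids of documents it may read
    acl: dict = field(default_factory=dict)

    def add(self, doc_id, text, readers):
        # Index the terms and record the access control information
        # that arrived with the document (for example, from a seedlist).
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)
        for principal in readers:
            self.acl.setdefault(principal, set()).add(doc_id)

    def search_prefiltered(self, term, principals):
        # Build the set of documents the caller may read, then intersect
        # it with the matches. Unreadable documents never leave the index.
        allowed = set()
        for principal in principals:
            allowed |= self.acl.get(principal, set())
        return sorted(self.postings.get(term.lower(), set()) & allowed)


index = ToyIndex()
index.add("doc-1", "Quarterly sales report", {"all-users"})
index.add("doc-2", "Confidential sales forecast", {"managers"})
index.add("doc-3", "Cafeteria menu", {"all-users"})

print(index.search_prefiltered("sales", {"all-users"}))
print(index.search_prefiltered("sales", {"all-users", "managers"}))
```

By contrast, a post-filtering design would fetch every match first and then check each one against the content source, which costs one access check per result.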
Support for generic seedlist 1.0 crawling is available only with IBM OmniFind Enterprise Edition Version 9.1 and later.
Use the search seedlist 1.0 format
As of version 6.1.5, Portal Search is configured to support the IBM Web Content Manager search seedlist 1.0 format by default. Versions before 6.1.5 use Web Content Manager search seedlist format 0.9.
Although Portal Search is configured to support the search seedlist 1.0 format by default, you can reconfigure the portal to use the standard seedlist 0.9 format when searching for web content with the Search Center. For example, you might choose to use seedlist format 0.9 because you want to make use of older search collections or because you retrieve the seedlist 0.9 contents using the seedlist URL, which uses a different syntax from the URL used with the search seedlist 1.0 format.
December 14, 2011