Client identification for search of the portal by external search engines
So that the portal can recognize external search engines, it provides a client that covers several popular search engines. This client is implemented according to the Composite Capability/Preference Profiles (CC/PP) standard and has the capability HTML_SEARCH set. To add more search engines, we can configure the client as required.
The client has been implemented with the following settings:
- User agent:
- (.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)
This user agent pattern covers most large search engines, such as Google, Yahoo!, Lycos, and MSN. The pattern list also matches any other search engine whose user agent string contains bot, spider, search, crawl, or crawler.
- Capability:
- To allow each search engine to crawl the portal, we need to set the capability HTML_SEARCH. Search engines usually visit a website twice: the first time to crawl the site and the second time to validate the content. On the second visit, a search engine usually accesses the site like a normal browser. Therefore, also enter capabilities that support the different browser settings, for example HTML_4_0, HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE, HTML_2_0, HTML_JAVASCRIPT, HTML_3_2, HTML_3_0, HTML_CSS, and HTML_TABLE.
- Manufacturer:
- Search
- Markup:
- HTML
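The client settings above can be modeled as a small data structure to show how a request's User-Agent header would be classified. This is a minimal Python sketch, not the portal's implementation: ClientProfile, SEARCH_CLIENT, and is_search_engine are hypothetical names. The sketch assumes that the pattern is matched against the entire User-Agent header; because every alternative starts and ends with .*, a full match behaves like searching the header for any of the listed keywords.

```python
import re
from dataclasses import dataclass, field

# User agent pattern from the default search engine client (split only for readability).
SEARCH_USER_AGENT = (
    r"(.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)"
    r"|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)"
    r"|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)"
)


@dataclass(frozen=True)
class ClientProfile:
    """Hypothetical model of a portal client definition (not a portal API)."""
    user_agent: str  # regular expression for the User-Agent header
    manufacturer: str
    markup: str
    capabilities: frozenset = field(default_factory=frozenset)

    def matches(self, header: str) -> bool:
        # Assumption: the pattern must match the whole header string.
        return re.fullmatch(self.user_agent, header) is not None


SEARCH_CLIENT = ClientProfile(
    user_agent=SEARCH_USER_AGENT,
    manufacturer="Search",
    markup="HTML",
    capabilities=frozenset({
        "HTML_SEARCH",  # required so that search engines can crawl the portal
        # Browser-style capabilities for the validation visit:
        "HTML_4_0", "HTML_IFRAME", "HTML_FRAME", "HTML_NESTED_TABLE",
        "HTML_2_0", "HTML_JAVASCRIPT", "HTML_3_2", "HTML_3_0",
        "HTML_CSS", "HTML_TABLE",
    }),
)


def is_search_engine(header: str, client: ClientProfile = SEARCH_CLIENT) -> bool:
    """True if the header matches the client and the client allows crawling."""
    return client.matches(header) and "HTML_SEARCH" in client.capabilities


if __name__ == "__main__":
    for ua in ("Googlebot/2.1 (+http://www.google.com/bot.html)",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"):
        print(ua, "->", is_search_engine(ua))
```

The sketch also makes one property of the pattern visible: it is only partly case-insensitive. It accepts Bot, bot, and BOT, but an all-uppercase string such as SPIDER does not match any alternative.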
To include search engines that are not covered by the default set, use either the administration portlet Manage Clients or xmlaccess.sh. For more information, see the following topics.
- The search mechanism works correctly for the portal only if the search engine robots are identified to the portal in advance.
- Search on the portal by external search engines requires additional configuration beyond client identification. For details, see the topics Configure the portal site for search by external search services and Configure the Search Sitemap portlet for search by external search engines.
- Add search engines using the administration portlet Manage Clients
To add search engines using the administration portlet Manage Clients, follow the procedure given here.
- Add search engines using xmlaccess.sh
To add search engines using xmlaccess.sh, import them with an XML script file. To make sure that the search mechanism works correctly, include the capability HTML_SEARCH in the client definition.
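Importing the XML script with xmlaccess.sh is a command-line step that can be scripted. The following minimal Python sketch only assembles and optionally runs the command. build_xmlaccess_command and import_clients are hypothetical helpers; the install path, script file name, credentials, and configuration URL are placeholders; and the XML script itself (containing a client definition with HTML_SEARCH) is assumed to exist already. The -in, -user, -password, -url, and -out options are the commonly documented XMLAccess options; verify them against the documentation for your portal version.

```python
import shlex
import subprocess
from pathlib import Path


def build_xmlaccess_command(script, xml_file, user, password, config_url, out_file=None):
    """Hypothetical helper: build the argument list for an XMLAccess import."""
    cmd = [str(script), "-in", str(xml_file), "-user", user,
           "-password", password, "-url", config_url]
    if out_file is not None:
        cmd += ["-out", str(out_file)]  # optional result file written by XMLAccess
    return cmd


def masked(cmd):
    """Return a copy of the command with the password value hidden."""
    shown = list(cmd)
    if "-password" in shown:
        i = shown.index("-password")
        if i + 1 < len(shown):
            shown[i + 1] = "****"
    return shown


def import_clients(cmd, dry_run=True):
    """Hypothetical helper: print the command (dry run) or run it, raising on failure."""
    if dry_run:
        print(shlex.join(masked(cmd)))  # shlex.join requires Python 3.8+
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_xmlaccess_command(
        Path("/opt/portal/bin/xmlaccess.sh"),  # placeholder install path
        "add_search_clients.xml",              # placeholder XML script file
        "wpsadmin", "secret",                  # placeholder credentials
        "http://localhost:10039/wps/config",   # placeholder configuration URL
        out_file="result.xml",
    )
    import_clients(cmd)  # dry run: prints the command only
```

The dry-run default prints the command with the password masked, so the import can be reviewed before it changes the portal configuration.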