Client identification for search of the portal by external search engines

 


So that the portal can recognize external search engines, it provides a client that covers several popular search engines. This client is implemented according to the Composite Capability/Preference Profiles (CC/PP) standard and has the capability HTML_SEARCH set. To cover more search engines, you can configure the client as required.

The client has been implemented with the following settings:

User agent:

(.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)

Capability:

Set the capability HTML_SEARCH for each search engine that you want to allow to crawl the portal. Search engines usually visit a Web site twice: the first time to crawl the site, and the second time to validate the content. On the second visit, a search engine usually uses a normal browser. Therefore, also enter the additional capabilities that support the different browser settings.

Examples:

(HTML_4_0, HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE, HTML_2_0, HTML_JAVASCRIPT, HTML_3_2, HTML_3_0, HTML_CSS, HTML_TABLE).

Manufacturer:

Search

Markup:

HTML
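To check whether a crawler is already covered by the default user agent pattern, you can test its User-Agent string against the expression outside the portal. The following Python sketch is illustrative only. It assumes that the portal requires the pattern to match the whole User-Agent string (the leading and trailing .* in each alternative suggest this, but it is not confirmed here). The sample User-Agent strings and the helper name is_default_search_client are hypothetical and do not come from the product.

```python
import re

# Default user agent pattern of the search client, copied from the settings above.
DEFAULT_PATTERN = re.compile(
    r"(.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)"
    r"|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)"
    r"|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)"
)


def is_default_search_client(user_agent: str) -> bool:
    """Return True if the whole User-Agent string matches the default pattern.

    Assumption: the portal matches the pattern against the entire string.
    """
    return DEFAULT_PATTERN.fullmatch(user_agent) is not None


# Hypothetical sample User-Agent strings, for illustration only.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Wget/1.21.3",
    "ia_archiver",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/120.0",
]

for ua in samples:
    status = "covered" if is_default_search_client(ua) else "not covered"
    print(f"{status}: {ua}")
```

A string that is reported as not covered, such as ia_archiver in this sketch, would have to be added to the pattern or registered as a separate client.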

Include search engines not covered by the default set using either of the following methods...

The search mechanism works correctly for the portal only if the search engine robots are identified to the portal in advance.

Search on the portal by external search engines requires additional configuration beyond client identification.


Add search engines using Manage Clients portlet

To add search engines by using the administration portlet Manage Clients...

  1. Navigate to the Manage Clients portlet...

  2. Depending on whether you want to add more search clients to the default user agent or add a complete new client, perform one of the following steps:

    • To add one or more search engines, select the client that starts with...

        (.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)

      ...from the list of clients and edit it.

    • To add a new client, fill in the fields...

        User agent: Your new search engine user agent.
        Markup: HTML
        Manufacturer: Engine Manufacturer. This field is optional.
        Capability: HTML_4_0, HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE, HTML_2_0, HTML_JAVASCRIPT, HTML_3_2, HTML_3_0, HTML_CSS, HTML_TABLE
        Position: First.

        Put the new search engine client in the first position so that it is recognized correctly. The portal compares user agents to the supported clients from the most specific pattern to the most general, so a specific pattern must come before the general ones.

  3. Click OK to save your changes.
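The reason for placing a new client first can be illustrated with a small sketch. The Python code below is not the portal's implementation. It assumes a simple lookup over an ordered client list in which the first matching pattern wins, and the client names, patterns, and the find_client helper are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Ordered client entries: (client name, user agent pattern). Hypothetical values.
GENERAL_CLIENT = ("generic search client", r".*(B|b)ot.*")
SPECIFIC_CLIENT = ("ExampleBot client", r".*ExampleBot/2\..*")


def find_client(clients: List[Tuple[str, str]], user_agent: str) -> Optional[str]:
    """Return the name of the first client whose pattern matches the whole string."""
    for name, pattern in clients:
        if re.fullmatch(pattern, user_agent):
            return name
    return None


ua = "ExampleBot/2.0 (+http://example.com/bot)"

# General pattern first: the specific client is never reached.
print(find_client([GENERAL_CLIENT, SPECIFIC_CLIENT], ua))

# Specific pattern first: the new client is recognized.
print(find_client([SPECIFIC_CLIENT, GENERAL_CLIENT], ua))
```

Under these assumptions, the general pattern hides the new client unless the new client is listed before it.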


Add search engines by using the XML configuration interface

You can also add search engines by importing an XML script file through the XML configuration interface.

To make sure that the search mechanism works correctly, add the capability HTML_SEARCH.

For example...

<?xml version="1.0" encoding="UTF-8"?>

<request xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:noNamespaceSchemaLocation="PortalConfig_1.4.xsd"
         type="update" 
         create-oids="true">

    <portal action="locate">

        <client action="update" 
                uniquename="wps.client.search.Your Search Engine Name" 
                manufacturer="Your Search Engine Manufacturer" 
                markup="html">

            <useragent-pattern>Your User-Agent Pattern</useragent-pattern>

            <client-capability update="set">HTML_SEARCH</client-capability>
            <client-capability update="set">HTML_4_0</client-capability>
            <client-capability update="set">HTML_IFRAME</client-capability>
            <client-capability update="set">HTML_FRAME</client-capability>
            <client-capability update="set">HTML_NESTED_TABLE</client-capability>
            <client-capability update="set">HTML_2_0</client-capability>
            <client-capability update="set">HTML_JAVASCRIPT</client-capability>
            <client-capability update="set">HTML_3_2</client-capability>
            <client-capability update="set">HTML_3_0</client-capability>
            <client-capability update="set">HTML_CSS</client-capability>
            <client-capability update="set">HTML_TABLE</client-capability>

        </client>
    </portal>
</request>
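If several search engines have to be registered, the script can be generated rather than edited by hand. The following Python sketch builds a request with the same structure as the example above by using the standard xml.etree.ElementTree module. The function name build_client_request and the sample values are assumptions for illustration. The element and attribute names are taken from the example, and the generated file still has to be imported through the XML configuration interface.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Capabilities from the example above; HTML_SEARCH is listed first.
CAPABILITIES = [
    "HTML_SEARCH", "HTML_4_0", "HTML_IFRAME", "HTML_FRAME", "HTML_NESTED_TABLE",
    "HTML_2_0", "HTML_JAVASCRIPT", "HTML_3_2", "HTML_3_0", "HTML_CSS", "HTML_TABLE",
]


def build_client_request(name: str, manufacturer: str, pattern: str) -> str:
    """Return an XML script that registers one search engine client."""
    ET.register_namespace("xsi", XSI)
    request = ET.Element("request", {
        f"{{{XSI}}}noNamespaceSchemaLocation": "PortalConfig_1.4.xsd",
        "type": "update",
        "create-oids": "true",
    })
    portal = ET.SubElement(request, "portal", {"action": "locate"})
    client = ET.SubElement(portal, "client", {
        "action": "update",
        "uniquename": f"wps.client.search.{name}",
        "manufacturer": manufacturer,
        "markup": "html",
    })
    ET.SubElement(client, "useragent-pattern").text = pattern
    for capability in CAPABILITIES:
        ET.SubElement(client, "client-capability", {"update": "set"}).text = capability
    body = ET.tostring(request, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical values, for illustration only.
script = build_client_request("ExampleBot", "Example Corp", ".*ExampleBot.*")
print(script)
```

Generating the script from one capability list keeps HTML_SEARCH from being left out by accident when more clients are added.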


Parent topic:

Search by external search services


Related concepts


The XML configuration interface
Configure the Site Map portlet for search by external search engines