Client identification for search of the portal by external search engines


Overview

So that the portal can recognize external search engines, it provides a client that covers several popular search engines. This client is implemented according to the Composite Capability/Preference Profiles (CC/PP) standard and has the capability HTML_SEARCH set. To add more search engines, configure the client as required.

The client has been implemented with the following settings:
User agent:


Capability:


Manufacturer:


Markup:

The search mechanism works correctly for the portal only if the search engine robots are identified to the portal in advance.

  • Search on the portal by external search engines requires additional configuration beyond client identification.

To include search engines that are not covered by the default set, use either of the following methods:


    Add search engines by using the administration portlet Manage Clients

    To add search engines by using the administration portlet Manage Clients, proceed as follows:

    1. Navigate to the Manage Clients portlet by clicking...

        Main Menu | Administration | Portal Settings | Supported Clients

      Portal opens the Manage Clients portlet.

    2. Depending on whether you want to add more search clients to the default user agent or add a complete new client, perform one of the following steps:

      • Select the client that starts with (.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*) . . . from the list of clients and edit it. Use this option if you simply want to add one or more search engines.

      • Add a new client. Use this option if, for example, you want to give the newly added search engine priority by setting it to the First position in the client list.

        For details about how to do this, refer to the Manage Clients portlet help.

    3. Update or fill in the fields and select the options as required:

        User agent:

          Type the user agent pattern of the new search engine.

        Markup:

          html

        Manufacturer:

          Search Engine Manufacturer. This field is optional.

        Capability:

          HTML_SEARCH, HTML_4_0, HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE, HTML_2_0, HTML_JAVASCRIPT, HTML_3_2, HTML_3_0, HTML_CSS, HTML_TABLE

        Position:

          First. Set the specified search engine to the first position so that it is correctly recognized. This is because the portal matches user agents against the supported clients in list order, from concrete and specific patterns to general ones.

        For a more detailed description of the fields and options, refer to the Manage Clients portlet help.

    4. Click OK to save changes.
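
The effect of the Position setting can be illustrated with a small sketch. It is not the portal's implementation: it assumes that a client matches when its pattern covers the whole user agent string and that the first matching client in the list wins. Only the visible part of the default pattern is used, because the rest is elided above. The client name ExampleCrawler, its pattern, and the sample user agent strings are hypothetical.

```python
import re

# Visible prefix of the default client's user-agent pattern. The remainder is
# elided in the text above and is deliberately not reconstructed here.
DEFAULT_PATTERN_PREFIX = r"(.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)"


def first_matching_client(user_agent, clients):
    """Return the name of the first client whose pattern matches the whole
    user agent string, or None if no client matches (assumed semantics)."""
    for name, pattern in clients:
        if re.fullmatch(pattern, user_agent):
            return name
    return None


# Hypothetical client list in list order: the first entry is in the First position.
clients = [
    ("ExampleCrawler", r".*ExampleCrawler.*"),        # hypothetical new client
    ("default search client", DEFAULT_PATTERN_PREFIX),
]

if __name__ == "__main__":
    for agent in (
        "ExampleCrawlerBot/1.0",                        # matches both patterns
        "Mozilla/5.0 (compatible; SomeSpider/2.0)",     # matches the default only
        "Mozilla/5.0 (Windows NT 10.0) Firefox/120.0",  # matches neither
    ):
        print(agent, "->", first_matching_client(agent, clients))
```

The user agent ExampleCrawlerBot/1.0 matches both patterns, so the more specific client is selected only because it comes first in the list. With the order reversed, the general pattern would take it.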


    Add search engines by using xmlaccess.sh

    To add search engines by using the XML configuration interface, import them from an XML script file.

    For the search mechanism to work correctly, the client must have the capability HTML_SEARCH. An example XML script is:

    <?xml version="1.0" encoding="UTF-8"?>
    <request    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                xsi:noNamespaceSchemaLocation="PortalConfig_1.4.xsd"
                type="update" create-oids="true">
        <portal action="locate">
     
            <client action="update" uniquename="wps.client.search.Your Search Engine Name" 
                    manufacturer="Your Search Engine Manufacturer" markup="html">
     
                <useragent-pattern>Your User-Agent Pattern</useragent-pattern>
     
                <client-capability update="set">HTML_SEARCH</client-capability>
                <client-capability update="set">HTML_4_0</client-capability>
                <client-capability update="set">HTML_IFRAME</client-capability>
                <client-capability update="set">HTML_FRAME</client-capability>
                <client-capability update="set">HTML_NESTED_TABLE</client-capability>
                <client-capability update="set">HTML_2_0</client-capability>
                <client-capability update="set">HTML_JAVASCRIPT</client-capability>
                <client-capability update="set">HTML_3_2</client-capability>
                <client-capability update="set">HTML_3_0</client-capability>
                <client-capability update="set">HTML_CSS</client-capability>
                <client-capability update="set">HTML_TABLE</client-capability>
     
            </client>
        </portal>
    </request>
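
If several search engines need to be added, the script above can be generated rather than edited by hand. The following is a minimal sketch that rebuilds the same request structure with Python's standard library. The function name build_client_request and the sample values ExampleBot and Example Corp are hypothetical. The element names, attributes, and capability list are copied from the example script.

```python
import xml.etree.ElementTree as ET

# Capability list copied from the example script in this section.
CAPABILITIES = [
    "HTML_SEARCH", "HTML_4_0", "HTML_IFRAME", "HTML_FRAME",
    "HTML_NESTED_TABLE", "HTML_2_0", "HTML_JAVASCRIPT",
    "HTML_3_2", "HTML_3_0", "HTML_CSS", "HTML_TABLE",
]


def build_client_request(name, manufacturer, useragent_pattern):
    """Return an XML script (as a string) that updates one search engine client."""
    request = ET.Element("request", {
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xsi:noNamespaceSchemaLocation": "PortalConfig_1.4.xsd",
        "type": "update",
        "create-oids": "true",
    })
    portal = ET.SubElement(request, "portal", {"action": "locate"})
    client = ET.SubElement(portal, "client", {
        "action": "update",
        "uniquename": "wps.client.search." + name,
        "manufacturer": manufacturer,
        "markup": "html",
    })
    # Special characters in the pattern are escaped automatically on output.
    ET.SubElement(client, "useragent-pattern").text = useragent_pattern
    for capability in CAPABILITIES:
        element = ET.SubElement(client, "client-capability", {"update": "set"})
        element.text = capability
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(request)
    body = ET.tostring(request, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_client_request("ExampleBot", "Example Corp", ".*ExampleBot.*"))
```

The output can be saved to a file and imported with xmlaccess.sh in the same way as a hand-written script.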


    Parent

    Search by external search services
    xmlaccess.sh


    Related tasks


    Configure the Site Map portlet for search by external search engines

