Client identification for search of the portal by external search engines
So that the portal can recognize external search engines, it
provides a client that covers several popular search engines. This
client is implemented according to the Composite Capability/Preference Profiles
(CC/PP) standard and has the capability HTML_SEARCH set.
If you want to add more search engines, you can configure the client as required.
The client is implemented with the following settings:
- User agent:
- (.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)
This user agent pattern covers most major search engines, such as Google,
Yahoo!, Lycos, or MSN. It also matches any other search engine whose
user agent string contains bot, spider, search, crawl, or grabber.
- Capability:
- For each search engine that you want to be able to crawl your portal,
you need to set the capability HTML_SEARCH. Search engines
usually visit a Web site twice: the first time to crawl the site, and the
second time to validate the content. On the second visit, a search engine
usually identifies itself as a normal browser. Therefore, also enter
the capabilities that support the different browser settings, for example HTML_4_0,
HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE, HTML_2_0, HTML_JAVASCRIPT, HTML_3_2,
HTML_3_0, HTML_CSS, and HTML_TABLE.
- Manufacturer:
- Search
- Markup:
- HTML
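The effect of the default user agent pattern can be checked outside the portal. The following Python sketch is only an illustration: it assumes that the pattern must match the complete User-Agent value (which the leading and trailing .* suggest, but which is not confirmed here), and the sample User-Agent strings are invented. Note that the pattern is case-sensitive for most keywords.

```python
import re

# Default user-agent pattern, copied from the client settings above.
DEFAULT_PATTERN = (
    r"(.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)|(.*(S|s)earch.*)"
    r"|(.*(C|c)rawl(er)?.*)|(.*(G|g)rabber.*)|(.*(Y|y)ahoo.*)"
    r"|(.*(S|s)lurp.*)|(.*Lycos.*)|(.*Wget.*)"
)


def is_search_client(user_agent: str) -> bool:
    """Return True if the whole User-Agent string matches the default pattern.

    Assumption: the pattern is applied to the complete header value.
    """
    return re.fullmatch(DEFAULT_PATTERN, user_agent) is not None


# Invented User-Agent values, for illustration only.
samples = [
    "ExampleBot/1.0",                # contains "Bot"            -> True
    "Wget/1.21",                     # contains "Wget"           -> True
    "wget/1.21",                     # lowercase, not covered    -> False
    "Mozilla/5.0 (ExampleBrowser)",  # no search engine keyword  -> False
]
for ua in samples:
    print(f"{ua!r}: {is_search_client(ua)}")
```

A user agent that returns False in such a check would need its own pattern, added in one of the ways described in this topic.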
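If you want to include search engines that are not covered
by the default set, you can do so in either of the following ways: by using the
Manage Clients administration portlet, or by using the XML configuration interface.
Both ways are described later in this topic.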
Notes:
- The search mechanism works correctly for the portal only if the search
engine robots are identified to the portal in advance.
- Search on your portal by external search engines requires additional configuration
beyond client identification. For more details, refer to Search by external search services.
Adding search engines by using the administration portlet Manage Clients
To add search engines by using the administration
portlet Manage Clients, proceed as follows:
- Navigate to the Manage Clients portlet by clicking . Portal opens the Manage
Clients portlet.
- Depending on whether you want to add more search clients to the default
user agent or add a complete new client, perform one of the following steps:
- Select the client that starts with (.*(B|b)ot.*)|(.*BOT.*)|(.*(S|s)pider.*)
. . . from the list of clients and edit it. Use this option if you
simply want to add one or more search engines.
- Add a new client. For example, use this option if you want to
give the newly added search engine priority by setting it to the First position
in the client list.
For details about how to do this, refer to the Manage Clients portlet
help.
- Update or fill the fields and select the options as required:
- User agent:
- Type your new search engine user agent.
- Markup:
- html
- Manufacturer
- Search Engine Manufacturer. This field is optional.
- Capability:
- HTML_SEARCH, HTML_4_0, HTML_IFRAME, HTML_FRAME, HTML_NESTED_TABLE,
HTML_2_0, HTML_JAVASCRIPT, HTML_3_2, HTML_3_0, HTML_CSS, HTML_TABLE
- Position:
- First. Set the specified search engine to the first position so
that it is recognized correctly. The reason is that user agents are compared
to the supported clients in order, from concrete and specific patterns to
general ones.
For a more detailed description of the fields and options, refer
to the Manage Clients portlet help.
- Click OK to save your changes.
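Why the position matters can be illustrated with a simplified model. The following Python sketch assumes that the client list is evaluated in order and that the first matching pattern wins; this is an assumption made for illustration, not a description of the portal's internal implementation. The client names and the ExampleFetcherBot user agent are hypothetical.

```python
import re
from typing import Optional

# Ordered client entries as (name, user-agent pattern) pairs.
# Both entries are illustrative; "ExampleFetcher" is a hypothetical engine.
GENERAL = ("default search client", r".*(B|b)ot.*")
SPECIFIC = ("ExampleFetcher client", r".*ExampleFetcherBot.*")


def resolve_client(clients, user_agent: str) -> Optional[str]:
    """Return the name of the first client whose pattern matches, else None."""
    for name, pattern in clients:
        if re.fullmatch(pattern, user_agent):
            return name
    return None


ua = "ExampleFetcherBot/2.0"
# General entry first: the specific client is never reached.
print(resolve_client([GENERAL, SPECIFIC], ua))  # default search client
# Specific entry first: the new client is recognized.
print(resolve_client([SPECIFIC, GENERAL], ua))  # ExampleFetcher client
```

In this model, a specific client that is listed after a general pattern is shadowed by it, which is why the new client is set to the First position.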
Adding search engines by using the XML configuration interface
To add search engines by using the XML configuration interface,
import them by using an XML script file. For
more information, refer to The XML configuration interface. To make sure that the search mechanism works correctly, you need
to add the capability HTML_SEARCH. An example XML script
is:
<?xml version="1.0" encoding="UTF-8"?>
<request xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="PortalConfig_1.4.xsd"
         type="update" create-oids="true">
  <portal action="locate">
    <client action="update" uniquename="wps.client.search.Your Search Engine Name"
            manufacturer="Your Search Engine Manufacturer" markup="html">
      <useragent-pattern>Your User-Agent Pattern</useragent-pattern>
      <client-capability update="set">HTML_SEARCH</client-capability>
      <client-capability update="set">HTML_4_0</client-capability>
      <client-capability update="set">HTML_IFRAME</client-capability>
      <client-capability update="set">HTML_FRAME</client-capability>
      <client-capability update="set">HTML_NESTED_TABLE</client-capability>
      <client-capability update="set">HTML_2_0</client-capability>
      <client-capability update="set">HTML_JAVASCRIPT</client-capability>
      <client-capability update="set">HTML_3_2</client-capability>
      <client-capability update="set">HTML_3_0</client-capability>
      <client-capability update="set">HTML_CSS</client-capability>
      <client-capability update="set">HTML_TABLE</client-capability>
    </client>
  </portal>
</request>
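If you need scripts for several search engines, a script of the same shape as the example can be generated. The following Python sketch is a hedged illustration, not part of the portal tooling: the function build_client_request and its arguments are hypothetical names, and the element and attribute names are simply copied from the example script. Using an XML library rather than string concatenation ensures that characters such as & or < in a pattern are escaped.

```python
import xml.etree.ElementTree as ET

# Capabilities copied from the example script; HTML_SEARCH is the required one.
CAPABILITIES = [
    "HTML_SEARCH", "HTML_4_0", "HTML_IFRAME", "HTML_FRAME",
    "HTML_NESTED_TABLE", "HTML_2_0", "HTML_JAVASCRIPT", "HTML_3_2",
    "HTML_3_0", "HTML_CSS", "HTML_TABLE",
]


def build_client_request(name: str, manufacturer: str, pattern: str) -> str:
    """Build an XML script shaped like the example (hypothetical helper)."""
    request = ET.Element("request", {
        "xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
        "xsi:noNamespaceSchemaLocation": "PortalConfig_1.4.xsd",
        "type": "update",
        "create-oids": "true",
    })
    portal = ET.SubElement(request, "portal", {"action": "locate"})
    client = ET.SubElement(portal, "client", {
        "action": "update",
        "uniquename": f"wps.client.search.{name}",
        "manufacturer": manufacturer,
        "markup": "html",
    })
    ET.SubElement(client, "useragent-pattern").text = pattern
    for capability in CAPABILITIES:
        element = ET.SubElement(client, "client-capability", {"update": "set"})
        element.text = capability
    body = ET.tostring(request, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical values, for illustration only.
print(build_client_request("ExampleFetcher", "Example Corp", ".*ExampleFetcherBot.*"))
```

The generated file would then be imported in the same way as a hand-written script.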
Parent topic: Search by external search services
Related concepts
The XML configuration interface
Related tasks
Configuring the Site Map portlet for search by external search engines