Configure a crawler to search the local portal site
Portal Search provides a default portal site search collection that lets users search the portal site. Before users can search this collection, complete the following tasks.
- Set the crawler user ID.
Set a dedicated crawler user ID for crawling the portal site content source...
- Define the crawler user ID using the Manage Users and Groups portlet...
The pre-configured default portal site search uses the default administrator user ID, wpsadmin, with the default password of that user ID, for the crawler. If we changed the default administrator user ID during the portal installation, the crawler uses that user ID instead. Set the crawler user ID explicitly in any of the following cases: we changed the password for wpsadmin or another administrative user ID, we changed the default administrator user ID to an ID other than wpsadmin, or we want the crawler to use a separate user ID.
- Set the preferred language of the portal site crawler user ID to match the language of the portal site search collection that it crawls. If we do this task after starting a crawl on the portal site search collection, we must reset the portal site collection.
- Edit the portal site collection content source and enter the crawler user ID and its password...
- Click...
Administration | Search Administration | Manage Search | Search Collections | Search Collection | Edit icon | Security | security realm | Edit icon > Security
- Type the crawler user ID and password into the appropriate fields.
- Click Update.
- Click Save to save the changes.
- Optional: For content sources of type Web Site, we can configure the crawler to follow external links from inside the portal.
To do this task, modify the value in the field Levels of links to follow under the tab General Parameters. Set the level to a value higher than 1. In addition, we can configure filters for those external links from the Filters tab. The default filter suppresses all links that point back to portal pages. The default filter is displayed only after saving the configuration of the content source.
- Start the initial crawl....
Administration | Manage Search | Search Collections | Collection Name | Portal content source name | Start Crawler icon (right-pointing arrow)
- Configure regular crawls.
For regular crawls on the portal site content source, do either of the following tasks:
- Enable the default scheduler...
- Click the View Content Source Schedulers icon next to the collection name.
- In the Manage Schedulers page, click Disabled. This action changes the status of the scheduler to Enabled and displays a confirmation message.
- Set up our own scheduler. To do this task:
- Click the Edit icon for the content source.
We can have only one schedule at a time. Therefore, to create our own schedule, we must first delete the existing schedule.
- Select the Schedulers tab.
- Configure our own scheduler as needed.
- Click Save to save the changes.
Example
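To illustrate how the Levels of links to follow value and a link filter interact for a content source of type Web Site, the following Python sketch walks a small in-memory link graph breadth-first. It is not Portal Search code. The URLs, the crawl function, and the convention that the root page counts as level 1 are all assumptions made for this example. The exclude argument plays the role of a filter that suppresses links pointing back to portal pages.

```python
from collections import deque

# Hypothetical link graph standing in for real pages; every URL is made up.
LINKS = {
    "/portal/home": ["/portal/news", "http://ext.example/a"],
    "/portal/news": [],
    "http://ext.example/a": ["http://ext.example/b", "/portal/home"],
    "http://ext.example/b": [],
}


def crawl(root, max_levels, exclude=lambda url: False):
    """Visit pages breadth-first, going at most max_levels deep from root.

    Assumption: the root page itself is level 1, so max_levels=1 visits
    only the root and follows none of its links.
    """
    seen = {root}
    queue = deque([(root, 1)])
    visited = []
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level >= max_levels:
            continue  # pages at the last level are indexed, their links are not followed
        for target in LINKS.get(url, []):
            if target in seen or exclude(target):
                continue  # skip pages already queued and links the filter suppresses
            seen.add(target)
            queue.append((target, level + 1))
    return visited


def points_back_to_portal(url):
    """Illustrative filter: suppress links that lead back into the portal."""
    return url.startswith("/portal/")


if __name__ == "__main__":
    print(crawl("/portal/home", 1))
    print(crawl("/portal/home", 3))
    print(crawl("/portal/home", 3, exclude=points_back_to_portal))
```

With a limit of 1 the walk never leaves the root page. Raising the limit lets it reach the external pages, and the filter keeps it from re-entering the portal through links on those pages.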
For more information about how to work with content sources, see Manage the content sources of a search collection and Manage Search portlet help.
- The local portal site is visible through a service that requires SSL. If the portal is configured with a web server and we configure the content source root URL through the web server, configure the web server for SSL.
- By default, items in the result lists from portal site searches provide no summary information. To add summary information, configure the portlet with the summary parameter enabled as follows: PortalCollectionSummarizer=on.
- Be aware that a Portal Search crawl of a portal site can consume considerable memory and time, depending on the Portal Search environment and configuration. For details, see the topic Tips for Portal Search crawls.
- Do not change the default value of 1 for the option Levels of links to follow. Changing this value initiates web crawling logic and might produce unexpected results. For example, the crawler might trigger unwanted actions in some of the administration portlets.
- Set the preferred language of the crawler user ID to match the language of the search collection that it crawls.
- The portal site search collection is created when an administrator goes to the Manage Search portlet. However, we must start the crawl for users to be able to search the portal site. Depending on the portal configuration and environment and possible customization, we might need to reset the portal site search collection.
- If the users search the portal site search collection on a secured portal site, refer to the topic Enable search on a secured portal site with the default configuration.
- The portal search crawler indexes static content pages and all pages that include portlets.
When users search a portal site, they can access portal pages of two types:
- Public (anonymous) portal pages are pages that users can view without authenticating with a user ID and password. The crawler can crawl public pages on the portal site on which it is located, or on a remote portal.
If we want anonymous users to be able to search the public pages of the portal site, see Enable anonymous users to search public pages of the portal.
- Secured portal pages are pages that users can view only after they authenticate by logging in to the portal with a user ID and password. For details, see Configure search on a secured portal site.
We can crawl, index, and search secured portal pages only on the local portal installation. For security reasons, we cannot crawl secured pages of one portal site from another portal site.
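The crawl rule for the two page types can be summarized as a small truth table. The Python sketch below only restates that rule. The function name can_crawl and its parameters are hypothetical and do not correspond to any Portal Search API.

```python
def can_crawl(page_is_secured: bool, portal_is_local: bool) -> bool:
    """Return True if the rule above allows a crawler to index the page."""
    if not page_is_secured:
        # Public pages can be crawled on the local portal or on a remote portal.
        return True
    # Secured pages can be crawled only on the local portal installation.
    return portal_is_local


if __name__ == "__main__":
    for secured in (False, True):
        for local in (True, False):
            page = "secured" if secured else "public"
            site = "local" if local else "remote"
            print(f"{page} page on {site} portal: {can_crawl(secured, local)}")
```

The only combination that is rejected is a secured page on a remote portal.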
If we customize search on the portal site, the topics Configure the default location for search collections and Reset the default search collection might be useful.
If the portal site is multilingual and the users use different languages to search the portal, see the topic about Crawling a multilingual portal site.
Parent: Search the local portal
Related tasks:
Reset the default search collection
Manage the content sources of a search collection
Enable search on a secured portal site with the default configuration
Enable anonymous users to search public pages of the portal
Configure search on a secured portal site
Configure the default location for search collections
Crawl a multilingual portal site
Tips for Portal Search crawls
Configure the web server plug-in for Secure Sockets Layer