Manage the content sources of a search collection

Learn how to manage the content sources of a search collection.

To work with the content sources of a collection, select Administration -> Search Administration -> Manage Search -> Search Collections. Then select a search collection by clicking the collection name link. Portal Search displays the Content Sources panel. The panel shows the status of the selected search collection, lists its content sources with their status and related information, and lets you perform tasks on these content sources.

Use the following option icons to perform tasks on the search collection that you selected from the Search Collections list:

  • Refresh. Use this option to update the list of content sources and the status shown for this collection.

  • Use the following option icons to perform tasks on a content source:

    • View Content Source Schedulers. Use this option to view and manage schedulers. This option is only available if you have defined schedulers for the content source.

    • Start crawler. Use this option to start crawling the content source and collecting its documents. You can also start an update of a content source by running the crawler again, or stop an update that is in progress. The timeout that you set on the General Parameters tab for crawling a content source works as an approximate time limit. It might be exceeded by some percentage, so allow some tolerance.

    • Verify Address of Content Source. Use this option to verify that the URL of the content source is still live and available. Manage Search returns a message about the status of the content source.

    • Edit Content Source. Use this option to make changes to the content source, that is, configure parameters, schedules, categories, and filters for the selected content source.

      • It is beneficial to define a dedicated crawler user ID. The pre-configured default portal site search uses the default administrator user ID wpsadmin with the default password of that user ID for the crawler. If you changed the default administrator user ID during the portal installation, the crawler uses that default user ID. If you have changed the user ID or password of the administrative user and still want to use that user ID for the Portal Search crawler, you need to adapt the settings here accordingly.

          To define a crawler user ID, select the Security tab, and update the user ID and password. Click Save to save updates.

      • If you modify a content source that belongs to a search scope, update the scope manually to verify the scope still covers that content source. Especially if you changed the name of the content source, edit the scope and verify it is still listed there. If not, add it again.

  • Delete Content Source. Use this option to delete the content source.

    If you delete a content source, the documents that were collected from this content source remain available for search by users under all scopes that included the content source before it was deleted. These documents remain available until their expiration time ends. You specify this expiration time under Links expire after (days): under General Parameters when you create the content source.

  • View information about the status and configuration of the content source.

      To update the status information, click the Refresh button.


      Search Collection Name:

        Shows the name of the selected search collection.

      Search Collection Location:

        Shows the location of the selected search collection in the file system. This is the full path where all data and related information of the search collection is stored.

      Collection Description:

        Shows the description of the selected search collection if available.

      Search Collection Language:

        Shows the language for which the search collection and its index are optimized. The index uses this language to analyze the documents when indexing, if no other language is specified for the document. This feature enhances the quality of search results for users, as it allows them to use spelling variants, including plurals and inflections, for the search keyword.

      Categorizer used:

        Shows the categorizer used by the search collection.

      Summarizer used:

        Shows whether a static summarizer is enabled for this search collection.

      Remove common words from queries:

        Shows whether the indexer and the search filter out common words from documents, such as and, the, of.

        These words are also called stop words. The following words are filtered out for English: about all also am an and any are as at be been but by can de did do does for from had has have he her him his how if in into is it its may more my nbsp new no non not of on one or other our she so some than that the their then there these they this those thus to up us use was we were what when where which while why will with would you yours.


      Last update completed:

        Shows the date when a content source defined for the search collection was last updated by a scheduled update.

      Next update scheduled:

        Shows the date when the next update of a content source defined for the search collection is scheduled.

      Number of active documents:

        Shows the number of active documents in the search collection, that is, all documents that are available for search by users.
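
To illustrate what the Remove common words from queries setting means in practice, the following minimal Python sketch filters the English stop words listed above out of a query string. It is an illustration only and not the Portal Search implementation: the names ENGLISH_STOP_WORDS and remove_stop_words are hypothetical, and splitting the query on whitespace is a simplifying assumption.

```python
# English stop words, copied from the list given in this topic.
ENGLISH_STOP_WORDS = frozenset("""
about all also am an and any are as at be been but by can de did do does for
from had has have he her him his how if in into is it its may more my nbsp new
no non not of on one or other our she so some than that the their then there
these they this those thus to up us use was we were what when where which
while why will with would you yours
""".split())


def remove_stop_words(query: str) -> list[str]:
    """Return the lowercased query terms that are not stop words.

    Hypothetical helper: the real analyzer may tokenize differently.
    """
    terms = query.lower().split()
    return [term for term in terms if term not in ENGLISH_STOP_WORDS]


# "how", "to", "the", "of", and "this" are in the stop word list.
print(remove_stop_words("How to manage the content sources of this collection"))
# prints ['manage', 'content', 'sources', 'collection']
```

Note that only the words in the list are removed. A word such as "a", which does not appear in the list, would be kept by this sketch.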


      Notes:

      1. To update the status information, click Refresh. Clicking the refresh button of the browser will not update the status information.

      2. If you delete a portlet from the portal after a crawl of the portal site, the deleted portlet is no longer listed in the search results. However, refreshing the view does not update the status information about the Number of active documents. This information is not updated until after the next cleanup run of portal resources.

    For more details about the available options for content sources, refer to the Manage Search portlet help.
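
The behavior described for Delete Content Source, where documents stay searchable until their expiration time ends, can be pictured with a small Python sketch. This is a simplified model under stated assumptions, not the product logic: the function still_searchable is hypothetical, and it assumes that the Links expire after (days): value is counted from the date on which the document was collected and that a document stops being searchable on the expiration date itself.

```python
from datetime import date, timedelta


def still_searchable(collected_on: date, expire_after_days: int, today: date) -> bool:
    """Hypothetical model: a document stays searchable until its expiration date.

    Assumes the expiration period starts on the collection date.
    """
    expires_on = collected_on + timedelta(days=expire_after_days)
    return today < expires_on


# A document collected on 2024-01-10 with a 30-day expiration
# expires on 2024-02-09 under these assumptions.
collected = date(2024, 1, 10)
print(still_searchable(collected, 30, date(2024, 2, 1)))   # prints True
print(still_searchable(collected, 30, date(2024, 2, 9)))   # prints False
```

Under this model, deleting the content source on any date before the expiration date does not change the result, which matches the statement that the documents remain available under all scopes that included the content source.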

      Apply filter rules

      Portal Search provides a facility for applying filter rules to the crawler process. The crawler filters control the crawler progress and the type of documents that are indexed and cataloged.


    Parent

    Set up search collections
    Set up JCR search collections
    Delayed cleanup of deleted portal pages


    Related tasks


    Export and import search collections
    Create search collections
    Hints and tips for using Portal Search

     

