Manage and administer Portal Search


Overview

To manage Portal Search, click...

.and then...


Search Services

Search services represent separate instances of the search engine. When you create a search collection, you select a search service to perform the searches. A single search service can serve multiple search collections, and a portal can use multiple search services, for example to distribute the search load over several nodes.

The default search service is Portal Search Service, which manages search collections that contain...

For a clustered portal environment, set up a remote search service.

The HTTP crawler of the Portal Search Service does not support JavaScript. Text that is generated by JavaScript might not be available for search.

You can also create additional custom search services and add them to the portal.


Manage search services

The Search Services page lists the Search Services in the portal and their availability status. Tasks include...


Create a new search service

To create a new search service, click the "New Search Service" button and set...

Service name
    Name must be unique within the current portal or virtual portal. This field is mandatory.
Search service implementation
    Select from the drop-down menu.
Service parameters
    Add a new service parameter
        If required, enter a new service parameter key and its value, and click the Add Parameter button. Manage Search refreshes the parameter list with the new parameter added.
    Edit a parameter
        Locate the parameter in the list and click the Edit icon. Enter a new value for the parameter as required.
    Delete a service parameter
        Locate the parameter in the list and click the Delete icon.


Manage the collections of a search service

To manage the collections of a search service, click the name of that search service in the services list. The Search Collections page lists the search collections of the selected search service. You can now manage these search collections and their content sources.


Edit a search service

To edit a search service, locate that search service in the list and click the Edit icon. Manage Search displays the Edit Search Service page. Update the service data and select from the available options as required.


Delete a search service

To delete a search service, locate that search service in the list and click the Delete icon.


Search Collections and content sources

You can maintain search collections for...

Use the Search and Browse portlet for advanced searches on collections.

Content is retrieved for indexing through a crawler (robot) from the content sources. The search collection stores keywords and metadata, and maps them to their original source.

Searchable resources can be stored on the local portal server or on remote content sources. The crawlers can process content if it is accessible through the HTTP protocol. For example, this can be portal pages, WCM content, and documents and content hosted by Web servers. The documents can be of different types, for example, editable text files, office suite documents such as Microsoft Office and OpenOffice files, or PDF files. To make documents available for search by users, make sure you perform the following tasks:

For more details about how to work with search collections and content sources, refer to the following sections:


Manage Search Collections

To manage search collections and their content sources, click Search Collections. Manage Search shows the Search Collections page. It lists the search collections in the portal, together with related information, such as the following:

From the Search Collections panel, select the following options or icons and perform the following tasks on search collections:


Create a search collection

To create a new search collection, proceed as follows.

The parameters that you select when you create the search collection cannot be changed later. Therefore plan ahead and take special care when you create a new search collection. To change parameters for a search collection, you have to create a new search collection and select the required parameters for it. You can then export the data from the old collection and import it into the new collection. For details about how to do this refer to Export a search collection and Import a search collection.

  1. Click New Collection. Manage Search displays the Create Collection panel.

  2. Location of Collection. Use this entry field to type the directory path where you want the new search collection to be created and the related data to be saved. This field is mandatory as indicated by the red asterisk ( * ). The location of a collection is the directory in which the collection data is stored. It can be a full path or a path relative to the Collections Locations search service parameter. Depending on what you type, the search collection is created in the following location:

    • If you type a name of your choice, the location for the new search collection is formed by combining the default directory for search collections with the name you type. Example: If you type my_collection_location, the new search collection is created under the directory wp_root/collections/my_collection_location. For details about the default directory for search collections and how you configure it refer to the Portal Search topic in the WebSphere Portal Information Center under Configuring the Manage Search portlet.

    • To create the search collection in a location that is different from the default search collection location, type the full directory location as required. The new search collection will be created under the directory location that you specified.

  3. Name of Collection. Use this entry field to type the name that you want to give to the new search collection. The name that you enter here will show for the search collection in the search collection list and in the hierarchy tree of available content sources when you select locations for scopes. If you do not enter a name, the location that you entered in the previous field is used as a name for the search collection.

  4. Description of Collection. Use this entry field to type a description for the new search collection. The description that you enter here will show for the search collection in the search collection list.

  5. Specify Collection Language. Use this pull-down selection list to select the required language for the search collection. The search collection and its index are optimized for this language. This feature enhances the quality of search results for users, as it allows them to use spelling variants, including plurals and inflections, for the search keyword. Portal Search uses this language for indexing if no language is defined for the document. Select one of the Unspecified options to index documents without any stemming of the words.

    This setting is not overwritten when you import a search collection, for example, during the migration of a search collection. If you create the search collection for the purpose of migrating an existing search collection, fill this in to match the setting in the source collection that you want to migrate.

  6. Select Categorizer. Use this pull-down selection list to select the required categorizer for the search collection. Possible values are:

    • None.

    • User-Defined. This categorizer is rule-based.

  7. Select Summarizer. Use this pull-down selection list to select the required summarizer for the search collection. Possible values are:

      None

        No summary is generated for documents. If you select this option, the Search Center uses the description metadata from the document, if the document has one.


      Automatic

        An automatic summarizer is used.

  8. Remove common words from queries. To have the index of the search collection filter out common words, mark the check box for this option. If you select this option, the indexer and the search filter out common words from indexed documents and search strings. Examples for English are: and, or, the, of, in, on.

      This setting is not overwritten when you import a search collection, for example, during the migration of a search collection. If you create the search collection for the purpose of migrating an existing search collection, fill this in to match the setting in the source collection that you want to migrate.

  9. Click OK to save updates, or click Cancel if you do not want to save the updates.

  10. Manage Search returns to the previous panel. If you clicked OK, the Search Collection list shows the new search collection by the name that you specified. If you did not specify a name, the list shows the directory path location that you specified.
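
The location and naming behavior described in steps 2, 3, and 10 can be illustrated with a minimal Python sketch. It is not portal code: the function names are hypothetical, the default directory wp_root/collections is taken from the example above, and only POSIX-style paths are modeled.

```python
from pathlib import PurePosixPath

# Assumed default directory for search collections (from the example in step 2).
# The real value comes from the search service configuration.
DEFAULT_COLLECTIONS_DIR = PurePosixPath("wp_root/collections")


def resolve_collection_location(typed_location, default_dir=DEFAULT_COLLECTIONS_DIR):
    """Return the directory in which the collection would be created."""
    typed = PurePosixPath(typed_location)
    if typed.is_absolute():
        # A full path is used exactly as typed.
        return typed
    # A plain name or relative path is appended to the default directory.
    return default_dir / typed


def display_name(name, typed_location):
    """Return the label shown in the collection list."""
    name = name.strip()
    # Without a name, the location that was typed is shown instead.
    return name if name else typed_location


print(resolve_collection_location("my_collection_location"))
print(resolve_collection_location("/data/search/my_collection"))
print(display_name("", "my_collection_location"))
```

The first call prints a path below the default directory, the second prints the full path unchanged, and the third shows the location standing in for the missing name.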


View the status of a search collection

To view the status of the search collection, click the collection name in the list of search collections. Manage Search shows the Content Sources and the Search collection status information of the selected search collection. The status fields show the following data that changes over the lifetime of the search collection:


Search collection name:


Search collection location:


Collection description:


Search collection language:


Categorizer used:


Summarizer used:


Remove common words from queries:


Last update completed:


Next update scheduled:


Number of active documents:

To view updated status information about the search collection, click the Refresh button of the browser.

On the same panel you can also manage the content sources of the search collection.

If you have a faulty search collection in the portal, the portlet shows a link that takes you to that faulty collection.


Work with Pending Documents

By default an indexer crawl on a search collection makes the returned documents available for search by users.

To select and approve these documents before they are made available for search by users, remove the check mark from the option Add all documents to collection automatically for the content source under the Advanced Parameters tab when you add a new content source. Documents that result from a crawl on that content source are then moved to the Pending Documents box for approval. The documents are not indexed and cataloged until an administrator processes them in the Pending Documents panel.

The Pending Documents panel contains a list with all documents that the index crawler collected. This includes documents from all content sources defined for the selected search collection, except for those content sources for which the option Add all documents to collection automatically was enabled. In the Pending Documents panel you can edit and accept, or reject the documents individually. To perform these tasks, proceed as follows:

  1. Locate the search collection for which you want to accept or reject documents.

  2. Click the View Pending Documents icon next to that search collection. Manage Search displays the Pending Documents panel. If the list has more than one page of pending documents, use the arrows or the pull-down list to select other pages.

  3. To view a document, click the document title in the list. Manage Search displays the document in a new window, depending on whether the appropriate viewer for that document type is configured for the browser.

  4. To modify the information for a document, click Edit for the document which you want to modify. Manage Search displays the panel for editing the document information. This panel has two boxes. One shows the Document content (Read only) as it was returned by the crawler. The fields in this box are blocked. The other box is named Updated content. The fields in this box are empty. You enter the new information as required. You can modify the following:

    • The Title of the document.
    • The Author of the document.
    • The Subject of the document.
    • The Modification date, that is, the date when the document was last modified.
    • The Destination Categories of the document. You can add or remove categories associated with the document. This option is only available if a categorizer was selected when the collection was created.
    • The Description of the document.
    • The Keywords of the document.

      Proceed as follows:

      1. Enter updates as required.

      2. Click Copy to copy the data from Document Content to Updated Content. Use this option if you want to keep some of the document information and only make additions or minor changes to it. You can still overwrite the copied information under Updated Content.

        If you fill in one or more of the fields in the Updated Content and you click OK, all data under Document Content are overwritten by the data in the fields under Updated Content, even if some of these fields are left empty.

      3. Click OK

        Manage Search returns to the previous panel.

  5. Select Accept for the documents that you want to make available to users for search.

      Select Reject for the documents that you do not want to make available to users for search.

  6. Click Reset to cancel your selections and return to the original state of the Pending Documents panel. Reset works only if you have not yet clicked Apply after making your selections.

  7. Click Accept All to accept all listed documents.

  8. Click Reject All to reject all listed documents.

  9. Click Apply to make your selections take effect. Manage Search enters the documents you accept into the system, and indexes and catalogs them. Manage Search discards the documents you reject. Once you click Apply, you cannot use Reset to reset the list of documents.

  10. Click Refresh to refresh the list of pending documents. This updates the list with the new documents that came in while you were working on the pending documents.

  11. Click the appropriate link in the bread crumb trail at the top of the portlet to return to the list of search collections.

If a document is changed on its original content source, for example on the HTTP server where it is stored, it will appear again under Pending Documents after the next crawl. You can then modify, accept, or reject that document again from the Pending Documents panel.
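
The overwrite behavior of the Updated Content box in step 4 is easy to get wrong, so here is a minimal Python sketch of it. The field names and both functions are hypothetical, and the sketch assumes that leaving every Updated Content field empty keeps the crawled values unchanged, which the text above does not state explicitly.

```python
# Hypothetical field names modeled on the editable items listed in step 4.
FIELDS = ("title", "author", "subject", "modification_date", "description", "keywords")


def copy_to_updated(document_content):
    """Model of the Copy button: start editing from the crawled values."""
    return {field: document_content.get(field, "") for field in FIELDS}


def apply_updated_content(document_content, updated_content):
    """Model of clicking OK on the edit panel."""
    # Assumption: if nothing was entered, the crawled values stay as they are.
    if not any(updated_content.get(field) for field in FIELDS):
        return dict(document_content)
    # Otherwise every field is taken from Updated Content, empty or not.
    return {field: updated_content.get(field, "") for field in FIELDS}


crawled = {"title": "Quarterly report", "author": "J. Doe"}

# Typing only a new title wipes the author.
without_copy = apply_updated_content(crawled, {"title": "Q3 report"})

# Copying first preserves the author while the title is changed.
edited = copy_to_updated(crawled)
edited["title"] = "Q3 report"
with_copy = apply_updated_content(crawled, edited)

print(repr(without_copy["author"]), repr(with_copy["author"]))
```

In this model the practical consequence is that Copy should be used whenever only some of the fields are meant to change.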


Search and browse a search collection

To browse a search collection proceed as follows:

  1. Locate the search collection which you want to browse.

  2. Click the Search and Browse Collection icon for that collection. The Browse Documents panel is displayed.

From the Browse Documents panel you can browse through the entire search collection. If a collection is associated with a category tree, you can navigate the tree and see which documents are associated with each category. You can also delete documents and edit the metadata associated with documents as in the Pending Documents panel. For more information about these operations refer to Work with pending documents. Use the Search feature to perform a search on the collection. To return to the list of collections, click the appropriate link in the bread crumb trail at the top of the portlet.


Migrate search collections


Notes:

When you upgrade to a higher version of WebSphere Portal, the data storage format is not necessarily compatible with the older version. To prevent loss of data, export all data of search collections to XML files before upgrading. After the upgrade, create a new search collection and use the previously exported data to import the search collection data back into the upgraded portal.

  1. If you do not perform these steps, the search collections are lost after you upgrade WebSphere Portal.

  2. When you create the search collection on the upgraded portal, type data and make selections as follows:

    • Fill the location, the name, and the description of the new collection in as required. You can match the old settings or type new ones.

    • For Remove common words from queries and Specify Collection Language: Select these settings to match the settings of the old search collection.

        These settings are not overwritten when you import a search collection, for example, during the migration of a search collection. If you create the search collection for the purpose of migrating an existing search collection, select these to match the setting in the source collection that you want to migrate.

    • You do not need to select a categorizer and summarizer. These settings are overwritten by the settings when importing the data from the source search collection.

  3. You cannot migrate a portal site collection between different versions of WebSphere Portal. If you upgrade the portal from one version to another, you need to re-create the portal site collection. Proceed as follows:

    1. Document the configuration data of the portal site content source.

    2. Delete the existing portal content source.

    3. Upgrade the portal.

    4. On the upgraded portal create a new portal site content source. Use the documented configuration data as required.

    5. Execute the new portal content source.

Portlets that were crawled in the portal before the upgrade, but do not exist in the upgraded portal, are not returned by a search.

For more detailed information about these tasks refer to the topics about migrating, importing, and exporting search collections in the portal Information Center.

For details about how to export and import search collections refer to Export a search collection and Import a search collection.


Export a search collection

To export a search collection and its data, proceed as follows:

  1. Before you export a collection, verify the portal application process has write access to the target directory location. Otherwise you might get an error message, such as File not found.

  2. Make sure that the target directory is empty or contains no files that you still need, as the export can overwrite files in that directory.

  3. Locate the search collection that you want to export.

  4. Click the Import or Export Collection icon next to the search collection in the list. Manage Search displays the Import and Export Search Collection panel.

  5. In the entry field Specify Location (full path with XML extension): type the full directory path and XML file name to which you want to export the search collection and its data. Document the names of the collections and the directory locations and target file names to which you export the collections for the import that follows.

      When you specify the target directory location for the export, be aware that the export can overwrite files in that directory.

  6. Click Export to export the search collection data. Manage Search writes the complete search collection data to an XML file and stores it in the directory location that you specified. You can use this file later as the source of an import operation to import the search collection into another portal.

  7. To return to the previous panel without exporting the search collection, click the appropriate link in the bread crumb trail at the top of the portlet.
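
The checks in steps 1, 2, and 5 can be scripted before you start an export. The following Python sketch is one possible pre-flight check, not a portal tool. The function name is hypothetical, and the write-access result is only meaningful if the script runs under the same operating system account as the portal application process.

```python
import os
from pathlib import Path


def check_export_target(xml_path):
    """Return a list of warnings for a planned export target file."""
    target = Path(xml_path)
    warnings = []
    if target.suffix.lower() != ".xml":
        warnings.append("target file name should end in .xml")
    directory = target.parent
    if not directory.is_dir():
        warnings.append(f"directory does not exist: {directory}")
        return warnings
    # Checks access for the account running this script (see assumption above).
    if not os.access(directory, os.W_OK):
        warnings.append(f"no write access to: {directory}")
    existing = sorted(entry.name for entry in directory.iterdir())
    if existing:
        warnings.append("directory is not empty, export may overwrite: " + ", ".join(existing))
    return warnings


# Example call with a placeholder path; an empty list means no problems were found.
for warning in check_export_target("/tmp/portal_export/my_collection.xml"):
    print(warning)
```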


Import a search collection

To import the data of a search collection, proceed as follows:

  1. Before you can import the collection data, you need to create the empty shell for the search collection. You do this by creating a search collection. You only need to fill in the mandatory data entry field Location of Collection. Do not add content sources or documents, as that will be completed by the import.

  2. On the search collection list locate the search collection into which you want to import the search collection data.

  3. Click the Import or Export icon next to the search collection in the list. Manage Search displays the Import and Export Search Collection panel.

  4. In the entry field Specify Location (full path with XML extension): type the full directory path and XML file name of the search collection data which you want to import into the selected search collection.

  5. Click Import to import the search collection data. Manage Search imports the complete search collection data from the specified XML file into the selected search collection.

  6. To return to the previous panel without importing a search collection, click the appropriate link in the bread crumb trail at the top of the portlet.

  7. If required, you can now add content sources and documents to the search collection.

When importing a collection, be aware of the following:

  1. Import collection data only into an empty collection. Do not import collection data into a target collection that has content sources or documents already.

  2. When you import collection data into a collection, collection settings are overwritten by any settings contained in the imported data. For example, the language setting is overwritten, or a user-defined categorizer is added, if it was specified for the imported search collection.

  3. When you import a collection, a background process fetches, crawls, and indexes all documents that are listed by URL in the previously exported file. This process is asynchronous. It can therefore take considerable time until the documents become available.

  4. When you import a collection that contains a portal site content source created in a previous version of WebSphere Portal, you need to regather the portal content by deleting the existing portal site content source, creating a new portal site content source, and starting a crawl on it.


Refresh collection data

Refreshing the data of a search collection updates that collection by crawling all of its associated content sources again. To refresh a search collection, click the Refresh Collection Data (regathering) icon for that collection. Manage Search performs complete new crawls over all content sources of the collection. Because all content sources of the search collection are crawled at the same time, this can require a considerable amount of system resources. To verify progress and completion of the regathering, click the collection and view the Collection Status information.


Add documents to a search collection

You can manually add documents to a search collection...

  1. Click the Add Document icon for that collection. Manage Search displays the Add Document panel.

  2. Select whether you want to load the document by File or by URL.

  3. Enter the location of the document that you want to add to the search collection:

    • For a file enter the directory location and file name in the entry field Specify file location:. Use the Browse button if required.

    • For a Web document enter the URL in the entry field for URL.

  4. Click Continue. Manage Search displays the panel for editing the document information.

  5. Update the document location, depending on whether you selected File or URL in the previous panel:

    • For content specified by file location in the previous panel, the field Edit Document Information for URL - Update machine name and driver for this URL has a partial file location filled in, based on the file location that you entered as follows: file:// [machine name]/your_file_path/your_file_name . Update the contents of the field to a valid file location by which users can access the document. To do this, replace the string [machine name] by the name of the machine on which the document resides.

    • For content specified by URL in the previous panel, the field Edit Document Information for URL - Update machine name and driver for this URL has a document URL filled in, based on the URL that you entered. Update this URL as necessary to a valid URL by which users can access the document.

      The document that you add must be accessible to the crawler and to the users who will search the document. For example, a document specified by file location must be available in a public share, if you want anonymous users to be able to search it.

  6. The other fields and options under the Document Content tab are similar to those listed under Work with Pending Documents. Proceed as described there.

  7. If you are using a rule-based categorizer for the search collection to which you are adding the document, the panel shows a Destination Categories tab. Click this tab. Manage Search displays the panel for selecting destination categories. Select the categories to associate them with the document as required.

  8. Click OK. Manage Search adds the document to the collection, indexes it, and returns to the search collections list.

  9. Add further documents as required.


Delete a search collection

To delete a search collection, proceed as follows:

  1. Click the Delete icon for the search collection which you want to delete.

  2. Confirm that you want to delete the search collection by clicking OK. Manage Search deletes the search collection and removes it from the list. If you do not want to delete the collection, click Cancel.

If you delete the search collection before an upgrade to a higher version of WebSphere Portal, make sure you export the search collection for later import before you delete it. For details refer to Migrate search collections.


Manage the user-defined rule-based categorizer for a search collection

If you associated a search collection with a user-defined rule-based categorizer at creation time, you can define its categories and create filter rules per category. For details about this refer to Configure the Destination Categories. Performing category specific searches on search collections is only supported by the Search and Browse portlet.

Rules determine which documents are associated with categories. They control which of the documents that are fetched from the content sources enter the search collection, and to which categories they are assigned:

The categories defined per content source are a subset of the entire category tree. The category tree is arranged in a hierarchy. The tree starts with the Root category. All other categories stem from the Root category.
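
As a rough illustration of rule-based categorization over a category tree, consider the following Python sketch. It rests on assumptions that go beyond this text: a URL rule is modeled as a case-insensitive substring match on the document URL, a content rule as the same kind of match on the document text, and categories are written as slash-separated paths below Root. The actual matching semantics of the categorizer may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str      # "url" or "content"
    pattern: str   # substring to look for (assumed matching semantics)
    category: str  # path below Root, for example "Root/Products/Manuals"


def matches(rule, url, text):
    """Apply a rule to the document URL or to the document text."""
    haystack = url if rule.kind == "url" else text
    return rule.pattern.lower() in haystack.lower()


def categorize(url, text, rules):
    """Return the sorted categories whose rules match the document."""
    return sorted({rule.category for rule in rules if matches(rule, url, text)})


def ancestors(category):
    """List the category and all of its parents up to Root."""
    parts = category.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


rules = [
    Rule("url", "/manuals/", "Root/Products/Manuals"),
    Rule("content", "warranty", "Root/Support"),
]
print(categorize("http://example.com/manuals/a.html", "Warranty terms", rules))
print(ancestors("Root/Products/Manuals"))
```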

You can select categories for the content sources that you select for search scopes.

If you do not have the option Add all documents to collection automatically enabled, you can always change the automated association created by the system between a document and a category. You perform this change from the Pending Documents panel, before the document is indexed and cataloged.

To manage the categories for a search collection associated with a rule-based categorizer, proceed as follows:

  1. Locate the required search collection on the search collection list. This search collection needs to have a rule-based categorizer.

  2. Click the Manage Collection Taxonomy icon for that search collection. Manage Search displays the Manage Category Tree panel. It shows the following:

    • A Category Tree; it shows a hierarchical tree view of the categories. Categories with subcategories have a box. Click the box to collapse or expand that part of the tree hierarchy in the view.

    • A Manage Categories box; use this box to manage the categories for the taxonomy.

    • A Manage Category Rules box; use this box to manage the rules for the taxonomy.

  3. Proceed with one of the tasks described in the following:


Manage categories

To manage categories, click one of the categories that are shown in the Category Tree. Managing categories for the selected search collection comprises the following tasks:


Rename a category 


Delete a category 


Create a new category 


Manage category rules

Rules are applied as filters to documents when inserting them into a collection. There are two types of rules:


URL rule


Content rule

The Manage Category Rules box lists the rules that apply as filters to the selected category. Use the minus ( - ) and plus ( + ) signs to collapse and expand the filters table. You can perform the following tasks with Manage Category Rules:


Create a rule 


Associate a rule with a category


Dissociate a rule from a category


Manage rules


Manage the content sources of a search collection

To work with the content sources of a search collection, click the collection name in the list of search collections. Manage Search lists the Content Sources and the Search collection status information of the selected search collection. A search collection can be configured to cover more than one content source. The list shows the following information for the listed content sources:

From the Content Sources panel, you can select the following options or icons and perform the following tasks on content sources:

On the same panel you can also view the status of the search collection.


Add a new content source

When you create a new content source for a search collection, that content source is crawled and the search collection is populated with documents from that content source. You can determine where the crawler will crawl and what kind of information it will fetch. To create a new content source for a search collection, proceed as follows:

  1. Click New Content Source in the Content Sources panel. Manage Search displays the panel named Create a New Content Source. The title bar also shows the search collection for which you create the content source.

  2. Select the type of the content source that you want to create from the pulldown list:

    • Web site. Select this option for all remote sites. This includes Web sites and remote portal sites. Note that only anonymous pages can be indexed and searched on remote portal sites.

    • Seedlist feed. Select this option if the crawler will use a seedlist as the content source for the collection.

    • Portal site. Select this option if the content source is the local portal site.

    • WCM (Managed Web Content) site. To make a content source of this type available to Portal Search, you need to create it in the WCM Authoring portlet. You select the appropriate option to make it searchable and specify the search collection to which it belongs. When you have completed creating the Managed Web Content site, it will be listed among the content sources for the search collection that you specified. For more details about this refer to the WCM documentation.

      Your selection determines some of the entry fields and options that are available for creating the content source. For example, the option Obey Robots.txt under the tab Advanced Parameters is available only if you select Web site as the content source type.

  3. Select the tabs to configure various types of parameters of the content source:

    1. Set the General Parameters
    2. Set the Advanced Parameters
    3. Configure the Schedulers
    4. Configure the Filters
    5. Configure Security
    6. Configure the Destination Categories. (Available if you selected the User-Defined categorizer when creating the search collection.)

  4. After you have set all required parameters, click Create to create the new content source with the parameters you have selected.

      Click Cancel if you do not want to create the new content source or save the updates.

  5. Manage Search takes you back to the main panel. If you clicked Create, it displays the new content source in the content source list, using the URL you gave as the content source location.


Set the general parameters for a content source

To set the general parameters for the content source, proceed by filling in the entry fields and making selections in the Create a New Content Source box. The available fields and options differ, depending on the type of content source that you select:

  1. Click the General Parameters tab.

  2. Content Source Name: Enter the name for the content source in this entry field.

  3. Collect documents linked from this URL: Type the required Web URL or portal URL in this entry field. This determines the root URL from which the crawler starts. This field is mandatory. For portal content sources, the value for this field is filled in by Manage Search.

    For Web sites, you need to type the full name including http://. For example: http://www.cnn.com. Typing only www.cnn.com will result in an error.

    A crawler failure can be caused by URL redirection problems. If this occurs, try editing this field, for example by changing the URL to the redirected URL.

  4. Select from the following options in the drop-down lists. The available fields and options differ, depending on the type of content source that you selected.

    Levels of links to follow: Crawling depth, that is, the maximum number of levels of nested links that the crawler follows from the root URL while crawling.
    Number of linked documents to collect: Maximum number of documents that the crawler indexes during each crawling session. The number of indexed documents includes documents that are re-indexed because their content or category has changed.
    Stop collecting after (minutes): Maximum number of minutes the crawler may run in a single session. The timeout that you set here works as an approximate time limit and might be exceeded by some percentage, so allow some tolerance.
    Stop fetching document after (seconds): Maximum time limit in seconds for completing the initial phase of the HTTP connection, that is, for receiving the HTTP headers. This time limit must be finite, because it prevents the crawler from getting stuck indefinitely on a bad connection. However, it still allows the crawler to fetch large files that take a long time to download, for example ZIP files.
    Links expire after (days): Number of days a document is kept in the search collection after the last time it was found by a crawler. The value is initialized for each document when the crawler fetches it. Each time a crawler finds a document, the document is time stamped. The time stamp is renewed even if the crawler finds the document but does not index it, for example because it has not changed.

    When the time stamp expires, the document is removed from the search collection at the time of the next cleanup. The cleanup daemon is scheduled to run once a day.

    Remove broken links after (days): Number of days a document is kept in the system after it becomes a "broken link". A document is considered a broken link if it is no longer found in a crawling session by any of the crawlers that previously found it. In this case the crawler puts a time stamp on the document. When this time stamp expires, the document is removed from the search collection during the next cleanup. The cleanup daemon is scheduled to run once a day.

    If all the content sources that previously contained this document are deleted from the system, then no crawler can determine that the document is a broken link. In this case the document is removed when its links expire.

  5. Click the next tab to set more parameters for the content source.
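The interplay between Links expire after (days) and Remove broken links after (days) can be pictured with a small model. This is only a sketch of the rules described above, not the Portal Search implementation; the names IndexedDocument, last_found, broken_since, and should_remove are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IndexedDocument:
    """Hypothetical record for one document in a search collection."""
    url: str
    last_found: datetime                     # renewed each time a crawler finds the document
    broken_since: Optional[datetime] = None  # set once no crawler finds the document any more


def should_remove(doc: IndexedDocument, now: datetime,
                  links_expire_days: int, broken_links_days: int) -> bool:
    """Return True if the daily cleanup would remove the document (assumed semantics)."""
    # Rule 1: the "links expire" time stamp has run out.
    if now - doc.last_found > timedelta(days=links_expire_days):
        return True
    # Rule 2: the document became a broken link and its grace period has run out.
    if doc.broken_since is not None:
        if now - doc.broken_since > timedelta(days=broken_links_days):
            return True
    return False


now = datetime(2024, 1, 31)
fresh = IndexedDocument("http://example.com/a", last_found=datetime(2024, 1, 30))
stale = IndexedDocument("http://example.com/b", last_found=datetime(2023, 12, 1))
broken = IndexedDocument("http://example.com/c",
                         last_found=datetime(2024, 1, 20),
                         broken_since=datetime(2024, 1, 21))

for doc in (fresh, stale, broken):
    print(doc.url, should_remove(doc, now, links_expire_days=30, broken_links_days=7))
```

In this model a document whose content sources have all been deleted never receives a broken_since value, so only the first rule can remove it, which matches the behavior described above.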


Set the advanced parameters for a content source

To set the advanced parameters for the content source, proceed as follows in the Create a New Content Source box:

  1. Click the Advanced Parameters tab.

  2. Select from the following options in the drop-down lists, mark the check boxes, or enter data as required:

      Number of parallel processes:

        This determines the number of threads the crawler uses in a crawling session.


      Default character encoding:

        This sets the default character set that the crawler uses if it cannot determine the character set of a document. The entry field for the Default character encoding contains the initial default value windows-1252, regardless of the setting for the Default Portal Language under Administration -> Portal Settings -> Global Settings. Enter the required default character encoding, depending on the portal language. Otherwise documents might be displayed incorrectly under Browse Documents.


      Always use default character encoding:

        If you check this option, the crawler always uses the default character set, regardless of the document character set. If you do not check this option, the crawler tries to determine the character sets of the documents.


      Add all documents to collection automatically:

        If you check this option, the crawler puts all documents directly in their destination folders and indexes them.

        If you do not check this option, the crawler puts all documents in the Pending Documents box. The documents are only put in their destination folders and indexed after an administrator manually approves them. For more information about Pending Documents and manual approval see Work with Pending Documents.


      Obey Robots.txt:

        If you select this option, the crawler observes the restrictions specified in the file robots.txt when accessing URLs for documents. This option is available only if the content source type is Web site.


      Proxy server: and Port:

        The HTTP proxy server and port used by the crawler. If you leave this value empty, the crawler does not use a proxy server.


      Socks server: and Port:

        The socks server and port used by the crawler. If you leave this value empty, the crawler does not use a socks server.

  3. Click the next tab to set more parameters for the content source.
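The effect of the two character encoding options can be illustrated with a short sketch. It assumes a simplified crawler that is handed the raw bytes of a document plus the character set the document declares, if any; the function decode_document and its parameters are hypothetical and are not part of Portal Search.

```python
from typing import Optional

DEFAULT_ENCODING = "windows-1252"  # initial default value named in the text


def decode_document(raw: bytes, declared: Optional[str],
                    always_use_default: bool = False,
                    default: str = DEFAULT_ENCODING) -> str:
    """Decode document bytes the way the two options are described (assumed behavior)."""
    # "Always use default character encoding" ignores the document's own charset.
    if always_use_default or not declared:
        return raw.decode(default, errors="replace")
    try:
        return raw.decode(declared)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: fall back to the default encoding.
        return raw.decode(default, errors="replace")


utf8_bytes = "café".encode("utf-8")
print(decode_document(utf8_bytes, "utf-8"))                           # declared charset is used
print(decode_document(utf8_bytes, None))                              # falls back to windows-1252
print(decode_document(utf8_bytes, "utf-8", always_use_default=True))  # default is forced
```

The second and third calls produce garbled text because UTF-8 bytes are read as windows-1252, which is the kind of incorrect display that a mismatched default encoding can cause under Browse Documents.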


Configure the Schedulers

To configure a schedule, click the Schedulers tab. The Scheduler shows two boxes:

You can perform the following tasks with the Scheduler:


Add a schedule


Delete a schedule

After you have configured the scheduler, click the next tab to set more parameters for the content source.


Configure the Filters

The crawler filters control the crawler progress and the type of documents that are indexed and cataloged. To configure filters, click the Filters tab. You can define new filters in the Define Filter Rules box. The defined filters are listed in the Filtering Rules box.

Crawler filters are divided into the following two types:


URL filters


Type filters

If you define no filters at all, all documents from a content source will be fetched and crawled. If you define include filters, only those documents which pass the include filters are crawled and indexed. If you define exclude filters, they override the include filters, or, if you define no include filters, they limit the number of documents that are crawled and indexed. More specifically, if a document passes one of the include filters, but also passes one of the exclude filters, it is not crawled, indexed, or cataloged.
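The precedence between include and exclude filters can be sketched as follows. The sketch assumes that URL filters behave like simple wildcard patterns and uses Python's fnmatch module as a stand-in; the actual matching rules of Portal Search may differ, and the function passes_filters is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def passes_filters(url: str, include: Sequence[str], exclude: Sequence[str]) -> bool:
    """Decide whether a document is crawled and indexed (assumed wildcard semantics)."""
    # Exclude filters always win, even if an include filter also matches.
    if any(fnmatchcase(url, pattern) for pattern in exclude):
        return False
    # With include filters defined, a document must match at least one of them.
    if include:
        return any(fnmatchcase(url, pattern) for pattern in include)
    # No include filters: everything that is not excluded passes.
    return True


print(passes_filters("http://example.com/sales/x.html", [], []))            # no filters at all
print(passes_filters("http://example.com/hr/policy.html", ["*/hr/*"], []))  # include matches
print(passes_filters("http://example.com/sales/x.html", ["*/hr/*"], []))    # include does not match
print(passes_filters("http://example.com/hr/a.pdf", ["*/hr/*"], ["*.pdf"])) # exclude overrides include
```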


Create a filter

  1. Enter the filter name in the entry field Rule name:.

  2. Make the required selection from the following radio button options:

    • Apply rule while: Collecting documents or Adding documents to index
    • Rule type: Include or Exclude
    • Rule basis: URL text or File Type.

  3. This step depends on your selection for the rule basis in the previous step:

    • If you selected URL text as the rule basis, enter the URL filter, for example */hr/*.
    • If you selected File Type as the rule basis, select the required document type from the pull-down list.

    When you use the option Apply rule while Collecting documents with Rule type: Include, verify the URL in the field Collect documents linked from this URL: fits the specified rule; otherwise no documents will be collected. For instance, crawling the URL http://www.ibm.com/products with the URL filter */products/* will not give any results, because the rule has a trailing slash, but the URL does not. But either crawling http://www.ibm.com/products/ with the URL filter */products/* (both with trailing slash) or crawling http://www.ibm.com/products with the URL filter */products* (no trailing slash) will work.

  4. Click the Create icon in the Define Filter Rules box. The new filter appears in the appropriate list of filters. The filters are listed in separate boxes, depending on whether the filter was created as an include or exclude filter, and whether it was defined for crawling or indexing.

  5. Continue adding the filters that you need.

  6. To delete a filter from the list, select that filter, and click Delete.
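The trailing-slash pitfall from step 3 can be checked with a few lines of code before you start a crawl. As an assumption, the sketch treats the URL filter as a plain wildcard pattern and uses Python's fnmatch module to imitate it; it does not call any Portal Search API.

```python
from fnmatch import fnmatchcase

# (start URL, include filter) pairs taken from the example in step 3
cases = [
    ("http://www.ibm.com/products", "*/products/*"),   # rule has a trailing slash, URL does not
    ("http://www.ibm.com/products/", "*/products/*"),  # both have a trailing slash
    ("http://www.ibm.com/products", "*/products*"),    # neither has a trailing slash
]

results = [fnmatchcase(url, pattern) for url, pattern in cases]

for (url, pattern), matched in zip(cases, results):
    verdict = "start URL fits the rule" if matched else "start URL does NOT fit the rule"
    print(f"{url} with {pattern}: {verdict}")
```

Only the first pair fails, because the pattern requires the text /products/ to appear in the URL and the start URL ends in /products.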

After you have configured the filters, click the next tab to set more parameters for the content source.


Delete a filter

  1. Select the filter which you want to delete from the list.
  2. Click Delete. You get a prompt to confirm the deletion.
  3. Confirm that you want to delete the filter by clicking OK. The filter is removed from the list.



Configure security for a content source

You can configure the security for indexing secured content sources and repositories that require authentication. To configure the security for a content source, click the Security tab. Manage Search shows two boxes:

In the Define Security Realm box fill in the following data entry fields:

After you have filled in all required data, click the Create icon in the Define Security Realm box. The list in the Security Realms box now shows the security realm which you configured for the content source.

After you have configured security, click another tab to set more parameters for the content source. If you have set all required parameters and made all required updates, click Create to create the new content source with the parameters you have selected.


Configure the Destination Categories

Manage Search displays the Destination Categories tab only for search collections for which you selected a user-defined rule-based categorizer during creation. You can use this tab to associate categories with the content source that you are creating. If you do this, all documents that arrive from that content source are associated with the categories you selected, depending on whether they pass the existing filters. A category which is associated with a content source is also called a destination category.

The Destination Categories panel shows the Category Tree that you created by using the Category Tree option on the main Manage Search panel. The category nodes have check boxes next to them. You can select the categories that you want to associate with the content source by marking the check boxes. Categories with subcategories have small boxes. Click these boxes to collapse or expand parts of the tree hierarchy.

The category tree also has a pop-up menu with the following options:

The Destination Categories panel also shows the Destination Category List box. It lists all categories that are associated with the content source. In the case of a large category tree, this list might give you a better overview of the selected categories. Click the plus ( + ) and minus ( - ) signs to expand and collapse the Destination Categories List.


Complete the creation of a content source

  1. After you have set all required parameters and made all required updates, click Create to create the new content source with the parameters you have selected. To return without creating the content source or saving your entries, click Cancel.

  2. Manage Search takes you back to the main panel. If you clicked Create, it displays the new content source in the content source list, using the URL you gave as the content source location.


Edit a content source

To edit a content source...

  1. Click Edit Content Source for the content source that you want to edit. Manage Search opens the Edit Content Source Configuration box. It looks just like the Create a New Content Source box, but shows the configuration data that you entered when creating the content source.

  2. Update the parameter options as required.

  3. When you have made all updates, click Save. Manage Search returns to the previous panel. All updates you made are now enabled.

  4. To return without saving updates, click Cancel.

If you modify a content source that belongs to a search scope, check the scope manually to verify that it still covers that content source. In particular, if you changed the name of the content source, edit the scope and verify that the content source is still listed there. If not, add it again.


Delete a content source

To delete a content source...

  1. Click Delete Content Source for the content source that you want to delete. You get a prompt to confirm the deletion.

  2. Confirm that you want to delete the content source by clicking OK. The content source is removed from the content source list.

Documents that were collected from this content source remain available for search by users under all scopes that included the content source before it was deleted. These documents remain available until they expire, as specified under Links expire after (days).


Start to collect documents from a content source

You can start an update from a content source manually. To do this...

  1. Click Start Crawler for the content source for which you want to start the update. This starts the crawl. Documents are fetched from this content source. If they are new or modified, they are updated in the search collection.

  2. To view the updated status information about the progress of the crawl process, click Refresh. The following status information is updated:

      Documents

        Shows how many documents the crawler has fetched so far from the selected content source.


      Run time

        Shows how much time the crawler has used so far to crawl the content source.


      Status

        Shows whether the crawler for the content source is running or idle.

To update the status information, click the Refresh icon.

You can also stop a running update of a content source manually. To do this...

  1. Locate the content source for which you want to stop the update from the content sources list. Make sure you select a content source for which the status information shows Running.

  2. Click Stop Collecting for that content source. This stops the crawl.


Verify the address of a content source

Use the option Verify Address to verify the URL address of a selected content source.

Locate the content source which you want to verify and click Verify Address for that content source. If the Web content source is available and not blocked by a robots.txt file, Manage Search returns the message Content Source is OK. If the content source is invalid, inaccessible, or blocked, Manage Search returns an error message.

When you create a new content source, Manage Search invokes the Verify Address feature.
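The robots.txt part of this check can be approximated outside the portal with Python's standard urllib.robotparser module. The sketch below parses robots.txt content that you supply as text, so it makes no network request; the user agent name that the Portal Search crawler sends is not stated here, so the wildcard agent "*" is an assumption, and the helper is_blocked_by_robots is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt content disallows the URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no HTTP request
    return not parser.can_fetch(user_agent, url)


# Example robots.txt content (made up for illustration)
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_blocked_by_robots(ROBOTS, "http://example.com/public/index.html"))
print(is_blocked_by_robots(ROBOTS, "http://example.com/private/report.html"))
```

A content source whose root URL falls under a Disallow rule would be reported as blocked in this model; availability and validity of the address are separate checks that the sketch does not cover.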


Search Scopes and Custom Links

Search Scopes allows you to view and manage search scopes and custom links. The search scopes are displayed to end users as search options in the drop-down list of the search box in the banner and in the Search Center portlet. Users can select the scope relevant for their search queries. You can configure scopes by one of the following:

WebSphere Portal is shipped with these scopes:

All Sources: Includes documents with all features from all content sources in a user's search.
Managed Web Content: Restricts the search to sites that were created by WCM.

You can add your own custom search scopes and assign an icon to each scope. Users see this icon for the scope in the pull-down selection list of scopes.

You can also add new custom links to search locations. This includes links to external Web locations, such as Google or Yahoo. The Search Center global search lists the custom links for users in the selection menu of search options.


Manage Search Scopes and Custom Links

To manage search scopes and custom links, click Search Scopes. Manage Search shows the Search Scopes and Custom Links panel. It lists the search scopes and custom links and related information:

Select the following options or icons and perform the following tasks on search scopes and custom links:

Users have to clear their browser cache for changes to take effect, for example for a new scope to be available, or for the new default scope to be shown in the right position.


Create a new search scope

To create a new search scope, click the New Scope button. Manage Search displays the New Search Scope page. Enter the required data in the fields and select from the available options:


Scope Name:


Description:


Custom Icon URL:


Status:


Visible to anonymous users:


Query text (optional):


Select Features

  1. Click this button to select document features. Manage Search displays the Add Feature page.

  2. Select the feature(s) as required. These features will be applied as additional filters when users select this scope for their search.

  3. When you have completed selecting features, click OK to save these features to the new search scope. To return without saving, click Cancel.


Select Locations

  1. Click this button to select document locations. Manage Search displays the Add Locations page.

  2. Select the location(s) as required. Only documents from these search locations or content sources will be searched when users select this scope for their search.

  3. When you have completed selecting locations, click OK to save them to the new search scope. To return without saving, click Cancel.

To set names and descriptions for the search scope in other locales, you must first create and save the scope. Then locate the scope in the scopes list, and edit it by clicking the Edit icon. The option for setting names and descriptions in other locales is available only on the Edit Search Scope page.

If you modify a content source that belongs to a search scope, check the scope manually to verify that it still covers that content source. In particular, if you changed the name of the content source, edit the scope and verify that the content source is still listed there. If not, add it again.


Edit a search scope

To edit a search scope, locate that scope in the list and click the Edit icon for that scope. Manage Search displays the Edit Search Scope page. Update the scope data and select from the available options as required:


Scope name


I want to set names and descriptions.

For the other data entry fields and options, proceed as described under Create a new search scope.


Delete a search scope

To delete a search scope, locate that scope in the list and click the Delete icon for that scope. When the confirmation prompt appears, confirm by clicking OK, or click Cancel to return without deleting the search scope.


Add a new custom link

You can add custom links that let users search popular Web search engines, such as Google or Yahoo, directly. To add a new custom link, click the New Custom Link button. Manage Search displays the New Custom Link page. Enter the required data in the fields and select from the available options:


Status


Custom link name:


Link URL:


Custom icon URL:

When you have completed the data entry and selected the options as required, click OK to save the new custom link. To return without saving, click Cancel.

To set names and descriptions for the custom link in other locales, you must first create and save the link. Then locate the custom link in the list, and edit it by clicking the Edit icon. The option for setting names and descriptions in other locales is available only on the Edit Custom Link page.


Edit a custom link

To edit a custom link, locate that custom link in the list and click the Edit icon. Manage Search displays the Edit Custom Link page. Update the custom link data and select from the available options as required. To set names for other locales, click I want to set names.


Delete a custom link 

To delete a custom link, locate that link in the list and click the Delete icon. When the confirmation prompt appears, confirm by clicking OK, or click Cancel to return without deleting the link.

