
Manage Search

Use the Manage Search portlet to administer portal search.

    Administration | Portal User Interface | Manage Search


Search Services

Portal Search Service is the default search service. Search collections are associated with search services, and one search service can be used for searching multiple search collections. We can set up multiple portal search services and distribute the search load over several nodes.

For a cluster, set up a remote search service.

The HTTP crawler of the Portal Search Service does not support JavaScript. Text generated by JavaScript might not be available for search.

We can create additional custom search services and add them to the portal.

    Manage search services

    To manage search services, click Search Services. Manage Search shows the Search Services page, which lists the search services in the portal and their status. Options:

    • New Search Service
    • Click the name of a search service to work with the search collections that use that search service.
    • Edit Service
    • Delete Service

    Create a new search service

    Click the New Search Service button. Manage Search displays the New Search Service page. Options:

      Service name
      Name must be unique within the current portal or virtual portal. This field is mandatory.

      Search service implementation
      Select the implementation from the drop-down menu.

      Service parameters

        Add a new service parameter
        If required, enter a new service parameter key and its value, and click the Add Parameter button. Manage Search refreshes the parameter list with the new parameter added.

        Edit a parameter
        To edit a parameter:

        1. Locate that parameter in the list and click the Edit icon.
        2. Enter a new value for the parameter as required. (The Parameter Key field cannot be edited.)
        3. Click OK to save the update, or click Cancel to return and keep the previous value.

        Delete a service parameter
        Locate the parameter in the list and click the Delete icon. When the confirmation prompt shows, confirm by clicking OK, or click Cancel to return without deleting the service parameter.

When we complete the data entry and selection of options, click OK to save the new search service. To return without saving, click Cancel.
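The rules for service names and service parameters can be summarized in a small model. This is a minimal Python sketch, not the portal's API: the classes SearchService and SearchServiceRegistry are hypothetical. It enforces a mandatory service name that is unique within the portal, and a parameter key that is fixed once added, so editing a parameter changes only its value.

```python
from dataclasses import dataclass, field


@dataclass
class SearchService:
    """Hypothetical model of one search service and its key/value parameters."""
    name: str
    implementation: str
    parameters: dict = field(default_factory=dict)

    def add_parameter(self, key, value):
        # A new key must not already exist; existing keys are changed via edit.
        if key in self.parameters:
            raise KeyError(f"Parameter {key!r} already exists")
        self.parameters[key] = value

    def edit_parameter(self, key, new_value):
        # Only the value changes; the key stays fixed once it has been added.
        if key not in self.parameters:
            raise KeyError(f"Unknown parameter {key!r}")
        self.parameters[key] = new_value

    def delete_parameter(self, key):
        del self.parameters[key]


class SearchServiceRegistry:
    """Hypothetical registry of the services in one portal or virtual portal."""

    def __init__(self):
        self.services = {}

    def create(self, name, implementation):
        # The service name is mandatory and must be unique within the portal.
        if not name:
            raise ValueError("Service name is mandatory")
        if name in self.services:
            raise ValueError(f"Service name {name!r} is already in use")
        service = SearchService(name, implementation)
        self.services[name] = service
        return service
```

For example, creating a second service named "Portal Search Service" in the same registry raises a ValueError, while a parameter can be edited as often as needed without changing its key.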

Manage the collections of a search service

To manage the collections of a search service, click the name of that search service in the services list. We can also select Search Collections from the main Manage Search portlet panel. Manage Search displays the Search Collections page. It lists the search collections of the selected search service. We can now manage these search collections and their content sources.

Edit a search service

To edit a search service, locate that search service in the list and click the Edit icon. Manage Search displays the Edit Search Service page. Update the service data and select from the available options as required:

    Service name
    Update the name for the search service as required. The name must be unique within the current portal or virtual portal.

Delete a search service

To delete a search service, locate that search service in the list and click the Delete icon. When the confirmation prompt shows, confirm by clicking OK, or click Cancel to return without deleting the search service.


Search Collections and content sources

Search Collections allows us to view and manage the search collections and their content sources in the portal. We can build and maintain search collections of web, WCM, and portal content. Users can then search these collections using the portal Search Center.

The default search collection combines two content sources and their related crawlers:

    Portal Content Source: The local portal site. Searches pages and portlets.
    WCM Content Source: Searches web content.

During the search collection build process, a crawler (robot) retrieves content from the content sources for indexing. The search collection stores keywords and metadata and maps them to their original source, which allows fast processing of requests from the Search Center portlet.
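Mapping keywords back to their original source is what an inverted index provides. The following is a minimal Python sketch of that idea, assuming simple lowercase word tokens; Portal Search's actual index format and linguistic processing are not described here, and the Collection class is hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a stand-in for real linguistic analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())


class Collection:
    """Hypothetical collection: keyword -> source URLs, plus per-URL metadata."""

    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> set of source URLs
        self.metadata = {}                # source URL -> metadata dict

    def index(self, url, text, **metadata):
        # Record metadata and map every keyword of the document to its URL.
        self.metadata[url] = metadata
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query):
        """Return URLs containing every query keyword, without rescanning sources."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

A query is answered from the stored postings alone, which is why the Search Center does not need to contact the content sources at search time.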

Searchable resources can be stored on the local portal server or on remote content sources. The crawlers can process any content that is accessible over HTTP, for example portal pages, WCM content, and documents and content hosted by web servers. The documents can be of different types, for example editable text files, office suite documents such as Microsoft Office and OpenOffice files, or PDF files.


Manage Search Collections

To manage search collections and their content sources, click Search Collections. Manage Search shows the Search Collections page. It lists the search collections in the portal, together with related information, such as:

  • Name of the search collection
  • Description of the search collection, if available
  • Search service by which the collection is indexed and searched
  • Number of documents in the collection
  • Icons for doing tasks on the search collection.

From the Search Collections panel, select the following options or icons and do the following tasks on search collections:

  • Search Service.

    If we clicked Search Collections from the main Manage Search panel, the Search Collections panel lists all the search collections in the portal. To restrict the list to the search collections of one search service, select that search service from the search services pull-down list. If we entered the Search Collections panel by clicking a search service name in the list of search services, the list shows only the collections for that service. To view other collections, select the required search service from the pull-down list.

  • New collection.

    Create a search collection.

  • Refresh.

    Refresh the list of search collections. This action updates the information and the available option icons for the collections. Examples:

    • If a crawl is running or was completed, the number of documents is updated.

    • If a crawl was completed on a collection since the last refresh, option icons can appear, such as Search and Browse the Collection.

    • If another administrator also worked on search collections at the same time, the information is updated.

  • Arrow icons. To go to a different page in the list of search collections, click the required arrow icon, or enter a page number in the page number entry field. Then, click the Go icon. Both options are available in the search collections list.

  • Click one of the links or icons for a specific search collection and do one of the following tasks.

    The icons for some tasks are only available if the current user can do the specific task on the search collection.

    • Click the collection name to view the status of the search collection and manage the content sources of the search collection.

    • Search and Browse the Collection.

      Click this icon to search and browse a search collection and to work with the documents of the selected collection. We can do the following administrative tasks:

      • Browse the documents of the selected collection.
      • Search the documents of the selected collection.
      • Edit the fields of the documents in the selected collection.
      • Delete documents from the selected collection.

    • Import or Export Collection.

      Click this icon to import or export the document and index data of the search collection. Portal Search uses an internal XML interface for this purpose. The export and import operations can be useful for migrating search collections between different versions of WebSphere Portal.

    • Delete Collection.

      Click this icon to delete a search collection.


Create a search collection

The parameters selected when creating a search collection cannot be changed later, so plan ahead and take special care when creating a new search collection. To change parameters for a search collection, create a new search collection with the required parameters, then export the data from the old collection and import it into the new one.

  1. Click New Collection.

    Manage Search displays the Create Collection panel.

  2. Location of Collection.

    Directory in which the new search collection is created and its related data is stored. This field is mandatory. The location can be a full path or a path relative to the Collections Locations search service parameter. Depending on what we type, the search collection is created in the following location:

    • If we type a name of our choice, the location of the new search collection combines the default directory for search collections and the name we type.

      Example: If we type my_collection_location, the new search collection is created under the directory wp_root/collections/my_collection_location.

    • To create the search collection in a location that is different from the default search collection location, type the full directory location as required. The new search collection is created under the directory location specified.

  3. Name of Collection

    Use this entry field to type the name to give to the new search collection. The name entered here shows for the search collection in the search collection list and in the hierarchy tree of available content sources when we select locations for scopes. If we do not enter a name, the location entered in the previous field is used as a name for the search collection.

  4. Description of Collection.

    Use this entry field to type a description for the new search collection. The description entered here shows for the search collection in the search collection list.

  5. Specify Collection Language.

    Use this pull-down selection list to select the required language for the search collection. The search collection and its index are optimized for this language. This feature enhances the quality of search results for users, as it allows them to use spelling variants, including plurals and inflections, for the search keyword. Portal search uses this language for indexing if no language is defined for the document.

    This setting is not overwritten when we import a search collection, for example, during the migration of a search collection. If we create the search collection to migrate an existing search collection, set this selection to match the setting of the source collection.

  6. Select Summarizer.

    Use this pull-down selection list to select the required summarizer for the search collection. Possible values are as follows:

      None
      No summary is generated for documents. If we select this option, the Search Center uses the description metadata from the document, if the document has one.

      Automatic
      An automatic summarizer is used.

  7. Click OK to create the search collection, or click Cancel to return without creating it. Manage Search returns to the previous panel.

    If we clicked OK, the Search Collection list shows the new search collection by the name specified. If we did not specify a name, the list shows the directory path location specified.
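How the Location of Collection entry is resolved, and how the list falls back to it when no name is given, can be sketched as follows. This is a minimal Python sketch that assumes POSIX-style paths and treats collections_root as the value of the Collections Locations search service parameter; the function names and the /opt/wp_root path in the usage note are hypothetical.

```python
import posixpath


def resolve_collection_location(entered, collections_root):
    """Relative entries go under collections_root; absolute ones are kept."""
    # posixpath.join discards collections_root when `entered` is absolute.
    return posixpath.normpath(posixpath.join(collections_root, entered))


def display_name(name, entered_location):
    """The list shows the collection name, or the location if no name was given."""
    return name if name else entered_location
```

With collections_root set to /opt/wp_root/collections, the entry my_collection_location resolves to /opt/wp_root/collections/my_collection_location, while a full path such as /data/search/coll1 is used unchanged.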


View the status of a search collection

To view the status of the search collection, click the collection name in the list of search collections. Manage Search shows the Content Sources and the Search collection status information of the selected search collection. The status fields show the following data that changes over the lifetime of the search collection:

    Search collection name: Name of the selected search collection. If we did not enter a name for the Search collection, the collection location is shown here instead.
    Search collection location: Location of the selected search collection in the file system. This location is the full path where all data and related information of the search collection is stored.
    Collection description: Description of the selected search collection, if available.
    Search collection language: Language for which the search collection and its index are optimized. The index uses this language to analyze documents during indexing if no other language is specified for a document. This feature enhances the quality of search results, as it lets users find spelling variants, including plurals and inflections, of the search keyword.
    Summarizer used: Whether a static summarizer is enabled for this search collection.
    Last update completed: Date when a content source defined for the search collection was last updated by a scheduled crawl and indexed. The timeout set under Stop collecting after (minutes): is only an approximate time limit. It might be exceeded by some percentage, as indexing the documents after the crawl takes more time, so allow some tolerance.
    Next update scheduled: Date and time of the next scheduled update of the content source, that is, when the content source will be crawled again.
    Number of active documents: Number of active documents in the search collection, that is, all documents available for search by users.
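To illustrate why a collection language improves matching, here is a deliberately crude Python sketch of language-dependent analysis. The toy English suffix stripper and the language codes are assumptions made for illustration; the portal's real linguistic analysis handles far more cases (this sketch, for example, does not reduce "pages" and "page" to the same stem).

```python
def toy_english_stem(word):
    """Crude English suffix stripping, for illustration only."""
    word = word.lower()
    for suffix in ("ies", "es", "s", "ing", "ed"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            return stem + "y" if suffix == "ies" else stem
    return word


# Hypothetical language codes mapped to analyzers; unknown languages are
# only lowercased.
ANALYZERS = {"en": toy_english_stem}


def analyze(text, document_language, collection_language):
    """Use the document's language if known, else the collection language."""
    language = document_language or collection_language
    stem = ANALYZERS.get(language, str.lower)
    return [stem(word) for word in text.split()]


def matches(query_word, document_text, collection_language="en"):
    """True if any document word shares the query word's stem."""
    target = analyze(query_word, None, collection_language)[0]
    return target in analyze(document_text, None, collection_language)
```

Because both the indexed text and the query are reduced to stems, a search for "collection" finds a document that only contains "collections".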

To view updated status information about the search collection, click the Refresh button of the browser.

On the same panel, we can also manage the content sources of the search collection.

If we have a faulty search collection in the portal, the portlet shows a link that takes us to that faulty collection.


Search and browse a search collection

To browse a search collection:

  1. Locate the search collection to browse.

  2. Click the Search and Browse Collection icon for that collection. The Browse Documents panel is displayed.

From the Browse Documents panel, we can browse through the entire search collection. We can view documents and their metadata. We can also delete documents. Use the Search feature to do a search on the collection. To return to the list of collections, click the appropriate link in the breadcrumb trail.


Migrate search collections

When we upgrade to a higher version of WebSphere Portal, the new data storage format might not be compatible with the older version. To prevent loss of data, export all search collection data to XML files before the upgrade. After the upgrade, create a new search collection for each exported collection and import the previously exported data into it.

Notes:

  1. If we do not do these steps, the search collections are lost after the upgrade.

  2. When creating the search collection on the upgraded portal, type data and make selections as follows:

    • Fill the location, the name, and the description of the new collection in as required. We can match the old settings or type new ones.

    • We do not need to select a summarizer. This setting is overwritten by the setting of the source search collection when we import the data.

  3. We cannot migrate a portal site collection between different versions of WebSphere Portal. If we upgrade the portal from one version to another, re-create the portal site collection. Proceed as follows:

    1. Document the configuration data of the portal site content source.

    2. Delete the existing portal content source.

    3. Upgrade the portal.

    4. On the upgraded portal, create a new portal site content source. Use the documented configuration data as required.

    5. Run the new portal content source.

Portlets that were crawled in the portal before the upgrade, but do not exist in the upgraded portal, are not returned by a search.


Export a search collection

To export a search collection and its data:

  1. Before we export a collection, verify the portal application process has write access to the target directory location. Otherwise, we might get an error message, such as File not found.

  2. Make sure the target directory is empty or contains no files that you still need, as the export can overwrite files in that directory.

  3. Locate the search collection to export.

  4. Click the Import or Export Collection icon next to the search collection in the list. Manage Search displays the Import and Export Search Collection panel.

  5. In the entry field Specify Location (full path with XML extension), type the full directory path and XML file name to which to export the search collection and its data. Record the names of the collections, and the directory locations and target file names to which we export them, for use in the import that follows.

  6. Click Export to export the search collection data.

    Manage Search writes the complete search collection data to an XML file and stores it in the directory location specified. Use this file later as the source of an import operation to import the search collection into another portal.

  7. To return to the previous panel without exporting the search collection, click the appropriate link in the breadcrumb trail.


Import a search collection

To import the data of a search collection:

  1. Before we can import the collection data, create an empty shell for the search collection by creating a new search collection. Only the mandatory field Location of Collection needs to be filled in. Do not add content sources or documents, as the import adds them.

  2. On the search collection list, locate the search collection into which to import the search collection data.

  3. Click the Import or Export Collection icon next to the search collection in the list. Manage Search displays the Import and Export Search Collection panel.

  4. In the entry field Specify Location (full path with XML extension), type the full directory path and XML file name of the exported search collection data to import into the selected search collection.

  5. Click Import to import the search collection data. Manage Search imports the complete search collection data from the specified XML file into the selected search collection.

  6. To return to the previous panel without importing a search collection, click the appropriate link in the breadcrumb trail.

  7. If required, we can now add content sources and documents to the search collection.

When we import a collection, be aware of the following:

  1. Import collection data only into an empty collection. Do not import collection data into a target collection that has content sources or documents already.

  2. When we import collection data into a collection, the collection settings are overwritten by any settings contained in the imported data. For example, the language setting is overwritten, or a summarizer is added if one was specified for the exported search collection.

  3. When we import a collection, a background process fetches, crawls, and indexes all documents listed by URL in the previously exported file. This process is asynchronous. It can therefore take considerable time until the documents become available.

  4. When we import a collection containing a portal site content source created in a previous version of WebSphere Portal, regather the portal content. We can regather the content by deleting the existing portal site content source, creating a new portal site content source, and starting a crawl on it.
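The import rules above can be sketched as an XML round trip. This is a minimal Python sketch using the standard xml.etree.ElementTree module; the element and attribute names (collection, setting, document, url) are hypothetical, because the real export format is internal to Portal Search, and collections are modeled as plain dictionaries. Following the notes above, the import is refused unless the target is empty, imported settings overwrite the target's settings, and the exported URLs are returned as a queue that stands in for the asynchronous fetch, crawl, and index process.

```python
import xml.etree.ElementTree as ET


def export_collection(collection):
    """Serialize settings and document URLs to XML (hypothetical format)."""
    root = ET.Element("collection")
    settings = ET.SubElement(root, "settings")
    for key, value in collection["settings"].items():
        ET.SubElement(settings, "setting", name=key).text = value
    documents = ET.SubElement(root, "documents")
    for url in collection["documents"]:
        ET.SubElement(documents, "document", url=url)
    return ET.tostring(root, encoding="unicode")


def import_collection(target, xml_text):
    """Import into an empty shell: settings are overwritten, URLs are queued."""
    if target["content_sources"] or target["documents"]:
        raise ValueError("Import collection data only into an empty collection")
    root = ET.fromstring(xml_text)
    for setting in root.iter("setting"):
        target["settings"][setting.get("name")] = setting.text
    # The portal fetches, crawls, and indexes these URLs in the background;
    # this sketch only returns them as a work queue.
    return [document.get("url") for document in root.iter("document")]
```

The returned queue makes the asynchronous nature visible: right after the import, the settings are in place, but the documents still have to be fetched and indexed.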


Refreshing collection data

Refreshing the data of a search collection updates that collection by crawling all of its associated content sources again. To refresh a search collection, click the Regather documents from Content Source icon for that collection. Manage Search runs a complete new crawl over all content sources of the collection. To verify the progress and completion of the regathering, click the collection and view the collection status information.

This action might require a considerable amount of system resources, as all content sources of the search collection are crawled at the same time.


Delete a search collection

To delete a search collection:

  1. Click the Delete icon for the search collection to delete.

  2. Confirm to delete the search collection by clicking OK. Manage Search deletes the search collection and removes it from the list. If we do not want to delete the collection, click Cancel.

If we delete a search collection before upgrading to a higher version of WebSphere Portal, make sure we export it for later import before deleting it.


Manage the content sources of a search collection

To work with the content sources of a search collection, click the collection name in the list of search collections. Manage Search lists the Content Sources and the Search collection status information of the selected search collection. A search collection can be configured to cover more than one content source. The list shows the following information for the listed content sources:

  • The name of the content source
  • Status information for the content source
  • The icons for doing tasks on the content sources.

From the Content Sources panel, we can select the following options or icons and do the following tasks on content sources:

  • Search collection:

    To change to the content sources of a different search collection and work with them, select the required search collection from this pull-down list.

  • New Content Source.

    Click this option to add a new content source to the search collection.

  • Refresh.

    Click this icon to refresh the status information about the content source. While a crawl on the content source is running, this option updates the information about the crawl run time and the documents collected so far.

  • View the status information for the content source:

      Documents
      The number of documents in the content source. If we click the Refresh button during a crawl, this action shows how many documents the crawler fetched so far from the content source. Run Time The Run Time of the last crawler run on the content sources. If we click the Refresh button during a crawl, this action shows how much time the crawler used so far to crawl the content source. Last Run The date and time when the Last Run started by which the content source was crawled. Next Run The date and time of the Next Run by which the content source is crawled, if scheduled. Status The Status of the content source, that is, whether the content source is idle or a crawl is Running on the content source.

  • Select one of the icons for a specific content source and do one of the following tasks:

    • View Content Source Schedulers.

      This icon is displayed only if we defined scheduled crawls for this content source. If we click this icon, the portlet lists the scheduled crawls, together with the following information:

      • Start Date
      • Start Time
      • Repeat Interval
      • Next Run Date
      • Next Run Time
      • Status (enabled or disabled)

    • Start Crawler.

      Click this icon to start a crawl on the content source. This action updates the contents of the content source by a new run of the crawler. While a crawl on the content source is running, the icon changes to Stop Crawler. Click this icon to stop the crawl. Portal Search refreshes different content sources as follows:

      • For website content sources, documents that were indexed before and still exist in the content source are updated. Documents that were indexed before, but no longer exist in the content source are retained in the search collection. Documents that are new in the content source are indexed and added to the collection.

      • For WebSphere Portal sites, the crawl adds all pages and portlets of the portal to the content source. It deletes portlets and static pages from the content source that were removed from the portal. The crawl works similarly to the option Regather documents from Content Source.

      • For WCM sites, Portal Search uses an incremental crawling method. In addition to added and updated content, the seedlist explicitly specifies deleted content. In contrast, clicking Regather documents from Content Source starts a full crawl, which does not continue from the last session and is therefore not incremental.

      • For content sources created with the seedlist provider option, a crawl on a remote system that supports incremental crawling, such as IBM Connections, behaves like a crawl on a WCM site.

    • Regather documents from Content Source. This option deletes all existing documents in the content source from previous crawls and then starts a full crawl on the content source. Documents that were indexed before and still exist in the content source are updated. Documents that were indexed before, but no longer exist in the content source are removed from the collection. Documents that are new in the content source are indexed and added to the collection.

    • Verify Address of Content Source. Click this icon to verify that the URL of the content source is still live and available. Manage Search returns a message about the status of the content source.

    • Edit Content Source.

      Click this icon to make changes to a content source. The changes include configuring parameters, schedules, and filters for the selected content source.

      • Consider defining a dedicated crawler user ID. The preconfigured default portal site search uses the default administrator user ID wpsadmin, with the default password of that user ID, for the crawler. If we changed the default administrator user ID during the portal installation, the crawler uses that user ID instead. If we later change the user ID or password of the administrator and still want the Portal Search crawler to use that user ID, we must update the crawler settings accordingly.

        To define a crawler user ID, select the Security tab, and update the user ID and password. Click Save to save the updates.

      • If we modify a content source that belongs to a search scope, check the scope manually to verify that it still covers that content source. Especially if we changed the name of the content source, edit the scope and make sure that the content source is still listed there. If it is not, add it again.

    • Delete Content Source.

      Click this icon to delete the selected content source.

      If we delete a content source, the documents that were collected from this content source remain available for search by users under all scopes that included the content source before it was deleted. These documents remain available until their expiration time ends. We can specify this expiration time in the field Links expire after (days): under General Parameters when creating the content source.
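The different refresh behaviors described above can be compared side by side. In this minimal Python sketch an index is a plain dictionary from URL to content, and a seedlist is assumed to be a dictionary with "updated" and "deleted" entries; neither structure reflects the portal's internal format, and all function names are hypothetical. The expiry check assumes that Links expire after (days) is counted from when a document was last indexed, which the text above does not state.

```python
from datetime import datetime, timedelta


def website_crawl(index, fetched):
    """Start Crawler on a website source: update changed, add new, keep vanished."""
    index.update(fetched)


def regather(index, fetched):
    """Regather documents: drop earlier results, then index a full crawl."""
    index.clear()
    index.update(fetched)


def incremental_crawl(index, seedlist):
    """Incremental (seedlist) crawl: apply updates and explicit deletions only."""
    for url, content in seedlist.get("updated", {}).items():
        index[url] = content
    for url in seedlist.get("deleted", []):
        index.pop(url, None)


def still_searchable(indexed_at: datetime, links_expire_after_days: int,
                     now: datetime) -> bool:
    """Assumed rule: a document from a deleted source stays until it expires."""
    return now < indexed_at + timedelta(days=links_expire_after_days)
```

The difference shows up with a page that disappeared from the source: a website crawl keeps it in the index, a regather removes it, and an incremental crawl removes it only if the seedlist reports it as deleted.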

On the same panel, we can also view the status of the search collection.


Add new content source

When we create a new content source for a search collection, that content source is crawled and the search collection is populated with documents from it. We can determine where the crawler goes and what information it fetches. To create a new content source for a search collection:

  1. Click New Content Source in the Content Sources panel.

    Manage Search displays the panel named Create a New Content Source. The title bar also shows the search collection for which we create the content source.

  2. Select the type of the content source to create from the pull-down list:

      Website. Select this option for all remote sites, which includes websites and remote portal sites. Only anonymous pages can be indexed and searched on remote portal sites.
      Seedlist provider. Select this option if the crawler uses a seedlist as the content source for the collection.
      Portal site. Select this option if the content source is the local portal site.
      WCM (Managed Web Content) site. To make a content source of this type available to Portal Search, create it in the WCM Authoring portlet. We select the appropriate option to make it searchable and specify the search collection to which it belongs. When we complete creating the Managed Web Content site, it is listed among the content sources for the search collection specified.

    Your selection determines some of the entry fields and options available for creating the content source. For example, the option Obey Robots.txt under the tab Advanced Parameters is available only if we select Website as the content source type.

  3. Select the tabs to configure various types of parameters of the content source:

    1. General Parameters
    2. Advanced Parameters
    3. Configure the Scheduler
    4. Configure the Filters
    5. Configure Security

  4. After setting all required parameters, click Create to create the new content source with the parameters selected.

    To return without creating the content source, click Cancel.

  5. Manage Search takes us back to the main panel. If we clicked Create, it displays the new content source in the content source list, under the name we gave the content source or, if we specified no name, under its URL.


Set the general parameters for a content source

The available fields and options differ depending on the type of content source selected; they are described in the following steps. Data entry fields that are marked with a red asterisk ( * ) are mandatory.

  1. Click the General Parameters tab.

  2. Content Source Name:

    Enter the name for the content source in this entry field.

  3. Collect documents linked from this URL:

    Type the required web URL or portal URL in this entry field. This action determines the root URL from which the crawler starts. This field is mandatory. For portal content sources, the value for this field is completed by Manage Search.

    • For websites, we need to type the full name including http://. For example:

        http://www.cnn.com

      Typing only www.cnn.com results in an error.

    • URL redirection problems can cause a crawler failure. If this problem occurs, try editing this field, for example, by changing the URL to the redirected URL.

  4. Make the selection from the following options by selecting from the drop-down lists.

    The available fields and options differ, depending on the type of content source selected.

      Levels of links to follow: Crawling depth, that is, the maximum number of levels of nested links that the crawler follows from the root URL.
      Number of linked documents to collect: Maximum number of documents indexed by the crawler during each crawling session. The number of indexed documents includes documents that are reindexed as their content changed.
      Stop collecting after (minutes): Maximum number of minutes the crawler might run in a single session for websites. The timeout set here works as an approximate time limit. It might be exceeded by some percentage. Therefore, allow some tolerance.
      Stop fetching document after (seconds): Time the crawler spends trying to fetch a document. This value sets the maximum time in seconds for completing the initial phase of the HTTP connection, that is, for receiving the HTTP headers. The limit must be finite, as it prevents the crawler from getting stuck indefinitely on a bad connection. Because it applies only to the headers, the crawler can still fetch large files that take a long time to download, for example compressed files.

  5. Click the next tab to set more parameters for the content source.
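The limits on the General Parameters tab can be illustrated with a breadth-first crawler. This minimal Python sketch is not the Portal Search crawler: get_links is a hypothetical callable standing in for fetching a page and extracting its links, and the time limit is given in seconds rather than minutes. It rejects a root URL without a scheme, as with www.cnn.com above, and checks the time limit only between documents, which shows how such a limit can be exceeded.

```python
import time
from collections import deque
from urllib.parse import urlparse


def crawl(root_url, get_links, max_levels, max_documents, stop_after_seconds,
          clock=time.monotonic):
    """Breadth-first crawl bounded by depth, document count, and a soft time limit."""
    if urlparse(root_url).scheme not in ("http", "https"):
        raise ValueError("Root URL must include the scheme, for example http://")
    start = clock()
    seen = {root_url}
    queue = deque([(root_url, 0)])  # (url, level of nesting below the root)
    collected = []
    while queue and len(collected) < max_documents:
        # Checked only between documents, so a slow fetch can overrun the limit.
        if clock() - start > stop_after_seconds:
            break
        url, level = queue.popleft()
        collected.append(url)
        if level < max_levels:
            for link in get_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, level + 1))
    return collected
```

With a root page linking to pages a and b, and a linking to c, setting Levels of links to follow to 1 collects the root, a, and b; a value of 2 also reaches c.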


Set the advanced parameters for a content source

To set the advanced parameters for the content source in the Create a New Content Source box:

  1. Click the Advanced Parameters tab.

  2. Set the following options by selecting from the drop-down lists, selecting the check boxes, or entering data as required:

      Number of parallel processes: Number of threads the crawler uses in a crawling session.
      Default character encoding: Default character set the crawler uses if it cannot determine the character set of a document. The entry field for the Default character encoding contains the initial default value windows-1252, regardless of the setting for the Default Portal Language under Administration menu > Portal Settings > Global Settings. Enter the required default character encoding, depending on the portal language. Otherwise, documents might be displayed incorrectly under Browse Documents.
      Always use default character encoding: If checked, the crawler always uses the default character set, regardless of the document character set. If we clear this option, the crawler tries to determine the character set of each document.
      Obey Robots.txt: The crawler observes the restrictions specified in the robots.txt file when it accesses URLs for documents. This option is available only for content sources of type website. It is not available for Portal site or seedlist provider content sources.
      Proxy server: and Port: The HTTP proxy server and port used by the crawler. If we leave this value empty, the crawler does not use a proxy server.

  3. Click the next tab to set more parameters for the content source.
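The character encoding options can be pictured with this Python sketch. It assumes the charset has already been detected (for example, from a Content-Type header) and is passed in as `declared_charset`; the function name and its parameters are hypothetical, and the detection step itself is not shown. It is an illustration of the fallback rule, not the crawler's code.

```python
from typing import Optional


def decode_document(body: bytes, declared_charset: Optional[str],
                    default_encoding: str = "windows-1252",
                    always_use_default: bool = False) -> str:
    """Decode a fetched document, falling back to the default encoding.

    declared_charset is the charset detected for the document, or None
    if no charset could be determined.
    """
    if always_use_default or not declared_charset:
        # "Always use default character encoding" is checked, or no charset was found.
        return body.decode(default_encoding, errors="replace")
    try:
        return body.decode(declared_charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that are invalid in that charset.
        return body.decode(default_encoding, errors="replace")
```

Forcing windows-1252 on a UTF-8 document turns "café" into "cafÃ©", which is the kind of incorrect display under Browse Documents that a mismatched default encoding can cause.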


Configure the Scheduler

To configure the schedule, click the Scheduler tab. The Scheduler shows two boxes:

    Define Schedule: Add a new schedule.
    Scheduled Updates: Lists the schedules at which crawls run.

Scheduler tasks:

    Add the scheduler

    1. From the From: and At: drop-down menus, select the date and time when we want the crawler to run.

    2. Under Update every:, specify the interval at which we want the crawler to run. Type the number of time units and select the type of time unit, for example, 2 and week(s) for a schedule that runs every two weeks.

    3. Click the Create icon in the Define Schedule box. The scheduler shows the newly created schedule in the Scheduled Updates box.

    The interval between crawler runs must be longer than the maximum crawler execution time, because a crawler cannot be started while it is running. If a crawler job is triggered while the crawler is still running, that execution is ignored, and the crawler starts only at the next scheduled time at which it is not already running.

    Delete the scheduler

    1. Select the schedule to delete from the Scheduled Updates box.

    2. Click Delete.

      The Scheduler prompts us to confirm the deletion.

    3. Confirm to delete the schedule by clicking OK. The Scheduler removes the schedule from the list.

After configuring the scheduler, click the next tab to set more parameters for the content source.
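To see why the interval must exceed the crawl time, consider this Python simulation. It assumes every crawl takes the same fixed time and that a trigger arriving during a running crawl is dropped, not queued, as described above; the function `simulate_schedule` is a hypothetical illustration, not part of the portal.

```python
def simulate_schedule(interval_hours, run_hours, triggers):
    """Return (trigger_time, outcome) pairs for a fixed crawl schedule.

    Assumes every crawl takes run_hours. A trigger that fires while the
    previous crawl is still running is skipped, not queued.
    """
    outcomes = []
    busy_until = float("-inf")  # time at which the current crawl finishes
    for i in range(triggers):
        t = i * interval_hours
        if t < busy_until:
            outcomes.append((t, "skipped"))  # crawler still running: execution ignored
        else:
            busy_until = t + run_hours
            outcomes.append((t, "started"))
    return outcomes
```

With a 24-hour interval and a 30-hour crawl, every other trigger is skipped, so the crawl effectively runs only every 48 hours.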


Configure the Filters

Crawler filters control the crawler progress and the type of documents indexed and cataloged. To configure filters, click the Filters tab. Define new filters in the Define Filter Rules box. The defined filters are listed in the Filtering Rules box.

Crawler filters are divided into the following two types:

    URL filters
    They control which documents are crawled and indexed, based on the URL where the documents are found.

    Type filters
    They control which documents are crawled and indexed, based on the document type.

If we define no filters at all, all documents from a content source are fetched and crawled. If we define include filters, only documents that match the include filters are crawled and indexed. Exclude filters override include filters; if we define no include filters, exclude filters limit the set of documents that are crawled and indexed. More specifically, if a document matches one of the include filters but also matches one of the exclude filters, it is not crawled, indexed, or cataloged.
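The precedence rules above can be expressed as a short Python function. This sketch covers URL filters only and assumes that the `*` wildcard behaves like shell-style matching in Python's `fnmatch` module; the function name `passes_filters` is hypothetical, and the portal's own pattern syntax might differ in details.

```python
from fnmatch import fnmatchcase


def passes_filters(url, includes, excludes):
    """Decide whether a URL is crawled, given include and exclude URL patterns."""
    if any(fnmatchcase(url, pattern) for pattern in excludes):
        return False  # exclude filters override include filters
    if includes:
        return any(fnmatchcase(url, pattern) for pattern in includes)
    return True  # no include filters: every URL that is not excluded passes
```

For example, with the include filter */hr/* and the exclude filter */hr/private/*, the URL http://example.com/hr/private/salaries.html is rejected even though it matches the include filter.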

We can do the following tasks with the Filters box:

    Create a filter
    To add a new filter:

    1. Enter the filter name in the Rule name: field.

    2. Make the required selection from the following radio button options:

        Apply rule while Collecting documents or Adding documents to index
        Rule type Include or Exclude
        Rule basis URL text or File Type.

    3. This step depends on the selection for the rule basis in the previous step:

      • If we selected URL text as filter body type, enter the URL filter, for example */hr/*.

      • If we selected file type as filter body type, select the required document type from the pull-down list.

      When we use the option Apply rule while Collecting documents with Rule type Include, verify that the URL in the Collect documents linked from this URL: field matches the rule; otherwise, no documents are collected. For instance, crawling the URL http://www.ibm.com/products with the URL filter */products/* gives no results because the rule has a trailing slash but the URL does not. However, crawling http://www.ibm.com/products/ with the URL filter */products/* (both with a trailing slash), or crawling http://www.ibm.com/products with the URL filter */products* (no trailing slash), works.

    4. Click the Create icon in the Define Filter Rules box. The new filter appears in the appropriate list of filters. The filters are listed in separate boxes, depending on whether the filter was created as an include or exclude filter, and whether it was defined for crawling or indexing.

    5. Continue adding the filters that we need.

    6. To delete a filter from the list, select that filter, and click Delete.


    Delete a filter
    To delete a filter from the list:

    1. Select the filter to delete from the list.

    2. Click Delete. A prompt asks us to confirm the deletion.

    3. Confirm to delete the filter by clicking OK. The filter is removed from the list.

After configuring the filters, click the next tab to set more parameters for the content source.
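The trailing-slash pitfall in the include-filter example can be checked before creating the content source. This Python sketch assumes shell-style wildcard matching via `fnmatch`, which reproduces the three ibm.com cases described above; the helper name `check_root_url` and its messages are hypothetical.

```python
from fnmatch import fnmatchcase


def check_root_url(root_url, include_rule):
    """Report whether a root URL passes an include rule applied while collecting."""
    if fnmatchcase(root_url, include_rule):
        return "ok"
    if not root_url.endswith("/") and fnmatchcase(root_url + "/", include_rule):
        return "add a trailing slash to the root URL"
    return "root URL does not match the include rule; nothing will be collected"
```

For http://www.ibm.com/products with */products/*, the helper suggests adding a trailing slash; with */products* it returns "ok".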


Configure security for a content source

We configure security for indexing secured content sources and repositories that require authentication. To configure the security for a content source, click the Security tab. Manage Search shows two boxes: Define Security Realm and Security Realms.

In the Define Security Realm box, complete the following fields:

  • User Name. Enter the user ID with which the crawler can access the secured content source or repository.

  • Password. Enter the password for the user ID that we entered under User Name.

  • Host name. Enter the name of the server. For Portal sites and seedlist providers this entry is not required. If we leave it blank, the host name is inferred from the provided root URL.

  • Realm. Enter the realm of the secured content source or repository.

After completing all required data, click the Create icon in the Define Security Realm box. The Security Realms box now lists the security realm that we configured for the content source.

After configuring security, click another tab to set more parameters for the content source. When all required parameters are set and all required updates are made, click Create to create the new content source with the selected parameters.


Complete the creation of a content source

  1. After setting all required parameters and making all required updates, click Create in the Manage Search portlet. This action creates the new content source with the selected parameters. To discard the updates without creating a new content source, click Cancel.

  2. Manage Search returns to the main panel. If we clicked Create, the new content source appears in the content source list under the name we gave it or, if we specified no name, under its URL.


Edit a content source

To edit a content source:

  1. Click Edit Content Source for the content source to edit. Manage Search opens the Edit Content Source Configuration box. It looks just like the Create a New Content Source box, but shows the configuration data entered when creating the content source.

  2. Update the parameter options as required.

  3. When we have completed all updates, click Save. Manage Search returns to the previous panel, and all updates we made take effect.

  4. To return without saving the updates, click Cancel.

If we modify a content source that belongs to a search scope, update the scope manually to verify the scope still covers that content source. Especially if we changed the name of the content source, edit the scope and make sure that it is still listed there. If not, add it again.


Delete a content source

To delete a content source:

  1. Click Delete Content Source for the content source to delete. A prompt asks us to confirm the deletion.

  2. Confirm to delete the content source by clicking OK. The content source is removed from the content source list.

Documents that were collected from this content source remain available for search under all scopes that included the content source before it was deleted.


Start to collect documents from a content source

To start an update from a content source manually:

  1. Click Start Crawler for the content source to update. This action starts a new crawler run that fetches the documents from the content source; documents that are new or modified are updated in the search collection. While a crawl on the content source is running, the icon changes to Stop Crawler. Click this icon to stop the crawl. Portal Search refreshes different types of content sources as follows:

    • For website content sources, documents that were indexed before and still exist in the content source are updated. Documents that were indexed before, but no longer exist in the content source are retained in the search collection. Documents that are new in the content source are indexed and added to the collection.

    • For WebSphere Portal sites, the crawl adds all pages and portlets of the portal to the content source. It deletes portlets and static pages from the content source that were removed from the portal. The crawl works similarly to the option Regather documents from Content Source.

    • For WCM sites, Portal Search uses an incremental crawling method. In addition to added and updated content, the seedlist explicitly specifies deleted content. In contrast, clicking Regather documents from Content Source starts a full crawl; it does not continue from the last session, and it is therefore not incremental.

    • For content sources created with the seedlist provider option, a crawl on a remote system that supports incremental crawling, such as IBM Connections, behaves like a crawl on a WCM site.

  2. To view the updated status information about the progress of the crawl process, click Refresh. The following status information is updated:

      Documents
      Shows how many documents the crawler fetched so far from the selected content source.

      Run time
      Shows how much time the crawler used so far to crawl the content source.

      Status
      Shows whether the crawler for the content source is running or idle.

To update the status information, click the Refresh icon.

To stop a running update of a content source manually:

  1. Locate the content source for which to stop the update in the content sources list. Make sure that the status information for that content source shows running.

  2. Click Stop Collecting for that content source. This action stops the crawl.
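The different refresh behaviors listed under Start Crawler can be compared with this Python sketch, in which the index is modeled as a hypothetical dictionary from document URL to content version. The three function names are illustrative assumptions, and the portal-site case is simplified to "the index mirrors the current portal", which approximates the described behavior of adding all pages and portlets and deleting removed ones.

```python
def refresh_website(indexed, crawled):
    """Website crawl: update existing and add new documents; keep documents no longer found."""
    result = dict(indexed)
    result.update(crawled)
    return result


def refresh_portal_site(indexed, crawled):
    """Portal site crawl (simplified): the index mirrors the pages and portlets found now."""
    return dict(crawled)


def refresh_incremental(indexed, added_or_updated, deleted):
    """Seedlist-based incremental crawl: apply changes, then drop explicitly deleted documents."""
    result = dict(indexed)
    result.update(added_or_updated)
    for url in deleted:
        result.pop(url, None)
    return result
```

With `indexed = {"a": "v1", "b": "v1"}` and a crawl that finds `{"a": "v2", "c": "v1"}`, the website refresh keeps the stale document "b", while the portal site refresh drops it.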


Verify the address of a content source

Use the option Verify Address to verify the URL address of a selected content source.

Locate the content source to verify and click Verify Address for that content source. If the web content source is available and not blocked by a robots.txt file, Manage Search returns the message Content Source is OK. If the content source is invalid, inaccessible, or blocked, Manage Search returns an error message.

Manage Search also runs Verify Address automatically when we create a new content source.
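The robots.txt part of this check can be reproduced with Python's standard `urllib.robotparser` module. In this sketch the robots.txt lines are passed in directly instead of being fetched, and the function name `robots_allows` is hypothetical; it does not test whether the server is reachable, and Manage Search might apply robots.txt rules with a different user agent.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(url, robots_lines, user_agent="*"):
    """Return True if the given robots.txt lines allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

For example, with the lines `User-agent: *` and `Disallow: /private/`, the URL http://example.com/private/report.html is blocked, while http://example.com/public/index.html is allowed.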


Search Scopes and Custom Links

With Search Scopes we can view and manage search scopes and custom links. The search scopes are displayed to users as search options in the drop-down list of the search box in the banner and in the Search Center portlet. Users can select the scope relevant for their search queries. We configure scopes in one or both of the following ways:

  • One or more search locations (content sources).

  • Document features or characteristics, such as the document type.

WebSphere Portal includes these scopes:

    All Sources
    This scope searches documents with all features from all content sources.

    Managed Web Content
    This scope restricts the search to sites created by Web Content Management.

We can add our own custom search scopes. We can add an icon to each scope. Users see this icon for the scope in the pull-down selection list of scopes.

We can also add custom links to search locations. Custom links can point to external web locations, such as Google or Yahoo. The Search Center global search lists the custom links for users in the selection menu of search options.


Manage Search Scopes and Custom Links

To manage search scopes and custom links, click Search Scopes. Manage Search shows the Search Scopes and Custom Links panel. It lists the search scopes and custom links and related information:

  • For search scopes:

    • The name of the search scope

    • The description of the search scope

    • The status of the search scope, for example, whether it is active and available to users for selection

    • The icons for doing tasks on the scopes.

  • For custom links:

    • The name of the custom link

    • The URL for the custom link

    • The status of the custom link, for example, whether it is active and available to users for their searches

    • The icons for doing tasks on the custom links.

Use the following options and icons to do tasks on search scopes and custom links:

  • New Scope.

    Click this option to create a new search scope.

  • Refresh.

    Click this option to refresh the list of search scopes. This action updates the information for the scopes, for example, the status of scopes, or updates that another administrator made on scopes.

  • Move Down and Move Up arrows.

    Click these arrows to move search scopes up and down in the list. This action determines the order in which the scopes are listed in the drop-down menu from which users select search options for their searches with the Search Center portlet.

  • Edit Search Scope.

    Click this icon to work with a search scope or modify it.

  • Delete Search Scope.

    Click this icon to delete a search scope.

  • New Custom Link.

    Click this option to add a new custom link.

  • Edit Custom Link.

    Click this icon to work with a custom link or modify it.

  • Delete Custom Link.

    Click this icon to delete a custom link.

Users must clear their browser cache for the changes to take effect, for example, for a new scope to become available or for a new default scope to appear in the correct position.


Create a new search scope

To create a new search scope, click the New Scope button. Manage Search displays the New Search Scope page. Enter the required data in the fields and select from the available options:

    Scope Name:
    Enter a name for the new search scope. The name must be unique within the current portal or virtual portal. This field is mandatory.

    Description:
    If required, enter a description for the search scope.

    Custom Icon URL:
    Enter the URL location where the portal can locate the scope icon to be displayed with the search options for users. If the icon file exists in the default icon directory wps/images/icons, we need to type only the icon file name. If the icon file is in a different directory path, type the absolute file path with the file name. Click Check icon path to ensure the icon is available at the URL specified.

    Status:
    Set the status of the search scope as required. To make the scope available to users, set the status to Active.

    Visible to anonymous users:
    Select Yes to make the search scope available to users who use the portal without logging in. Select No to make the scope available to authenticated users only.

    Query text (optional):
    Enter a query text. This query text is invisibly appended to all searches in this scope. Searches by users return results that match both the user query and the query text entered in this field. The user query and the query text carry the same relevance weight in the result list. The query text must conform to the syntax rules for entering a query in the Search Center.

    Select Features

    1. Click this button to select document features. Manage Search displays the Add Feature page.

    2. Select the feature(s) as required. These features are applied as additional filters when users select this scope for their search.

    3. When we have finished selecting features, click OK to save them to the new search scope. To return without saving, click Cancel.

    Select Locations

    1. Click this button to select document locations. Manage Search displays the Add Locations page.

    2. Select the location(s) as required. Only documents from these search locations or content sources are searched when users select this scope for their search.

    3. When we have finished selecting locations, click OK to save them to the new search scope. To return without saving, click Cancel.

    The location tree also shows deleted content sources if they still contain documents in the collection. After a deleted content source no longer has any documents, the cleanup daemon removes it from the location tree.

To set names and descriptions for the search scope in other locales, create and save the scope first. Then locate the scope in the scopes list and edit it by clicking the Edit icon. The option for setting names and descriptions in other locales is available only on the Edit Search Scope page.

If we modify a content source that belongs to a search scope, update the scope manually to verify the scope still covers that content source. Especially if we changed the name of the content source, edit the scope and make sure that it is still listed there. If not, add it again.
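The combined effect of scope locations, features, and query text can be pictured with the following Python sketch. Everything in it is an assumption for illustration: the `Document` and `Scope` classes are hypothetical, queries are reduced to plain words that must all appear in the text (the Search Center query syntax is not modeled), and a document is assumed to pass the feature filter if it has at least one selected feature.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    location: str       # content source the document was collected from
    features: set[str]  # for example, document types such as "pdf"
    text: str


@dataclass
class Scope:
    locations: set[str] = field(default_factory=set)  # empty set: all locations
    features: set[str] = field(default_factory=set)   # empty set: no feature filter
    query_text: str = ""                              # appended to every search


def search_in_scope(docs, user_query, scope):
    """Return URLs of documents that satisfy the scope and the combined query."""
    terms = (user_query + " " + scope.query_text).lower().split()
    hits = []
    for doc in docs:
        if scope.locations and doc.location not in scope.locations:
            continue  # only documents from the selected locations are searched
        if scope.features and not (scope.features & doc.features):
            continue  # features act as additional filters
        words = set(doc.text.lower().split())
        if all(term in words for term in terms):
            hits.append(doc.url)
    return hits
```

In this model, a scope restricted to one content source with the query text "policy" turns a user search for "travel" into a search for documents from that source that contain both words.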


Edit a search scope

To edit a search scope, locate that scope in the list and click the Edit icon for that scope. Manage Search displays the Edit Search Scope page. Update the scope data and select from the available options as required:

    Scope name
    Update the name for the search scope. The name must be unique within the current portal or virtual portal.

    I want to set names and descriptions.
    Click this link to set names and descriptions for other locales.


Delete a search scope

To delete a search scope, locate that scope in the list and click the Delete icon for that scope. When the confirmation prompt appears, confirm by clicking OK, or click Cancel to return without deleting the search scope.


Add a new custom link

We can add custom links to let users send searches directly to web search engines, such as Google or Yahoo. To add a custom link, click New Custom Link. Manage Search displays the New Custom Link page. Enter the required data in the fields and select from the available options:

    Status
    Set the status of the custom link as required. To make the link available to users, set the status to Active.

    Custom link name:
    Enter a name for the new custom link. The name must be unique within the current portal or virtual portal. This field is mandatory.

    Link URL:
    Enter the URL of the target web search engine for the new custom link. This field is mandatory. Use the correct format for the URL, because the user query is appended to it. In some cases, we can determine the web interface syntax of the search engine as follows:

    1. Perform a search with some distinctive search text on the target search engine, for example, an unusual name.

    2. Review the browser URL field and locate your search string. The part of the URL that precedes the search string is likely to be the Link URL for the target search engine.

    3. If the search string is not at the end of the URL, it might help to edit the URL and experiment with versions that end with the search string.

    Examples of web interface syntax:

    • For Google: http://www.google.com/search?&q=

    • For Yahoo: http://search.yahoo.com/search?p=

    Custom icon URL:
    Enter the URL location where the portal can find the icon to be displayed with the new custom link. Click Check icon path to ensure the icon is available at the URL specified.

When we have entered the data and selected the options as required, click OK to save the new custom link. To return without saving, click Cancel.

To set names and descriptions for the custom link in other locales, create and save the link first. Then locate the custom link in the list and edit it by clicking the Edit icon. The option for setting names and descriptions in other locales is available only on the Edit Custom Link page.
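The procedure for finding a Link URL, and the way a user query is appended to it, can be sketched in Python. The helper names are hypothetical, and the sketch assumes that the query is URL-encoded with `quote_plus` (spaces become `+`); the portal's actual encoding might differ.

```python
from typing import Optional
from urllib.parse import quote_plus


def derive_link_url(result_url: str, search_string: str) -> Optional[str]:
    """Return the part of a result URL that precedes the encoded search string."""
    encoded = quote_plus(search_string)
    index = result_url.find(encoded)
    return result_url[:index] if index != -1 else None


def build_search_url(link_url: str, user_query: str) -> str:
    """Append the URL-encoded user query to the custom link URL."""
    return link_url + quote_plus(user_query)
```

For example, `build_search_url("http://www.google.com/search?&q=", "websphere portal")` produces http://www.google.com/search?&q=websphere+portal.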


Edit a custom link

To edit a custom link, locate that custom link in the list and click the Edit icon. Manage Search displays the Edit Custom Link page. Update the custom link data and select from the available options as required. To set names for other locales, click I want to set names.


Delete a custom link

To delete a custom link, locate that link in the list and click the Delete icon. When the confirmation prompt appears, confirm by clicking OK, or click Cancel to return without deleting the link.




See also

  1. Portlets for working with Search
  2. Configure the Search Center portlet
  3. Place the Search Center on a public portal page