Create and configure search collections
This topic gives an overview of how you manage search collections and their content sources.
To administer search collections, click:
Administration | Search Administration | Manage Search | Search Collections
From this panel you can create, update, and remove search collections, and perform other administrative tasks that relate to search collections.
When you select Search Collections, Manage Search displays the Search Collections panel. It lists the search collections in the portal and related information, and it allows you to select options and perform tasks on the search collections and their content sources.
The selectable options that are displayed and available for collections and content sources depend on their type and setup.
In the Search Collections panel you can select the following option icons and perform the following tasks:
- Change the search collection with which you want to work. To do this, select another search collection from the pull-down list.
- New collection. Select this option to create a new search collection.
- You cannot create additional search collections for the default Content Model search service.
- When you specify the directory location for the collection, be aware that creating the collection can overwrite files in that directory.
- Refresh the list of collections.
- Locate a collection and perform one of the following tasks by clicking the appropriate icon for that collection:
- Search and Browse Collection.
Use this option to work with the documents of the selected collection. You can perform the following administrative tasks:
- Browse the documents of the selected collection.
- View the individual documents of the selected collection.
- Search the documents of the selected collection.
- Edit the fields of the documents in the selected collection.
- Delete documents from the selected collection.
The panel design of the Browse Documents page is similar to that of the Search and Browse portlet that users use to search documents.
- Import or Export Collection
Use this option to import or export the selected search collection. Portal Search provides a Portal Search XML interface for this feature. Export and import are useful when you upgrade to a software level whose data storage format is not necessarily compatible with that of older versions. To prevent loss of data, export all search collection data to XML files before you upgrade the software. After the upgrade, import the previously exported files to restore the search collection data at the new software level. For details, refer to Migrating Web search collections.
- Before you export a collection, make sure that the portal application process has write access to the target directory location. Otherwise you might get an error message, such as File not found.
- You can import collection data only into an empty collection. You cannot import collection data into a target collection that has content sources or documents already.
- When you import collection data into a collection, any settings contained in the imported data overwrite the existing collection settings. For example, the language setting is overwritten.
- When you import a collection, a background process fetches, crawls, and indexes all documents that are listed by URL in the previously exported file. Therefore, plan for the memory that crawls require, the time that crawls and imports take, and the availability of the documents.
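The write-access precondition for an export can be checked ahead of time. The following is a minimal sketch in Python and is not part of Portal Search: the function name can_write_to is hypothetical, and the check is only meaningful if you run it under the same operating system account as the portal application process.

```python
import os
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if the current process can create a file in `directory`.

    Hypothetical helper for illustration; Portal Search does not provide it.
    """
    if not os.path.isdir(directory):
        return False
    try:
        # Creating (and automatically deleting) a temporary file is a
        # direct test of write permission for the current account.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The system temporary directory stands in for the real export target.
    target = tempfile.gettempdir()
    print(target, "writable:", can_write_to(target))
```

If the check returns False for your intended export directory, correct the directory permissions before you start the export.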
- Refresh Collection Data.
Use this option to manually refresh the selected search collection. The index performs a complete re-crawl on all the content sources of the search collection.
- Add Document.
Use this option to manually add a new document to a collection. You can specify the new document either as a file by a file location or as a Web document by a URL. Depending on whether you selected File or URL, update the document location in the panel for editing the document content information:
- For content specified by file location, the field Edit Document Information for URL - Update machine name and driver for this URL has a partial file location filled in, based on the file location that you entered as follows:
file://[machine name]/your_path/your_file_name
Update the contents of the field to a valid file location by which users can access the document. To do this, replace the string [machine name] with the name of the machine on which the document resides.
For security reasons, some browsers prevent access to the file system. If your environment requires searching files, look on the Internet for information about how to configure your browser to access the file system.
- For content specified by URL, the field Edit Document Information for URL - Update machine name and driver for this URL has a document URL filled in, based on the URL that you entered. Update this URL as necessary to a valid URL by which users can access the document.
The document that you add must be accessible to the crawler and to the users who will search the document. For example, a document specified by file location must be available in a public share, if you want anonymous users to be able to search it.
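The replacement of the [machine name] placeholder described above can be illustrated as follows. This is a sketch only: the function fill_machine_name and the host name docserver.example.com are assumptions made for the example and do not come from Portal Search.

```python
PLACEHOLDER = "[machine name]"


def fill_machine_name(partial_location: str, machine_name: str) -> str:
    """Replace the [machine name] placeholder in a partial file location.

    Hypothetical helper for illustration only.
    """
    if PLACEHOLDER not in partial_location:
        raise ValueError("location does not contain the placeholder " + PLACEHOLDER)
    # Replace only the first occurrence, which is the host part of the location.
    return partial_location.replace(PLACEHOLDER, machine_name, 1)


print(fill_machine_name(
    "file://[machine name]/your_path/your_file_name",
    "docserver.example.com",  # assumed host name
))
# prints: file://docserver.example.com/your_path/your_file_name
```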
- Pending Documents.
The documents returned by a crawl of the selected search collection are sent to the Pending Documents box if you disable the option for adding them to the collection automatically. Use the Pending Documents panel to accept or reject these documents. By accepting documents you make them available for search by users. When you accept a document, you can also edit its metadata.
You disable the option Add all documents to collection automatically for a content source in the Manage Search portlet. If you do this, documents that result from a crawl are moved to the Pending Documents box.
The Pending Documents icon appears for a collection only if pending documents from a content source of that collection are available.
- Category Tree
If you are using a rule-based taxonomy for the selected search collection, use this option to manage that taxonomy, that is, to work with categories and filter rules. For more information, refer to User-defined rule-based categorizer and to the portlet help.
The Category Tree icon appears for a collection only if a user-defined categorizer has been defined for that collection.
- Delete Collection.
Use this option to delete the selected search collection.
- Select a collection by clicking the collection name link. Portal Search displays the Content Sources and the Status of the selected collection. You can select the following option icons and perform the following tasks:
- New Content Source
Use this option to create a new content source for this collection. You can create more than one content source for a search collection.
- Refresh the list of content sources and the status shown for this collection.
- Work with the content sources of the collection
- View the Collection Status information of the selected search collection.
The status fields show the following data that changes over the lifetime of the search collection:
- Search Collection Name:
- Shows the name of the selected search collection.
- Search Collection Location:
- Shows the location of the selected search collection in the file system. This is the full path where all data and related information of the search collection is stored.
- Collection Description:
- Shows the description of the selected search collection if available.
- Search Collection Language:
- Shows the language for which the search collection and its index are optimized. The index uses this language to analyze the documents when indexing, if no other language is specified for the document. This feature enhances the quality of search results for users, as it allows them to use spelling variants, including plurals and inflections, for the search keyword. For more information refer to Language support for Portal Search.
- Categorizer used:
- Shows the categorizer that is used by the search collection. For more information about categorizers refer to Categorizers and taxonomies and the related subtopics. For more information about how to work with a rule-based categorizer for a search collection, refer to User-defined rule-based categorizer and to the Manage Search portlet help.
- Summarizer used:
- Shows whether a static summarizer is enabled for this search collection. For information about the summarizer refer to Summarizer.
- Remove common words from queries:
- Shows whether the indexer and the search filter out common words, such as and, the, and of, from documents.
- Last update completed:
- Shows the date when a content source defined for the search collection was last updated by a scheduled update.
- Next update scheduled:
- Shows the date when the next update of a content source defined for the search collection is scheduled.
- Number of active documents:
- Shows the number of active documents in the search collection, that is, all documents that are available for search by users.
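The general idea behind the Remove common words from queries setting can be sketched as follows. This Python example is an illustration only: the word list contains just the three examples named above, and the whitespace tokenizer is an assumption. The list and tokenizer that Portal Search actually uses are not described in this topic.

```python
# Only the three example words from the status field description;
# the list that Portal Search uses is not documented here.
COMMON_WORDS = {"and", "the", "of"}


def remove_common_words(query: str) -> list[str]:
    """Return the lowercased query terms that are not common words."""
    return [term for term in query.lower().split() if term not in COMMON_WORDS]


print(remove_common_words("the history of search and retrieval"))
# prints: ['history', 'search', 'retrieval']
```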
Notes:
- To update the status information, click Refresh. Clicking the refresh button of the browser will not update the status information.
- If you delete a portlet from the portal after a crawl of the portal site, the deleted portlet is no longer listed in the search results. However, refreshing the view does not update the status information about the Number of active documents. This information is not updated until after the next cleanup run of portal resources.
Parent topic
Set up search collections
Related concepts
Language support for Portal Search
Categorizers and taxonomies
Delayed cleanup of deleted portal pages
Related tasks
Manage the content sources of a search collection
Exporting and importing search collections
Related reference
Hints and tips for using Portal Search
User-defined rule-based categorizer
Summarizer