The indexing process
The Search index is generated by retrieving information from each of the applications on a schedule defined by the administrator. Search uses the IBM WebSphere Application Server (WAS) scheduling service to create and update the Search index. Starting in IBM Connections 3, the Search index must be deployed on each node that runs the Search enterprise application.
Search indexing happens in three distinct stages:
Crawling
Crawling is the process of accessing and reading content from each application in order to create entries for indexing.
During the crawling process, the Search application requests a seedlist from each IBM Connections application. This seedlist is generated when each application runs queries on the data stored in its database, based on the parameters that the Search application submits in its HTTP request. After each seedlist is generated, the seedlist entries are read as part of the crawling process.
Indexing
During the indexing phase, the seedlist entries are placed into a database table, which acts as an index cache.
When the indexing phase is complete, the seedlists are removed from memory. A resume token marks where the last seedlist request finished so that the Search application can start from this point on the next seedlist request. This resume token enables Search to retrieve only new data that was added after the last seedlists were generated and crawled.
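The role of the resume token can be illustrated with a minimal sketch. Everything here is hypothetical: `FakeApplication`, `Seedlist`, and `crawl` are illustration-only names, and the token is modeled as a simple integer offset, which is an assumption and not the actual token format used by IBM Connections.

```python
from dataclasses import dataclass, field


@dataclass
class Seedlist:
    """Entries from one seedlist request plus the token marking where it finished."""
    entries: list
    resume_token: int


@dataclass
class FakeApplication:
    """Hypothetical stand-in for an application that serves seedlists."""
    records: list = field(default_factory=list)

    def get_seedlist(self, resume_token=0):
        # Return only the records added after the position the token marks.
        # Assumption: the token is an integer offset into the record list.
        new_entries = self.records[resume_token:]
        return Seedlist(entries=new_entries, resume_token=len(self.records))


def crawl(app, index_cache, resume_token=0):
    """Request a seedlist, copy its entries to the cache, and return the new token."""
    seedlist = app.get_seedlist(resume_token)
    index_cache.extend(seedlist.entries)
    return seedlist.resume_token


app = FakeApplication(records=["blog-1", "blog-2"])
cache = []
token = crawl(app, cache)           # first crawl reads every record
app.records.append("blog-3")        # new content is added later
token = crawl(app, cache, token)    # second crawl reads only the new record
```

Because the second call passes the token returned by the first, only the record added in between is retrieved, so the cache never receives duplicates.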
Index building
The index builder takes entries from the database cache and stores them in an index on the local file system. Each node has its own index builder, so in a network deployment crawling and preparing entries takes place only once, and each node then creates its index from the information that has already been processed. After the index has been built from the database cache, post-processing adds metadata to the new index entries to enrich the search results, for example by adding the content of files to Files search results and linking bookmark information to URLs.
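As a rough sketch of this stage, the following example builds an index on two nodes from one shared cache and then runs a post-processing step. The table name `index_cache`, its columns, and the functions are assumptions made for illustration; an in-memory SQLite database stands in for the database cache and a dictionary stands in for the index on the local file system.

```python
import sqlite3


def make_cache(entries):
    """Create a stand-in database cache holding already-crawled entries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE index_cache (doc_id TEXT PRIMARY KEY, app TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO index_cache VALUES (?, ?, ?)", entries)
    conn.commit()
    return conn


def build_node_index(conn):
    """Copy every cache entry into a node-local index (a dict in this sketch)."""
    index = {}
    rows = conn.execute("SELECT doc_id, app, title FROM index_cache ORDER BY doc_id")
    for doc_id, app, title in rows:
        index[doc_id] = {"app": app, "title": title}
    return index


def post_process(index, file_contents):
    """Add extracted file content to Files entries after the index is built."""
    for doc_id, doc in index.items():
        if doc["app"] == "files" and doc_id in file_contents:
            doc["content"] = file_contents[doc_id]
    return index


cache = make_cache([
    ("f1", "files", "Budget.xls"),
    ("b1", "blogs", "Release notes"),
])
file_contents = {"f1": "Q3 budget figures"}

# Each node builds its own index from the same cache; nothing is crawled again.
node_indexes = {
    node: post_process(build_node_index(cache), file_contents)
    for node in ("node1", "node2")
}
```

Both nodes end up with identical indexes because they read the same prepared entries rather than crawling the applications separately.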
The crawling and indexing stages take place concurrently. For example, if an indexing task that indexes files, activities, and blogs is created, each of these applications is crawled and added to the database cache at the same time. After the crawling and indexing stages are complete, all the nodes are notified that they can build their index. At this point, the index builder on each node begins extracting entries from the database cache and storing them in the index on the local file system.
Indexing steps
The indexing process involves the following steps:
- The WAS scheduler schedules an indexing task.
- The Search indexing process starts.
- The indexing process creates a thread for each application in the deployment.
- Each thread reads the seedlist for the corresponding application and creates entries for indexing.
- The indexer for each application adds the entries to the database cache.
- When all the threads are finished, each node in the deployment is notified that it can build its index.
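The steps above can be sketched as follows. This is a simplified model, not the Search implementation: the function names, the in-memory list used as the database cache, and the notification strings are all hypothetical, and the scheduler is represented by a direct function call.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def index_application(app_name, seedlists, cache, lock):
    """Read one application's seedlist and add its entries to the shared cache."""
    entries = [(app_name, item) for item in seedlists[app_name]]
    with lock:  # several threads write to the same cache
        cache.extend(entries)
    return len(entries)


def run_indexing_task(seedlists, nodes):
    """Crawl every application on its own thread, then notify each node."""
    cache = []
    lock = threading.Lock()
    # One worker thread per application in the deployment.
    with ThreadPoolExecutor(max_workers=max(1, len(seedlists))) as pool:
        futures = [
            pool.submit(index_application, name, seedlists, cache, lock)
            for name in seedlists
        ]
        for future in futures:
            future.result()  # wait for the thread and surface any error
    # Nodes are notified only after all threads have finished.
    notified = [f"{node}: build index" for node in nodes]
    return cache, notified


seedlists = {
    "files": ["f1", "f2"],
    "activities": ["a1"],
    "blogs": ["b1"],
}
cache, notified = run_indexing_task(seedlists, ["node1", "node2"])
```

The order of entries in the cache depends on thread timing, so code that consumes it should not rely on a particular order.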
Parent topic
Administer Search
Related concepts
Scheduling tasks
Validating Search seedlists
Related reference
Seedlist response