Set up a JCR search collection

A JCR search collection is a special purpose search collection used by WebSphere Portal applications. It is not designed to be used alongside user-defined search collections. Setup includes the creation of a new content source for the search collection.
The portal installation creates the JCR search collection by default and names it JCRCollection1. If this collection is removed or does not exist for other reasons, we can re-create it manually. Web Content Manager Authoring and its search capability require the JCR search collection to be available, paired with its content source. If JCRCollection1 is deleted, search is not possible in the Authoring portlet.
The JCR search collection can be used only by a search portlet that knows how to present and handle its search results; the results are not useful in a more generic search context. The JCR search collection is therefore flagged so that it does not participate in searches that use the All Sources search scope, and an administrator cannot add it manually. The JCR requires this special purpose search collection so that specialized applications can do low-level searches in the repository. Only one JCR search collection is required.

For Web Content Manager:

The JCRCollection1 collection is used by the search feature within the WCM authoring portlet. If we delete this search collection, we might not be able to search for items within the authoring portlet.
JCRCollection1 is created the first time we create a web content item, if it does not already exist. In this case, it might not be necessary to create the collection manually, although we can create it manually first if required.

For virtual portals:

When we create a virtual portal, the creation of the JCR search collection depends on whether we create the virtual portal with or without content:

If we create the virtual portal with content, the portal creates the JCR collection for the virtual portal by default.

If we create only the virtual portal and add no content to it, the portal creates no JCR collection with it. It gets created only when content is added to the virtual portal.

We can view the URL of the JCR search collection in the search administration portlet Manage Search of the virtual portal...
http://host:port/wps/seedlist/server/hello?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=wsid@ootb_crawlerwsid

Where wsid is the actual workspace ID of the virtual portal. The workspace ID is the identifier of the workspace in which the content item is created, stored, and maintained. For example, if the workspace ID of the virtual portal is 10, then the URL looks as follows...
http://host:port/wps/seedlist/server/hello?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=10@ootb_crawler10
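
The substitution of the workspace ID into the URL can be sketched in Python. This is only an illustration of the pattern shown above, not a portal API: the function name jcr_seedlist_url and its parameters are hypothetical, and the path segment hello is copied from the example URL rather than derived from any configuration.

```python
from urllib.parse import urlencode

# Retriever factory class name, copied verbatim from the URL above.
SOURCE = "com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory"


def jcr_seedlist_url(host, port, wsid, context="hello", page_size=100):
    """Build the JCR seedlist URL for a given workspace ID (hypothetical helper).

    `context` is the path segment that follows /wps/seedlist/server/ in the
    example URL; "hello" is simply the value used in that example.
    """
    query = urlencode(
        {
            "Action": "GetDocuments",
            "Format": "ATOM",
            "Locale": "en_US",
            "Range": page_size,
            "Source": SOURCE,
            "Start": 0,
            # The workspace ID appears twice: before "@" and as a suffix.
            "SeedlistId": f"{wsid}@ootb_crawler{wsid}",
        },
        safe="@",  # keep "@" literal, as in the documented URL
    )
    return f"http://{host}:{port}/wps/seedlist/server/{context}?{query}"


if __name__ == "__main__":
    print(jcr_seedlist_url("host", "port", 10))
```

Called with a workspace ID of 10, the sketch reproduces the example URL above; host and port remain placeholders to be replaced with real values.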

If the JCR search collection was deleted, or if you added content to an originally empty virtual portal and the JCR search collection was not automatically created...

For a virtual portal, go to the Security tab of the content source to verify the workspace ID of the virtual portal is correct.

Recreate the JCR search collection...
ConfigEngine create-textsearch-collections

If neither of the preceding options succeeds in creating the JCR search collection, manually set up the JCR search collection.
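
If the re-create task is run from a script, the command line can be assembled as in the following Python sketch. It rests on assumptions not stated above: the function name config_engine_command is hypothetical, the directory that contains the ConfigEngine script is passed in by the caller, and the script is taken to be ConfigEngine.sh on UNIX-like systems and ConfigEngine.bat on Windows. Any password or other properties that the environment requires are not covered here.

```python
import os
import sys


def config_engine_command(config_engine_dir,
                          task="create-textsearch-collections",
                          windows=None):
    """Return the argument list for a ConfigEngine task (hypothetical helper).

    config_engine_dir: directory that holds the ConfigEngine script
                       (an assumption supplied by the caller).
    windows: force the script flavor; by default it follows the platform.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    script = "ConfigEngine.bat" if windows else "ConfigEngine.sh"
    return [os.path.join(config_engine_dir, script), task]


if __name__ == "__main__":
    # Only prints the command; running it is left to the administrator.
    print(config_engine_command("/path/to/ConfigEngine", windows=False))
```

The returned list can be passed to subprocess.run with check=True so that a failing task raises an error instead of passing silently.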

Set up a JCR search collection manually

Click...
Administration | Search Administration | Manage Search | Search collections | New collection

Specify the following values for the parameters as required:

Search Service

Select the search service that the JCR collection uses. If we have a stand-alone environment, select Default Portal Service. If we have a clustered environment, select Remote Search Service.

Location of collection

The directory in which the search collection is to be created. Specify this parameter as index directory location/collection name. For example, if the index directory is c:/JCR and the collection name is JCRCollection1, then the location of the collection must be specified as c:/JCR/JCRCollection1.
Verify that the jcr.textsearch.indexdirectory resource value is set to the index directory, c:/JCR in this example. To view this resource and its value, complete the following steps:

Go to...
Resources | Resource Environment | Resource Environment Providers | JCR ConfigService PortalContent | Additional Properties | Custom properties.

Find jcr.textsearch.indexdirectory and update the value if needed.

Name of collection

The name of the collection must be JCRCollection1.

Description of collection

Optional. Specify JCR seedlist collection.

Collection language

Specify the collection language. By default this parameter is set to English (United States).
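
The rule for the Location of collection parameter (index directory, then a slash, then the collection name) can be checked with a small Python sketch. The function names are hypothetical and serve only to illustrate the rule; forward slashes are used as in the c:/JCR example.

```python
import posixpath


def collection_location(index_directory, collection_name="JCRCollection1"):
    """Join the index directory and the collection name with a forward slash."""
    return posixpath.join(index_directory.rstrip("/"), collection_name)


def location_matches_index_directory(location, index_directory):
    """Check that a location lies directly under the given index directory,
    that is, under the value of jcr.textsearch.indexdirectory."""
    return posixpath.dirname(location) == index_directory.rstrip("/")


if __name__ == "__main__":
    print(collection_location("c:/JCR"))
```

With the values from the example, collection_location("c:/JCR") yields c:/JCR/JCRCollection1.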

After creating the new collection, we can see the name of the collection created in the list.

Double-click the collection created.

To create the content source for the new search collection, click New Content Source.

Collection parameters...

For the type of the content source, select Seedlist Provider.

Provide the name for the new Content Source in the field Content Source Name. For example, we can specify JCRSource.

Specify the value for the URL field Collect documents linked from this URL...
http://server_name:port/wps/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLERwsid

In this URL, the Range parameter specifies the number of documents in one page of a session.
For a virtual portal, specify the content source URL for the virtual portal as follows...
http://server_name:port/wps/seedlist/server/virtual_portal_context?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLERwsid

Where wsid is the workspace ID of the virtual portal.
To determine the workspace ID of the virtual portal...

Go to...
Administration | Portal Analysis | Enable Tracing

In the Append these trace settings: field, add...
com.ibm.icm.ts.*=finest

Save all Web Content Manager documents in the virtual portal.

In trace.log, we can find trace information similar to the following:

[6/5/13 18:51:04:337 IDT] 000001c3 BaseDBImpl 3 insertSeedlistEvents: Inserted event: Event: action='Update_Node(3)', timestamp='2013-06-05 18:51:04.337', document id=,<workspace: 3, itemid:AB001001N13F05B8320005B295>', parentID:<workspace: 3, itemid: >', wsid: 3
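
Finding the workspace ID in a large trace.log by eye is tedious, so the lookup can be sketched in Python. The sketch assumes that the relevant entries contain the text insertSeedlistEvents and end with a wsid: field, as in the sample above; the function name workspace_ids is hypothetical, and other trace formats are not handled.

```python
import re

# Matches the trailing "wsid: <number>" field of an insertSeedlistEvents entry.
WSID_PATTERN = re.compile(r"\bwsid:\s*(\d+)")


def workspace_ids(trace_lines):
    """Collect the distinct workspace IDs reported by insertSeedlistEvents entries."""
    ids = set()
    for line in trace_lines:
        if "insertSeedlistEvents" not in line:
            continue  # ignore unrelated trace entries
        match = WSID_PATTERN.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


if __name__ == "__main__":
    # The file name follows the text above; the full path depends on the installation.
    with open("trace.log", encoding="utf-8", errors="replace") as trace:
        print(workspace_ids(trace))
```

If only documents of one virtual portal were saved while tracing was enabled, the list should contain a single workspace ID, which is the wsid value for the content source URL.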

Click Create to create the new content source. If the Content Source was created successfully, the following message is displayed on the page:
EJPJB0025I: Content source source_name in collection collection_name is OK.

We can start the crawler manually or schedule it to run at regular intervals.

To start the crawler manually, go to the content source and click the Start Crawler button for the content source.

To schedule the seedlist crawler, click the Edit Content Source button, and click the Scheduler tab. Specify the date and time and the frequency for the crawl. The crawler is triggered automatically at the time that you scheduled.

See also: Re-creating a JCR search collection after it was deleted