Set up a JCR search collection

JCRCollection1 is a special-purpose JCR search collection that the portal creates by default. If JCRCollection1 is deleted, search is not possible from the WCM authoring portlet. If required, we can re-create the collection manually.
The JCR search collection is flagged so that it does not participate in searches that use the All Sources search scope, and an administrator cannot add it manually. The JCR requires this collection so that specialized applications can run low-level searches in the repository. The collection needs to exist only once. It can be used only by a search portlet that knows how to present and handle its search results, so it is of no use in a more generic search context.

Virtual Portals

When we create a virtual portal with content, the portal creates the JCR collection for that virtual portal by default. If we create the virtual portal without content, no JCR collection is created with it. The collection is created only when content is added to the virtual portal.
The URL of the JCR search collection can be viewed using the Manage Search portlet:
http://host:port/wps/seedlist/server/hello?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=wsid@ootb_crawlerwsid

Where wsid is the workspace ID of the virtual portal. The workspace ID identifies the workspace in which content items are created, stored, and maintained. For example, if the workspace ID of the virtual portal is 10, the URL is:
http://host:port/wps/seedlist/server/hello?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=10@ootb_crawler10
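
As a rough sketch of how the wsid placeholder is substituted, the following Python snippet assembles the URL shown above from a workspace ID. The helper names seedlist_id and jcr_seedlist_url are hypothetical and not part of the portal. The base URL (including host:port) is the placeholder from the example and must be replaced with real values.

```python
from urllib.parse import urlencode

# Retriever class name copied verbatim from the URL in this document.
JCR_SOURCE = "com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory"


def seedlist_id(wsid):
    """Fill in the pattern wsid@ootb_crawlerwsid for one workspace ID."""
    return f"{wsid}@ootb_crawler{wsid}"


def jcr_seedlist_url(base_url, wsid, page_size=100, start=0):
    """Assemble the seedlist URL; base_url is everything before the '?'."""
    params = [
        ("Action", "GetDocuments"),
        ("Format", "ATOM"),
        ("Locale", "en_US"),
        ("Range", page_size),
        ("Source", JCR_SOURCE),
        ("Start", start),
        ("SeedlistId", seedlist_id(wsid)),
    ]
    # Keep '@' unescaped so the result matches the form shown in the text.
    return base_url + "?" + urlencode(params, safe="@")


# Placeholder base URL taken from the example above.
print(jcr_seedlist_url("http://host:port/wps/seedlist/server/hello", 10))
```

Keeping the parameters in an ordered list means the output has the same parameter order as the URL displayed in the Manage Search portlet, which makes the two easy to compare side by side.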

Set up a JCR search collection using ConfigEngine

If the JCR search collection was deleted, or if we added content to an originally empty virtual portal and the JCR search collection was not automatically created...

For a virtual portal, go to the Security tab of the content source to verify the workspace ID of the virtual portal is correct.

Recreate the JCR search collection...
ConfigEngine create-textsearch-collections

Set up a JCR search collection manually

Click...
Administration | Search Administration | Manage Search | Search collections | New collection

Set values for:

Search Service

For a stand-alone environment, select Default Portal Service.
For a clustered environment, select Remote Search Service.

Location of collection

/path/to/JCR/JCRCollection1
Verify the jcr.textsearch.indexdirectory resource value is updated

Go to...
Resources | Resource Environment | Resource Environment Providers | JCR ConfigService PortalContent | Additional Properties | Custom properties.

Find jcr.textsearch.indexdirectory and update the value if needed.

Name of collection

Must be JCRCollection1.

Description of collection

Optional. Specify JCR seedlist collection.

Specify Collection language

Default is English (United States).

After the collection is created, its name appears in the list of search collections.

Double-click the collection created.

Click New Content Source.

Collection parameters...

Content source: Seedlist Provider

Content Source Name: For example, JCRSource.

Collect documents linked from this URL:
http://server_name:port/wps/seedlist/server?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLER10

The Range parameter specifies the number of documents in one page of a session.
For a virtual portal, specify the content source URL for the virtual portal as follows...
http://server_name:port/wps/seedlist/server/virtual_portal_context?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory&Start=0&SeedlistId=1@OOTB_CRAWLER10

Where "10" is the workspace ID (wsid) of the virtual portal.
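
Before pasting a long content source URL into the form, it can help to check that no query parameter was lost or mistyped. The snippet below is a minimal sketch of such a check using only the Python standard library. The function name check_content_source_url is hypothetical, and the list of required parameters is simply taken from the example URLs in this section, not from any portal specification.

```python
from urllib.parse import parse_qs, urlsplit

# Parameters that appear in every example URL in this section (an assumption
# based on those examples only).
REQUIRED = ("Action", "Format", "Locale", "Range", "Source", "Start", "SeedlistId")


def check_content_source_url(url):
    """Return the query parameters as a dict, or raise if one is missing."""
    query = parse_qs(urlsplit(url).query)
    missing = [name for name in REQUIRED if name not in query]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return {name: query[name][0] for name in REQUIRED}


# Virtual portal example URL copied from the text above.
url = (
    "http://server_name:port/wps/seedlist/server/virtual_portal_context"
    "?Action=GetDocuments&Format=ATOM&Locale=en_US&Range=100"
    "&Source=com.ibm.lotus.search.plugins.seedlist.retriever.jcr.JCRRetrieverFactory"
    "&Start=0&SeedlistId=1@OOTB_CRAWLER10"
)
params = check_content_source_url(url)
print(params["Range"], params["SeedlistId"])
```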

Click Create to create the new content source.
If successful, the following message is displayed:
EJPJB0025I: Content source source_name in collection collection_name is OK.

We can start the crawler manually or schedule it to run at regular intervals.

To start the crawler manually, click the Start Crawler button for the content source.

To schedule the seedlist crawler, click the Edit Content Source button, and click the Scheduler tab. Specify the date, time, and frequency for the crawl. The crawler then starts automatically at the scheduled time.

Workspace ID (wsid)

To determine the workspace ID of the virtual portal...

Go to...
Administration | Portal Analysis | Enable Tracing

In the Append these trace settings: field, add...
com.ibm.icm.ts.*=finest

Save all Web Content Manager documents in the virtual portal.

In trace.log, we can find trace information similar to the following:

[6/5/13 18:51:04:337 IDT] 000001c3 BaseDBImpl 3 insertSeedlistEvents: Inserted event: Event: action='Update_Node(3)', timestamp='2013-06-05 18:51:04.337', document id=,<workspace: 3, itemid:AB001001N13F05B8320005B295>', parentID:<workspace: 3, itemid: >', wsid: 3
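
When trace.log is large, the workspace ID can be pulled out with a short script instead of reading the file by eye. The following is a minimal sketch that assumes the relevant entries look like the sample line above, that is, they contain insertSeedlistEvents and end with "wsid: N". The function name workspace_ids is hypothetical.

```python
import re

# Matches the trailing "wsid: N" field of a seedlist event trace entry.
WSID_PATTERN = re.compile(r"\bwsid:\s*(\d+)\s*$")


def workspace_ids(lines):
    """Collect the distinct workspace IDs found in seedlist event entries."""
    found = set()
    for line in lines:
        if "insertSeedlistEvents" not in line:
            continue  # ignore unrelated trace entries
        match = WSID_PATTERN.search(line)
        if match:
            found.add(int(match.group(1)))
    return found


# Sample entry copied from the trace excerpt above.
sample = (
    "[6/5/13 18:51:04:337 IDT] 000001c3 BaseDBImpl 3 insertSeedlistEvents: "
    "Inserted event: Event: action='Update_Node(3)', "
    "timestamp='2013-06-05 18:51:04.337', "
    "document id=,<workspace: 3, itemid:AB001001N13F05B8320005B295>', "
    "parentID:<workspace: 3, itemid: >', wsid: 3"
)
print(workspace_ids([sample]))  # prints {3}
```

To scan a real log, pass an open file object, for example workspace_ids(open("trace.log")), since a file yields its lines when iterated.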

See also: Re-creating a JCR search collection after it was deleted
