Subsequently crawling data 

After crawling the data for the first time, perform a crawl on a regular basis to stay up-to-date with the changes being made by the people using IBM Connections.


Before starting

You must have performed an initial data crawl and saved the value of the <wplc:timestamp> element before you can perform this procedure. See Crawling data for the first time for more details.


About this task

This procedure collects all content that was created, updated, or deleted up to the server time at which you started the crawl. Because content changes constantly, the list of entries in the seedlist responses is likely to differ between two crawls that use the same Timestamp parameter value but start at different times.


Procedure

To perform a subsequent crawl of the data, complete the following steps:

  1. Send a GET request to the seedlist feed for the application whose data you want to crawl. Include the following parameter on the request:

      Range

        Optional. Specifies the number of entries to return in the seedlist. Use this parameter to limit or increase the number of entries returned in a seedlist response. The default range is 500 entries. Setting this parameter to an excessively large value can place excessive load on the IBM Connections applications.

      Timestamp

        Required. The string value of the <wplc:timestamp> element in the body of the last response returned by the previous seedlist crawling session. This value cannot be manually composed.

      These parameter values are case-sensitive. All other parameters used by the seedlist SPIs are considered internal; do not set them manually.

      The following list contains the seedlist URLs for each application:

      Activities

        http://<servername>/activities/seedlist/myserver

      Blogs

        http://<servername>/blogs/seedlist/myserver

        The blogs seedlist also contains content from community blogs.

      Bookmarks

        http://<servername>/dogear/seedlist/myserver

      Communities

        http://<servername>/communities/seedlist/myserver

      Files

        http://<servername>/files/seedlist/myserver 

      Forums

        http://<servername>/forums/seedlist/myserver

      Profiles

        http://<servername>/profiles/seedlist/myserver

      Wikis

        http://<servername>/wikis/seedlist/myserver

        The wikis seedlist also contains content from community wikis.

      For example:

      https://enterprise.example.com/files/seedlist/myserver?Timestamp=AAABJRVgyWw%3D

  2. Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.

  3. Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.

  4. Store the value of the <wplc:timestamp> element; pass that value as a parameter when you perform a subsequent crawl of the data.
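The steps above can be sketched as a small crawl loop. This is a minimal illustration, not a supported client: the `wplc` namespace URI, the lack of authentication handling, and the helper names are assumptions, and a real crawler must authenticate to the seedlist URLs as required by the deployment.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed namespace URI for the wplc: prefix; verify against your feed.
WPLC = "{http://www.ibm.com/wplc/atom/1.0}"

def parse_page(feed_xml):
    """Parse one seedlist page; return (entries, next_url, timestamp or None)."""
    feed = ET.fromstring(feed_xml)
    entries = feed.findall(ATOM + "entry")
    nxt = feed.find(ATOM + "link[@rel='next']")
    ts = feed.find(WPLC + "timestamp")
    return (entries,
            nxt.get("href") if nxt is not None else None,
            ts.text if ts is not None else None)

def crawl_updates(seedlist_url, saved_timestamp):
    """Crawl changes since saved_timestamp (the value stored from the
    previous crawl's <wplc:timestamp> element; it cannot be composed
    manually). Returns all entries and the new timestamp to store."""
    # urlencode handles the URL encoding of the timestamp (e.g. %3D for =).
    url = seedlist_url + "?" + urllib.parse.urlencode(
        {"Timestamp": saved_timestamp})
    all_entries, new_timestamp = [], None
    while url and new_timestamp is None:
        with urllib.request.urlopen(url) as resp:  # add authentication as required
            entries, url, new_timestamp = parse_page(resp.read())
        all_entries.extend(entries)
    return all_entries, new_timestamp
```

For example, `crawl_updates("https://enterprise.example.com/files/seedlist/myserver", "AAABJRVgyWw=")` would page through the Files seedlist via the rel="next" links and return the timestamp value to store for the next crawl.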


Parent topic

Crawling data

Related reference
Seedlist response


   

 
