Subsequently crawling data 

After crawling the data for the first time, perform a crawl on a regular basis to stay up-to-date with the changes being made by the people using IBM Connections.


Before starting

You must have performed an initial data crawl and saved the value of the <wplc:timestamp> element before you can perform this procedure. See Crawling data for the first time for more details.


About this task

This procedure collects all content that was created, updated, or deleted up to the server time at which you started the crawl. Because content changes constantly, the list of entries in the seedlist responses is likely to differ between two crawls that use the same Timestamp parameter value but start at different times.


Procedure

To perform a subsequent crawl of the data, complete the following steps:

  1. Send a GET request to the seedlist feed for the application whose data you want to crawl. Include the following parameter on the request:

      Range

        Optional. Specifies the number of entries to return in the seedlist. Use this parameter to limit or increase the number of entries returned in a seedlist response. The default range is 500 entries. Setting this parameter to an excessively large value can place excessive load on the IBM Connections applications.

      Timestamp

        Required. The string value of the <wplc:timestamp> element in the body of the last response returned by the previous seedlist crawling session. This value cannot be manually composed.

      These parameter values are case-sensitive. All other parameters used by the seedlist SPIs are considered internal; do not set them manually.

      The following list contains the seedlist URLs for each application:

      Activities

        http://<servername>/activities/seedlist/myserver

      Blogs

        http://<servername>/blogs/seedlist/myserver

        The blogs seedlist also contains content from community blogs.

      Bookmarks

        http://<servername>/dogear/seedlist/myserver

      Communities

        http://<servername>/communities/seedlist/myserver

      Files

        http://<servername>/files/seedlist/myserver 

      Forums

        http://<servername>/forums/seedlist/myserver

      Profiles

        http://<servername>/profiles/seedlist/myserver

      Wikis

        http://<servername>/wikis/seedlist/myserver

        The wikis seedlist also contains content from community wikis.

      For example:

      https://enterprise.example.com/files/seedlist/myserver?Timestamp=AAABJRVgyWw%3D

  2. Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.

  3. Repeat the previous two steps until the response includes a <wplc:timestamp> element in its body.

  4. Store the value of the <wplc:timestamp> element; pass that value as a parameter when you perform a subsequent crawl of the data.
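The steps above can be sketched as a small crawl loop. This is a minimal illustration, not a supported client: the `wplc` namespace URI, the lack of authentication handling, and the helper names are assumptions, and a real crawler must authenticate to the seedlist URLs as required by the deployment.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Assumed namespace URI for the wplc: prefix; verify against your feed.
WPLC = "{http://www.ibm.com/wplc/atom/1.0}"

def parse_page(feed_xml):
    """Parse one seedlist page; return (entries, next_url, timestamp or None)."""
    feed = ET.fromstring(feed_xml)
    entries = feed.findall(ATOM + "entry")
    nxt = feed.find(ATOM + "link[@rel='next']")
    ts = feed.find(WPLC + "timestamp")
    return (entries,
            nxt.get("href") if nxt is not None else None,
            ts.text if ts is not None else None)

def crawl_updates(seedlist_url, saved_timestamp):
    """Crawl changes since saved_timestamp (the value stored from the
    previous crawl's <wplc:timestamp> element; it cannot be composed
    manually). Returns all entries and the new timestamp to store."""
    # urlencode handles the URL encoding of the timestamp (e.g. %3D for =).
    url = seedlist_url + "?" + urllib.parse.urlencode(
        {"Timestamp": saved_timestamp})
    all_entries, new_timestamp = [], None
    while url and new_timestamp is None:
        with urllib.request.urlopen(url) as resp:  # add authentication as required
            entries, url, new_timestamp = parse_page(resp.read())
        all_entries.extend(entries)
    return all_entries, new_timestamp
```

For example, `crawl_updates("https://enterprise.example.com/files/seedlist/myserver", "AAABJRVgyWw=")` would page through the Files seedlist via the rel="next" links and return the timestamp value to store for the next crawl.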


Parent topic

Crawling data

Related reference
Seedlist response


   

 
