Crawling data for the first time 

The first time that you crawl IBM Connections data, you crawl all content, so this operation takes longer than subsequent crawls.


Procedure

To crawl the data for the first time, complete the following steps:

  1. Send a GET request to the seedlist feed for the application whose data you want to crawl. Do not specify any parameters on the request.

      Activities

      Blogs

        http://<servername>/blogs/seedlist/myserver

        The blogs seedlist also contains content from community blogs.

      Bookmarks

        http://<servername>/dogear/seedlist/myserver

      Communities

        http://<servername>/communities/seedlist/myserver

      Files

        http://<servername>/files/seedlist/myserver 

      Forums

        http://<servername>/forums/seedlist/myserver

      Profiles

        http://<servername>/profiles/seedlist/myserver

      Wikis

        http://<servername>/wikis/seedlist/myserver

        The wikis seedlist also contains content from community wikis.

      For example:

      https://enterprise.example.com/files/seedlist/myserver

  2. Process the returned feed. Find the rel=next link and send a GET request to the web address specified by its href attribute.

  3. Repeat the previous step for each feed that is returned, until the response includes a <wplc:timestamp> element in its body.

  4. Store the value of the <wplc:timestamp> element; pass that value as a parameter when you perform a subsequent crawl of the data.
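
The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names initial_crawl, http_get, and handle_page are hypothetical, authentication is omitted, and the code assumes that the next link and the timestamp can be recognized by their local element names (link with rel="next", and timestamp) regardless of namespace prefix.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def http_get(url):
    """Plain GET without credentials (an assumption; add authentication as needed)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def initial_crawl(seedlist_url, fetch=http_get, handle_page=None):
    """Follow rel=next links from seedlist_url until a timestamp element appears.

    Returns the timestamp text, which the caller stores for the next crawl.
    """
    url = seedlist_url
    while url:
        root = ET.fromstring(fetch(url))
        if handle_page is not None:
            handle_page(root)  # hypothetical hook: process the entries on this page

        next_url = None
        for element in root.iter():
            name = local_name(element.tag)
            if name == "timestamp" and element.text:
                # The timestamp marks the end of the crawl.
                return element.text.strip()
            if name == "link" and element.get("rel") == "next":
                # Resolve the href against the current URL in case it is relative.
                next_url = urljoin(url, element.get("href"))
        url = next_url
    raise RuntimeError("feed ended without a timestamp element")
```

For example, initial_crawl("https://enterprise.example.com/files/seedlist/myserver") would return the timestamp string to store, provided that the server accepts the request as sent. The fetch parameter lets you substitute a function that adds credentials or reads saved responses.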


Parent topic

Crawling data

Related reference
Seedlist response


   

 
