Setting up a URL object for the Webserver search engine on HTTP Server
In IBM HTTP Server for i5/OS, you can set up a URL object for use with the Webserver search engine by using the Web Administration for i5/OS interface.
The information in this topic supports the latest PTF levels for HTTP Server for i5/OS. IBM recommends that you install the latest PTFs to upgrade to the latest level of HTTP Server for i5/OS. Some of the topics documented here are not available on earlier PTF levels. See IBM Service for more information.
A URL object contains a list of URLs plus additional web crawling attributes. If you choose to edit an existing URL object, the contents of the current object are displayed. A URL object can be selected together with an options object when you build document lists by crawling remote web sites. See Setting up a document list for the Webserver search engine on HTTP Server for more information.
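Conceptually, a URL object groups the document storage and language settings with one or more URL entries and their crawling attributes, all of which are entered on the forms described below. The following sketch is only an illustration of that grouping; it is not how HTTP Server stores the object, and all names in it are hypothetical:

```python
# Conceptual sketch only: the real URL object is created and stored by the
# Web Administration for i5/OS interface. Field names are hypothetical and
# simply mirror the form fields described in this topic.
from dataclasses import dataclass, field
from typing import List

@dataclass
class UrlEntry:
    url: str                            # starting URL, for example http://www.ibm.com
    domain_filter: str = ""             # limit crawling to this domain, for example ibm.com
    max_depth: int = 1                  # starting URL is depth 0; its links are depth 1
    support_robot_exclusion: bool = True

@dataclass
class UrlObject:
    name: str
    document_directory: str             # directory where downloaded documents are stored
    document_language: str              # language of the downloaded documents
    entries: List[UrlEntry] = field(default_factory=list)
```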
To create a URL object, do the following:
- Start the Web Administration for i5/OS interface.
- Click the Advanced tab.
- Click the Search Setup subtab.
- Expand Search Engine Setup.
- Click Build URL object.
- Choose URL object options:
- Create a new URL object
- Select this option to create a new URL object. Enter the name of the new URL object.
- Edit this URL object
- Select this option to edit an existing URL object. Select the URL object from the list.
- Click Apply.
- Enter document storage and language options:
- Directory to store documents
- Enter the directory where documents found on web sites are stored. Possible values include any valid directory path name.
- Document language
- Select the language of the documents that are downloaded. The list provides all valid language entries.
- Enter URL list options:
- Action
- Click Add to add a new row.
- URL
- Enter a fully qualified URL, for example, http://www.ibm.com. If you enter a URL that requires authentication, create a validation list with the Build validation list form. See Setting up validation lists for the Webserver search engine on HTTP Server for more information.
- URL domain filter
- Enter a domain to limit crawling to, for example, ibm.com.
- Maximum crawling depth
- Enter the depth of links from the starting URL to continue crawling. The starting URL is at depth 0. The links on that page are at depth 1.
- Support robot exclusion
- Choose whether to support robot exclusion. If you select Yes, any sites or pages that contain robot exclusion META tags or files are not downloaded. See the sketch after this procedure for how the domain filter, crawling depth, and robot exclusion attributes constrain a crawl.
- Click Apply.
Your new URL object can now be used when crawling remote web sites.
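To make the crawling attributes concrete, the following sketch outlines, in plain Python, how a generic breadth-first crawler could apply a URL domain filter, a maximum crawling depth, and robot exclusion. It is not the HTTP Server search-engine crawler; the function and parameter names are hypothetical, and the robot exclusion handling is simplified to the robots.txt file of the starting site.

```python
import urllib.request
import urllib.robotparser
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags on a downloaded page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, domain_filter="", max_depth=1, support_robot_exclusion=True):
    """Breadth-first crawl bounded by depth, domain filter, and robot exclusion."""
    robots = urllib.robotparser.RobotFileParser()
    if support_robot_exclusion:
        # Robot exclusion file for the starting site only; a real crawler would
        # fetch robots.txt per site and also honor robots META tags in each page.
        start = urlparse(start_url)
        robots.set_url("{}://{}/robots.txt".format(start.scheme, start.netloc))
        robots.read()

    seen = {start_url}
    queue = deque([(start_url, 0)])            # the starting URL is at depth 0
    while queue:
        url, depth = queue.popleft()
        host = urlparse(url).hostname or ""
        if domain_filter and not host.endswith(domain_filter):
            continue                           # outside the URL domain filter
        if support_robot_exclusion and not robots.can_fetch("*", url):
            continue                           # excluded by robot exclusion rules
        with urllib.request.urlopen(url) as response:
            page = response.read().decode(errors="replace")
        yield url, page                        # a real crawler would store the document

        if depth >= max_depth:                 # links on the starting page are at depth 1
            continue
        collector = LinkCollector()
        collector.feed(page)
        for link in collector.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))
```

For example, under these assumptions, crawl("http://www.ibm.com", domain_filter="ibm.com", max_depth=1) would download the starting page and only those linked pages that are one level deep and within the ibm.com domain, skipping anything disallowed by robot exclusion.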