Hints and tips for using Portal Search

View some useful tips for using Portal Search.


Content Model has only one search collection

At this time, the Content Model Search Service has only one search collection, which is provided with the installation by default. You cannot modify this default Content Model search collection or create additional search collections under the Content Model Search Service. The Content Model Search Service is listed because you can include it in scopes.


Users cannot see portal site search results in their preferred language

If the preferred language of the crawler user ID does not match the language of the search collection, users might not see search results in their language. Therefore, set the preferred language of the portal site crawler user ID to match the language of the portal site search collection that it crawls. If you change this setting after you have started a crawl on the portal site search collection, you need to reset the portal site collection. Refer to Create or resetting the portal site collection.

If the portal site is multilingual and users search WebSphere Portal in different languages, set up the portal site collections as described under Crawl a multilingual portal site.


Use the Search Center with external search services with different languages

To use external search services such as Google and Yahoo! with an English search keyword, you can use a URL such as the sample URL given in the Search Center portlet help for configuring the portlet as is: http://www.google.com/search?q= . However, if you search in other languages, consult the documentation of the remote search service to ensure that its Web interface is set up and used appropriately for your search language. This helps avoid problems with the displayed results, which depend on the combination of languages set for WebSphere Portal, the browser, and the search.


How Portal Search handles special characters when indexing

Portal Search indexes words that are composed of consecutive literals, that is letters, digits, and special characters. This includes the following characters:

During indexing, special characters are handled as follows:

  • Blank or white space; this includes the tab
  • Line break or new line
  • Dot or sentence end period ( . ) and comma ( , )
  • Question mark ( ? ) and exclamation mark ( ! )
  • Other punctuation: ( ) { } [ ] < > ; : / / | " _ -
  • Other characters


Notes:

  1. All characters that split words are discarded during indexing and searching.

  2. The statements made above apply to indexing. In a search query, however, all characters that can be part of the search syntax are treated as syntax and not as part of the search term. These are the plus ( + ) and minus ( - ) signs, double quotation marks ( " ), and the asterisk wild card character ( * ). To include such characters in a search query, users must enclose them in double quotation marks. For example, "+hello" searches for the string +hello; "*Hello*" searches for the string *Hello*.

  3. The less than ( < ) and greater than ( > ) symbols are special HTML characters that Portal Search cannot handle.
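
As a rough illustration of note 2, the following shell sketch wraps a search term in double quotation marks when it contains a plus sign, a minus sign, or an asterisk. The function name quote_term is hypothetical and is not part of Portal Search. Terms that themselves contain double quotation marks are not handled here.

```shell
# quote_term is a hypothetical helper, not a Portal Search command.
# It encloses a term in double quotation marks if the term contains
# a character that would otherwise be read as search syntax: + - *
# Assumption: terms containing a double quotation mark are not handled.
quote_term() {
  case $1 in
    *[-+*]*) printf '"%s"\n' "$1" ;;
    *)       printf '%s\n' "$1" ;;
  esac
}

quote_term '+hello'    # prints "+hello"
quote_term '*Hello*'   # prints "*Hello*"
quote_term 'hello'     # prints hello
```

A term such as e-mail is also quoted by this sketch, because the minus sign counts as search syntax.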


Time required for crawls and imports and availability of documents

The following search administration tasks can require extended periods of time:

These tasks are put in a queue. It might therefore take several minutes until they are executed and the respective time counters start, for example, the crawl Run time and the timeout for the crawl set by the option Stop collecting after (minutes): . The time required for these tasks is further influenced by the following factors:

Therefore, both the time limits that you can specify and the times that are shown for these processes are approximate. This applies, for example, to the following scenarios:

Furthermore, this influences other status indicators in the Manage Search portlet. For example, the number of documents shown for a content source can be unexpectedly low or even zero ( 0 ) until the crawl on that content source has completed.


Memory required for crawls

Depending on your Portal Search environment, crawling can require large amounts of memory. Therefore, before you start a crawl, verify that WebSphere Portal has enough free memory. A memory shortage can corrupt the search collection and eventually lead to a system freeze.

To resolve this problem, raise the limit on the number of open files by using the ulimit command as root administrator.

Because crawling and indexing are resource-intensive, schedule crawls for times when user activity is relatively low.


Crawling a portal site for the first time can result in a message

When you start the crawl on a portal site for the first time, this can result in the following message:

     EJPJP0009E: Wrong root url for Portal site crawler: https://root_url

You can ignore this message. The crawl runs correctly.

To resolve this problem, edit the content source, select the General Parameters tab, and set the parameter Stop fetching documents after (seconds): to a value of 90 seconds.


Uninstalling WebSphere Portal does not delete search collections

When you uninstall WebSphere Portal, the directories and files for the search collections are not deleted. Therefore, before you uninstall WebSphere Portal, delete all search collections by selecting the collections individually and clicking the option Delete Collection. If you do not do this, these files and directories remain on the hard drive, and you must delete the search collection data manually after uninstalling WebSphere Portal. The directory path of a search collection is determined by what you typed in the field Location of Collection when you created the search collection. You can look up the collection location by performing the following steps:

  1. Select the page Administration.

  2. Select the Search Administration portlet.

  3. From the Search Collections box select the collection that you want to configure for local search service.

  4. The collection location is shown in the Search Collections box under Collection Status > Collection location. If the Collection Status is collapsed, expand it by clicking the plus sign ( + ).


HTTP crawler does not support JavaScript

The HTTP crawler of the Portal Search Service does not support JavaScript. Therefore, some text of Web documents might not be accessible for search by users. This depends on how the text is prepared for presentation in the browser. Specifically, text that is generated by JavaScript might or might not be available for search.


UNIX operating systems might require a higher limit of open files for Portal Search to work properly

The limit for the number of open files in a UNIX™ operating system might be too low for Portal Search to work properly. This might result in a Portlet Unavailable error. To resolve this problem and allow a higher number of files to be handled, raise the limit on the number of open files by issuing the following command as root administrator:

     ulimit -n 4096
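
Before you raise the limit, it can help to check the current value. The following is a minimal sketch that compares the current open-file limit with the value 4096 from the command above and raises it only if it is lower. The function name limit_too_low is hypothetical.

```shell
# Minimum number of open files, taken from the command shown above.
required=4096

# limit_too_low is a hypothetical helper: it succeeds (returns 0) when
# the current limit is numeric and lower than the required value.
limit_too_low() {
  case $1 in
    unlimited) return 1 ;;
  esac
  [ "$1" -lt "$2" ]
}

current=$(ulimit -n)
if limit_too_low "$current" "$required"; then
  echo "Open-file limit is $current; raising it to $required"
  ulimit -n "$required" || echo "Could not raise the limit; try again as root" >&2
else
  echo "Open-file limit is $current; no change needed"
fi
```

Note that ulimit changes the limit only for the current shell and the processes started from it, so the command needs to be issued in the shell from which the portal server is started.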


Creating the portal site search collection fails


Problem: If the file path length for the location of search collections exceeds its limit, the collection cannot be created. This can occur particularly when the portal site collection is created under UNIX operating systems.
Cause: The file path length for the portal search collection is limited to 118 characters. If this limit is exceeded, the default collection cannot be created. The following items contribute to the length of the file path:


Solution: Proceed as follows:

  1. Change the default directory location for the portal site search collection to a shorter path, so that the complete path and file name do not exceed a length of 118 characters. For details about how to do this, refer to Configure the default location for search collections.

  2. Recreate the portal site search collection. For details about how to do this, refer to Create or resetting the portal site collection.
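
To check a candidate location before you recreate the collection, you can measure its length. The following sketch tests a path string against the 118-character limit described above. The function name path_fits and the sample path are hypothetical, and the check covers only the string that you pass in, so leave room for any file names that are appended to the directory.

```shell
# Limit for the complete path and file name, as stated above.
max_len=118

# path_fits is a hypothetical helper: it succeeds when the given
# string is no longer than the limit.
path_fits() {
  [ "${#1}" -le "$max_len" ]
}

# Hypothetical collection location; replace it with your own path.
collection_path="/opt/search/collections/portal"

if path_fits "$collection_path"; then
  echo "OK: ${#collection_path} characters"
else
  echo "Too long: ${#collection_path} characters (limit $max_len)"
fi
```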


Increase JVM heap size when using categorizer

If you use a predefined categorizer with Portal Search, increase the JVM heap size to at least 1024 MB. To do this:

  1. Start server1 and log in.

  2. Navigate to Servers > Application Servers > WebSphere_Portal > Process Definition > Java Virtual Machine.

  3. Determine the configured maximum heap size; for example, this might be 512 MB.

  4. Increase the maximum heap size to at least 1024 MB.

    • If you allocate more memory than the physical memory in the system, paging occurs, and this can result in very poor performance.

    • On IBM i, the JVM maximum heap size is set to 0 by default, indicating that there is no maximum. Do not change this setting on IBM i.

  5. Restart WebSphere Portal.


Search collection is unavailable for Search and Browse portlet


Problem: A Search and Browse portlet cannot access the search collection to which you configured it.
Cause: If you migrated from a previous version of WebSphere Portal, the parameter for specifying the target search collection has been changed in the configuration for the Search and Browse portlet. The parameter IndexName has been replaced by CollectionLocation.
Solution: If you migrated from a previous version and have the Search and Browse portlet deployed, transfer the value from the old parameter to the new parameter manually. For details, refer to Migrate the Search and Browse portlet.


Search collections unavailable in cluster if failover occurs


Problem: If a cluster member fails, users who were using the affected cluster member when the failover occurred can no longer access search collections. This can occur with horizontal scaling when a node fails or with vertical scaling when a particular cluster member fails.
Solution: Users who are logged into the cluster member that failed must log out of WebSphere Portal and then log back in before they will be able to access search collections again.


Search can return documents based on metadata

Problem: Portal Search can return documents based on their metadata, not just on words found in the fields or actual text of the documents. To users, it might therefore appear that their searches return documents that do not match the search criteria.
Cause: Metadata for documents is also indexed for search. Therefore, if the metadata of a document matches the search criteria, the document is also returned as a search result.
Solution: This works as designed and is usually considered a benefit.


Documents from deleted content source can remain available under scope

Problem: If you delete a content source, the documents that were collected from this content source remain available for search by users under all scopes that included the content source before it was deleted.
Cause: These documents remain available until their expiration time ends.
Solution: You specify the expiration time in the field Links expire after (days): under General Parameters when you create the content source.


Portal Search portlets are not compatible with WSRP

The Portal Search portlets cannot be provided as WSRP services, as some additional and more advanced WebSphere Portal concepts and features are not reflected by the current WSRP standard yet. This includes the Portal Search portlets Manage Search, Taxonomy Manager, Search and Browse, and the Search Center portlets.


Default Portal Search Service and its collections show in the portal default language

The search administration portlet Manage Search lists the Default Portal Search Service and its collection Portal Content or other collections in the default portal language and not in the language that the user has selected as preferred language for the portal or set in the browser. For example, if the portal default language is set to English and the user has selected German as the preferred portal language or has set the browser language to German, the Default Portal Search Service and its collections show in English.


On IBM i, set the USER.REGION variable


For IBM i only: Portal search collections might fail to collect documents. In this case, the logs provide the following or similar information:

 [8/24/08 23:19:47:164 EDT] 000000cd ServletWrappe E   
         Uncaught init() exception thrown by servlet SearchSeedlistServletSecured
 [8/24/08 23:19:47:175 EDT] 000000cd ServletWrappe E   
         Deregister the mbean because of uncaught init() exception thrown by 
         servlet SearchSeedlistServletSecured: javax.servlet.ServletException: 
         Could not load resource bundle nls.SeedlistServletMessages using locale 
         en_${USER.REGION} - Java Exception Message: java.util.MissingResourceException: 
         Can't find bundle for base name nls.SeedlistServletMessages, 
         locale en_${USER.REGION}

Solution: For portal collections to work on an IBM i system, set the system variable USER.REGION.
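
To find out whether a log shows this symptom, you can search it for the unresolved placeholder that appears in the sample above. The following is a minimal sketch. The function name has_unresolved_region and the default log file name SystemOut.log are assumptions, so pass the path of the log file that you actually want to check.

```shell
# has_unresolved_region is a hypothetical helper: it succeeds when the
# given log file contains the literal, unresolved locale placeholder.
has_unresolved_region() {
  grep -F -q 'en_${USER.REGION}' "$1"
}

# Assumed log location; adjust it to your installation.
log_file="${1:-SystemOut.log}"

if [ -f "$log_file" ] && has_unresolved_region "$log_file"; then
  echo "USER.REGION appears to be unset: found en_\${USER.REGION} in $log_file"
fi
```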


Virtual portals have separate search services and collections

Search services and search collections are separate for individual virtual portals and are not shared between individual virtual portals. Set up separate search services and separate search collections for each individual virtual portal. These collections can be used to crawl and search the same set of documents.


Configure wild card search for the Search and Browse portlet

When searching by using the Search and Browse portlet, users can use an asterisk as a trailing wild card in their search string. This wild card search is limited to some extent, because by default the portlet looks for only 20 terms that match the base search string before it executes the search query against the search index. You can configure this number of wild card completion matches. To do this:

  1. Edit the file icm.properties. It is located at PortalServer/jcr/prereq.jcr/lib/com/ibm/icm.

  2. Uncomment the line #jcr.textsearch.wildcardTermExpansionSize = 20 by removing the hash sign ( # ).

  3. Change the number 20 to a larger number as required.

  4. Save changes.

  5. Restart the portal server.

Raising the number of completion matches might impact portal search performance.
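
The edit to icm.properties can also be scripted. The following sketch uncomments the property and sets a new value in one pass, keeping a backup copy of the original file. The function name set_wildcard_expansion and the value 50 are hypothetical; choose a value that suits your environment.

```shell
# set_wildcard_expansion is a hypothetical helper.
# Usage: set_wildcard_expansion <path to icm.properties> <number>
# It removes a leading hash sign from the wildcardTermExpansionSize line,
# if present, and sets the value. The original file is kept as <file>.bak.
set_wildcard_expansion() {
  cp "$1" "$1.bak" &&
  sed "s/^#*[[:space:]]*\(jcr\.textsearch\.wildcardTermExpansionSize\)[[:space:]]*=.*/\1 = $2/" "$1.bak" > "$1"
}

# Example call (50 is an arbitrary illustration value):
# set_wildcard_expansion PortalServer/jcr/prereq.jcr/lib/com/ibm/icm/icm.properties 50
```

You still need to restart the portal server afterward for the change to take effect, as described in step 5.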



Related tasks


Import a web collection
Create or resetting the portal site collection
Crawl a multilingual portal site
Configure the default location for search collections
Migrate the Search and Browse portlet

 

