Language support for Portal Search

 


Portal Search supports all national languages that are supported by the portal.

When you create a search collection, you can select the language for which the collection is optimized. If no language is defined for a document, Portal Search uses the collection language to analyze that document during indexing. This improves the quality of search results, because users can search with spelling variants of a keyword, including plurals and inflections.
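The fallback rule described above can be sketched in a few lines. This is only an illustration of the rule, not Portal Search code: the function name `indexing_language` and its parameters are hypothetical, and language values are shown as plain strings.

```python
from typing import Optional


def indexing_language(document_language: Optional[str], collection_language: str) -> str:
    """Pick the language used to analyze a document during indexing.

    Hypothetical helper: a language defined on the document takes
    precedence; otherwise the collection language is used.
    """
    return document_language or collection_language


# A document that declares its own language keeps it.
print(indexing_language("French", "German"))
# A document without a language falls back to the collection language.
print(indexing_language(None, "German"))
```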

Portal Search can index content stored in different languages and make it available for search. It uses the Unicode setting of the source content to crawl and index that content. Administrators can choose between two tokenizers: N-gram indexing and linguistic indexing.

N-grams are sequences of n consecutive characters in a document. They are generated by sliding a "window" of n characters across the text of the document, moving it one character at a time. N-grams have advantages over words for use in indexing. They are language independent, so mixed-language text can be indexed easily. They are also useful for Asian languages in which word tokenization is more difficult, for example Chinese, Japanese, Korean, and Thai.

Linguistic indexing is based on a morphological analyzer that reduces terms to their base forms. It is useful in most situations when you index sources with both English and non-English content.
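The sliding-window idea behind N-grams can be illustrated with a minimal sketch. The function name `char_ngrams` is made up for this example, and the code shows only the general technique, not how Portal Search implements its N-gram tokenizer.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every run of n consecutive characters in text.

    The window starts at the first character and moves one
    character at a time until it reaches the end of the text.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Works the same way for any script, because no word boundaries are needed.
print(char_ngrams("portal", 3))   # ['por', 'ort', 'rta', 'tal']
print(char_ngrams("検索エンジン", 2))
```

Because the window operates on characters rather than words, the same function handles text without spaces between words, which is why the approach suits languages where word tokenization is difficult.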


Hints and tips for using Portal Search with different languages

The following hints and tips might be useful if the portal and its users use more than one language:

  1. Set the preferred language of the portal site crawler user ID to match the language of the portal site search collection that it crawls. If you change this setting after you have started a crawl on the portal site search collection, reset the collection.

  2. If the portal site is multilingual and your users use different languages to search the portal.

  3. If your users use external search services to perform searches in different languages


Language support by the Portal Search summarizer

The Portal Search summarizer produces summaries for all languages that are supported by the portal. For some languages, the summarizer has access to a stemmer program. The stemmer uses stems as the base forms of words, as opposed to the lemma forms used by summarizers that have dictionaries. Summaries for these languages can be of better quality. Currently, the stemmer program is available for the following languages:

  • Danish

  • Dutch

  • English

  • French

  • German

  • Italian

  • Norwegian

  • Portuguese

  • Russian

  • Spanish

  • Swedish
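
To make the difference between a stem and a lemma concrete, here is a toy suffix-stripping sketch for English. It is not the stemmer program that the summarizer uses; the function name `toy_stem` and its suffix list are assumptions chosen only for illustration.

```python
def toy_stem(word: str) -> str:
    """Strip one common English suffix to get a crude stem.

    Toy illustration only: real stemmers apply many more rules.
    A suffix is removed only if at least three characters remain.
    """
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Inflected forms collapse to one stem.
print(toy_stem("searching"), toy_stem("searches"), toy_stem("searched"))
# A stem need not be a dictionary word: the lemma would be "study".
print(toy_stem("studies"))
```

A dictionary-based approach would map "studies" to the lemma "study", whereas a rule-based stemmer can return a truncated form such as "studi" that still groups related words together.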




Related reference


Summarizer