Language support for Portal Search

 

Portal Search supports all national languages that are supported by the portal.

When you create a search collection, you can select the language for which the collection is optimized. Portal Search uses this language to analyze documents during indexing if no other language is defined for the document. This improves the quality of search results, because users can search with spelling variants of a keyword, including plurals and inflections.

Portal Search can index content stored in different languages and make it available for search. It uses the Unicode setting of the source content to crawl and index content for search. Administrators can choose between two tokenizers: N-gram indexing and linguistic indexing.

N-grams are sequences of n consecutive characters in a document. They are generated by sliding a "window" across the text of the document, moving it one character at a time. N-grams have several advantages over words for use in indexing. First, they are language independent, so text that mixes languages can be indexed. Second, they are useful for Asian languages in which word tokenization is more difficult, for example Chinese, Japanese, Korean, and Thai.

Linguistic indexing is based on a morphological analyzer that reduces terms to their base form. It can be applied in most situations in which the indexed sources contain both English and non-English content.
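
The sliding-window generation of N-grams described above can be illustrated with a short Python sketch. This is not the Portal Search tokenizer; the function name char_ngrams and the default window size are assumptions chosen only for illustration.

```python
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Return every run of n consecutive characters in text.

    Hypothetical helper for illustration only; it is not part of Portal Search.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the text, one character at a time.
    # If the text is shorter than n, the range is empty and no N-grams result.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# The same code handles text with and without spaces between words,
# because it never needs to know where one word ends and the next begins.
print(char_ngrams("portal", 3))
print(char_ngrams("検索エンジン", 2))
```

Because the window moves by characters rather than by words, no language-specific word boundary rules are needed, which is why the approach suits languages where word tokenization is difficult.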

 

Hints and tips for using Portal Search with different languages

The following hints and tips might be useful if the portal and its users use more than one language:

  1. Set the preferred language of the portal site crawler user ID to match the language of the portal site search collection that it crawls.

  2. If users perform searches in different languages through external search services, refer to Using the Search Center with external search services with different languages.

 

Language support by the Portal Search summarizer

The Portal Search summarizer produces summaries for all languages that are supported by the portal. For some languages, the summarizer has access to a stemmer program. The stemmer reduces words to stems and uses these as base forms, whereas summarizers that have dictionaries use lemma forms. Summaries for the languages with stemmer support can be of higher quality. The stemmer program is currently available for the following languages:

Danish
Dutch
English
French
German
Italian
Norwegian
Portuguese
Russian
Spanish
Swedish
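
To illustrate the difference between a stem and a lemma, the following Python sketch strips a few English suffixes. It is a toy example under stated assumptions: the suffix list and the function name toy_stem are hypothetical, and the sketch does not reproduce the stemmer program that the summarizer uses.

```python
# Toy suffix list, checked in order. This is an assumption for illustration,
# not the rule set of any real stemmer.
SUFFIXES = ("ing", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Variants of one word collapse to a shared stem.
print(toy_stem("searching"), toy_stem("searches"), toy_stem("search"))

# A stem is not always a dictionary word: a dictionary-based approach
# would map "studies" to the lemma "study".
print(toy_stem("studies"))
```

Rule-based stripping needs no dictionary, but the resulting stems can differ from the lemma forms that a dictionary lookup would return.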

 
