Machine translation


This topic discusses machine translation with WebSphere Translation Server and Transcoding Technology.

 

About machine translation

Given the diverse nature of portal content, machine translation is a natural addition for handling content that is not in a user's desired language. Machine translation (MT) is the automatic translation of human language by computers. For instance, an English-to-German MT system translates English (the source language) into German (the target language). Translation Server supports the following language pairs:

  • English to/from French
  • English to/from Italian
  • English to/from German
  • English to/from Spanish
  • English to/from Brazilian Portuguese
  • English to/from Simplified Chinese
  • English to/from Traditional Chinese
  • English to/from Japanese
  • English to/from Korean

WebSphere Translation Server provides machine translation through two paradigms: viewer-initiated translation and viewer-automated translation.

A viewer is a user or reader of a Web page.

Configuring machine translation in the portal server enables viewer-initiated translation, viewer-automated translation, or both, on a portlet-by-portlet basis through portlet configuration.

Portlets enabled for viewer-initiated translation display an additional icon in the portlet's title bar; clicking the icon displays a drop-down list of languages.

The list displays languages based on the language translation engines installed on the Translation Servers. If the viewer chooses a language from the drop-down list, the portlet is translated and redisplayed. (On devices that do not support drop-down menus, viewer-initiated translation is not available.)

Portlets enabled for viewer-automated translation are translated according to the viewer's preferred language settings. This translation occurs before the portlet is rendered.

If Translation Server is available but a specific language pair is not installed, the portlet content displays in its source language and no translation is performed. If the portlet's source content is already in the target language, Translation Server does not translate it.
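The behavior described above (which languages the drop-down offers, and when content is left untranslated) can be modeled in a few lines. This is a hypothetical Python sketch, not Translation Server code: the two-letter language codes are assumptions, and it assumes every pair in the list above is installed.

```python
# Hypothetical language codes for the pairs listed above; the codes that
# Translation Server actually uses may differ.
NON_ENGLISH = ["fr", "it", "de", "es", "pt_BR", "zh", "zh_TW", "ja", "ko"]

# Every listed pair is "English to/from X", so include both directions.
INSTALLED_PAIRS = {("en", x) for x in NON_ENGLISH} | {(x, "en") for x in NON_ENGLISH}


def dropdown_languages(content_language, installed_pairs=INSTALLED_PAIRS):
    """Target languages a viewer could pick for content in content_language."""
    return sorted(target for source, target in installed_pairs if source == content_language)


def should_translate(content_language, target_language, installed_pairs=INSTALLED_PAIRS):
    """Mirror the two 'no translation' cases described above."""
    if content_language == target_language:
        return False  # content is already in the target language
    # No engine for this pair: the portlet is shown untranslated.
    return (content_language, target_language) in installed_pairs
```

With only English-based pairs, a German portlet offers just English in its drop-down, and a German-to-French request falls back to the untranslated German content.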

Translation Server works with another program, User Dictionary Manager, which is for the Windows operating system only. User Dictionary Manager allows users to create dictionaries to further customize translation. For more information, consult the Translation Server and User Dictionary Manager Information Centers.

 

Using Translation Server with Transcoding Technology

In order to use machine translation with WebSphere Portal, the administrator must:

  • Configure the Transcoding Technology of the WebSphere Portal with Translation Server(s) information. The Transcoding Technology component provides the Machine Translation Plugin which acts as a bridge between WebSphere Portal and Translation Server, passing the portlet's content to Translation Server for translation.
  • Configure individual portlets to specify the translation paradigm, either viewer initiated or viewer automated.

 

Installing and configuring

 

Installation

To install Translation Server, refer to the installation documentation provided on the Translation Server disc.

After installing Translation Server, add WTS.jar from the Translation Server installation directory to the classpath. Do not add any wts.jar files found under the wp_root path to the classpath.

More information can be found on the Web at: http://www.ibm.com/software/pervasive/products/voice/translation_server.shtml

We recommend installing Translation Server on a separate machine from WebSphere Portal so that WebSphere Portal runs at peak performance. Users can also install Translation Server on multiple machines, with each machine providing a different language pair; this configuration can further improve translation performance.

 

Configure Transcoding Technology

  1. After installing WebSphere Portal and WebSphere Translation Server, modify the WTSserverPortList property in wp_root/config/wpconfig.properties so that it lists the hostnames (or IP addresses) and ports of the installed Translation Servers.

    wp_root is the root install directory of WebSphere Portal.

    The default value of this property is as follows.

    WTSserverPortList=localhost:port

    The value can be a comma-separated list of <hostname>:<port> values for installed Translation Servers.

    Example: WTSserverPortList=wtshost1.domain1.com:1099,wtshost2.domain2.com:1099

  2. Run ./WPSconfig.sh update-wtp-translation (or WPSconfig.bat update-wtp-translation for Windows) from wp_root/config. This action configures the Transcoding Technology component with the Translation Server hostnames and ports you added in the first step.
  3. Start Translation Server before you start WebSphere Portal.
  4. Start WebSphere Portal.
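Step 1 expects WTSserverPortList to hold a comma-separated list of <hostname>:<port> entries. Before running WPSconfig, you might sanity-check the value with a helper like the one below. It is a hypothetical Python sketch: parse_wts_server_port_list is not part of WebSphere Portal, WPSconfig does its own parsing, and the check covers only entry syntax, not whether a Translation Server is actually listening on each port.

```python
from __future__ import annotations


def parse_wts_server_port_list(value: str) -> list[tuple[str, int]]:
    """Split a WTSserverPortList value into (hostname, port) pairs."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries, e.g. from a trailing comma
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            # Also catches the unedited default value "localhost:port".
            raise ValueError(f"expected <hostname>:<port>, got {entry!r}")
        servers.append((host, int(port)))
    return servers
```

Running it on the example value from step 1 yields two (hostname, 1099) pairs, while the unedited default raises an error because "port" is still a placeholder.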

 

Confirm successful installation and configuration

  1. Configuration messages appear in wp_root\IBMTrans\log\TranscoderMessagesN.log, where N is a number from 1 to 3. As Transcoding Technology logs newer messages, older messages move to files with higher numbers. The most recent messages are therefore always in TranscoderMessages1.log, and the oldest in TranscoderMessages3.log.
  2. Search the Transcoding log files for the following message numbers:

    Message number   Meaning
    TPX6251I         Installation and configuration successful.
    TPX6255W         Error connecting to a WTS server.
    TPX6257W         Translation was disabled.

    If your installation was not successful, refer to the log file for further instructions.
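The search in step 2 can be scripted. The following is a hypothetical Python sketch, not an IBM tool: it assumes the log files are readable as UTF-8 text, that the message numbers appear verbatim in log lines, and that lines within each file are in chronological order. It reads TranscoderMessages3.log down to TranscoderMessages1.log so that results run from oldest to newest.

```python
from pathlib import Path

# Message numbers and meanings from the table above.
MESSAGES = {
    "TPX6251I": "Installation and configuration successful.",
    "TPX6255W": "Error connecting to a WTS server.",
    "TPX6257W": "Translation was disabled.",
}


def scan_lines(lines):
    """Yield (message number, meaning, line) for each line mentioning a known message number."""
    for line in lines:
        for code, meaning in MESSAGES.items():
            if code in line:
                yield code, meaning, line.rstrip("\n")


def scan_transcoder_logs(log_dir):
    """Scan TranscoderMessages3.log, 2 and 1 in that order, so results run oldest to newest."""
    results = []
    for n in (3, 2, 1):
        path = Path(log_dir) / f"TranscoderMessages{n}.log"
        if not path.exists():
            continue  # skip any of the three files that is missing
        # Assumption: the logs are readable as UTF-8; undecodable bytes are replaced.
        with path.open(encoding="utf-8", errors="replace") as f:
            results.extend(scan_lines(f))
    return results
```

Under these assumptions, the last entry returned by scan_transcoder_logs for wp_root/IBMTrans/log is the most recent translation status message.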

 

Enable portlets for machine translation

  1. Be sure to start or restart WebSphere Portal after you start Translation Server.
  2. From the WebSphere Portal Administration tab, select the portlet you wish to enable for translation and click Modify Parameters.
  3. Enter the following parameters:

Parameter name | Parameter value | Required/Optional | Notes

FilterChain | Transcoding | Required

This parameter routes the request and response through Transcoding Technology before the content is rendered. Set this value only once.

Important: For portlets that already specify one of the following FilterChain values, do not modify the value and do not add the FilterChain=Transcoding parameter.

  • FilterChain=strutsTranscoding - transcoding for Struts portlets
  • FilterChain=wmlOnlyTranscoding - transcoding for WML devices
  • FilterChain=ChtmlOnlyTranscoding - transcoding for CHTML devices

In these cases, skip this parameter entirely; these types of portlets are routed through Transcoding Technology by other mechanisms.

EnableViewerAutomatedTranslation | true or false | Required

Set this parameter to true to enable viewer-automated, portlet-level language translation.

EnableViewerInitiatedTranslation | true or false | Required

Set this parameter to true to enable viewer-initiated, portlet-level language translation.

EnableTitleTranslation | true or false | Optional

Setting this parameter to true translates the portlet title into the same language as the content. The portlet must have a title listener for this setting to take effect.

ContentSubject | Ordered list of subject areas supported by Translation Server | Optional

This parameter lets you specify subject areas, separated by commas. These subjects should match the subjects defined in the Translation Server dictionaries.

Place the user dictionary in the system dictionary folder of the respective language, and load it using the Translation Server console.

Refer to the Translation Server Information Center for more information about Subject Areas. Refer to the Translation Server and User Dictionary Manager Information Centers for more information about creating, using and loading dictionaries.

PreferredLanguage | Language code | Optional

This parameter allows the portlet developer or portal administrator to provide a default target language for viewer-automated translation. When it is specified, the portlet is always translated into this language when rendered.

ContentCharset | Portlet's content charset | Optional

WebSphere Portal requires portlets to provide content in the encoding that the portal specifies, which by default is UTF-8. In practice, however, many portlets provide only ISO-8859-1 encoding (the default on the Windows platform). As long as the portlet's content contains no special (non-ASCII) characters, and the translation does not produce any, the content displays correctly.

However, if a portlet uses an encoding other than the one WebSphere Portal requires, and either the content contains special characters or the translation produces them, the content might be truncated or contain unrecognizable characters. If a WebSphere Portal administrator sees this behavior, he or she can specify the portlet's correct encoding, which Transcoding Technology then uses instead of assuming the WebSphere Portal default.

Use this parameter only after careful consideration, in cases where parts of the translated content do not display correctly.

ContentLanguage | Portlet's source content language | Required

This parameter sets the language of the source content. For example, ContentLanguage=en sets the portlet source language to English.

Note when using Web Clipping: If the content language of a clipping portlet is different from the local language of WebSphere Portal, set the parameter 'ContentLanguage' to the language of the clipping portlet.

If this parameter is not set to the clipping portlet's language, the wrong translation drop-down menu appears: one based on the local language of WebSphere Portal rather than on the language of the clipping portlet.

DisableTranslation | true or false | Optional

This parameter cannot be changed through the Administration GUI. It can only be changed by editing web.xml.

A portlet developer can place this servlet init parameter in web.xml to ensure that the portlet's content is never machine translated. This parameter overrides all other machine translation parameters for the portlet.

Sample contents of web.xml:

<?xml version="1.0" encoding="UTF-8"?>
...
<web-app id="WebApp">
  <display-name>TranslationTestPortlet</display-name>
  <servlet id="Servlet_1">
    <servlet-name>MyPortlet</servlet-name>
    <display-name>MyPortlet</display-name>
    <servlet-class>portlet.MyPortlet</servlet-class>
    <init-param>
      <param-name>DisableTranslation</param-name>
      <param-value>true</param-value>
    </init-param>
  </servlet>
  ...
</web-app>

  4. If your portlet is written in a language other than English:

    1. Set the value of the ContentCharset parameter in portlet.xml to the charset of the source language.

      Example:

      <config-param>
        <param-name>ContentCharset</param-name>
        <param-value>gb2312</param-value>
      </config-param>

    2. Set the charset value in the first line of view.jsp to the charset of the source language.

      Example:

      <%@ page contentType="text/html; charset=gb2312" %>
      <jsp:useBean id="MyPortletBean" class="portlet.MyPortletBean" scope="request" />

    3. If the portlet title (the title element in portlet.xml) uses a charset other than ISO-8859-1 or UTF-8 (for example, a double-byte charset), the encoding attribute in the first line of portlet.xml must specify the charset of the title.

      Example:

      <?xml version="1.0" encoding="gb2312"?>
      <!DOCTYPE portlet-app-def PUBLIC "-//IBM//DTD Portlet Application 1.1//EN" "portlet_1.1.dtd">
      ...
      </language>-->
      <language locale="zh">
        <title>???????????????? - </title>
        <title-short></title-short>
        <description></description>
        <keywords></keywords>
      </language>

  5. Click Save to save your changes, or Cancel to discard them.
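The charset settings above and the ContentCharset parameter exist because bytes read with the wrong charset come out damaged. The following small Python illustration (not WebSphere Portal code) encodes a Simplified Chinese string as gb2312, the charset used in the examples above, and decodes it three ways: only the matching charset recovers the text, UTF-8 produces replacement characters, and ISO-8859-1 silently produces unrelated Latin-1 characters.

```python
SAMPLE = "机器翻译"  # Simplified Chinese for "machine translation"


def decode_as(raw: bytes, charset: str) -> str:
    """Decode bytes as if the content were declared to be in `charset`.

    Bytes that are invalid in that charset become U+FFFD instead of raising.
    """
    return raw.decode(charset, errors="replace")


# What a portlet writing gb2312 actually sends.
portlet_output = SAMPLE.encode("gb2312")

for charset in ("gb2312", "utf-8", "iso-8859-1"):
    print(f"{charset:>10}: {decode_as(portlet_output, charset)!r}")
```

ISO-8859-1 is the quiet failure: every byte is valid in it, so nothing signals the mismatch, which is one reason to declare the source charset explicitly rather than rely on defaults.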

 

Language Selection Rules

The following rules detail how WebSphere Portal selects the target language to be passed to Translation Server for translating a portlet's contents.

Any HTML content enclosed in the <pre> tag will not be translated by Translation Server.
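To illustrate the <pre> rule, here is a hypothetical Python sketch of splitting content so that only text outside <pre> elements is passed to a translation step. translate_outside_pre and the translate callback are illustrative stand-ins, not part of the Machine Translation Plugin, and a production version would use a real HTML parser rather than a regular expression.

```python
import re

# One whole <pre> element, case-insensitive and spanning newlines. The \b
# keeps tags such as <preface> from matching. Nested <pre> elements are not
# valid HTML, so a non-greedy regular expression is adequate for this sketch.
PRE_ELEMENT = re.compile(r"(<pre\b[^>]*>.*?</pre\s*>)", re.IGNORECASE | re.DOTALL)


def translate_outside_pre(html, translate):
    """Apply translate() to every part of html except <pre> elements."""
    # Because the pattern has one capturing group, re.split returns the
    # text between matches at even indices and the <pre> elements at odd ones.
    parts = PRE_ELEMENT.split(html)
    return "".join(part if i % 2 else translate(part) for i, part in enumerate(parts))
```

For example, with a stand-in translator that replaces "Hello" with "Hallo", the input <p>Hello</p><pre>Hello</pre> becomes <p>Hallo</p><pre>Hello</pre>.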

 

If viewer-initiated translation is enabled

  1. If a PreferredLanguage parameter is present on the Portlet Request, the target language is the value of this parameter. This parameter will be present on the request when the user selects a language from a language selection drop-down created by the Viewer-Initiated option. The language selection from viewer-initiated translation overrides all other target language selections.
  2. If a PreferredLanguage parameter is specified on the portlet, the target language is the value of this parameter. The portal administrator or developer can optionally set this at design time.
  3. If the viewer has logged in, the target language is the preferred language specified by the viewer in the WebSphere Portal settings.
  4. If no viewer language can be found (anonymous users), the target language is the one defined in the viewer's browser.
  5. If no browser language can be found, for example, if the browser used does not send a language, the target language is as defined in the portal's global settings.

When a portlet is first rendered, rule 1 does not apply. If viewer-automated translation is also enabled, the portlet's target language is selected automatically by rules 2 through 5; otherwise, the portlet is rendered in its default language.

 

If viewer-automated translation is enabled

  1. If a PreferredLanguage parameter is specified in the Portlet Settings, the target language is the value of this parameter. The portal administrator or developer can optionally set this at design time.
  2. If the viewer has logged in, the target language is the preferred language specified by the viewer in WebSphere Portal settings.
  3. If no viewer language can be found (anonymous users), the target language is the one defined in the viewer's browser.
  4. If no browser language can be found, for example, if the browser used does not send a language, the target language is as defined in the portal's global settings.
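Both rule lists describe the same fallback chain, with the drop-down choice layered on top for viewer-initiated translation. The following hypothetical Python sketch expresses that precedence; LanguageSources and select_target_language are illustrative names, not WebSphere Portal APIs, and "en" is only a placeholder for the portal's global default.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageSources:
    """Hypothetical container for every place a target language can come from."""
    request_preferred: Optional[str] = None  # PreferredLanguage on the request (drop-down choice)
    portlet_preferred: Optional[str] = None  # PreferredLanguage portlet parameter
    viewer_preferred: Optional[str] = None   # logged-in viewer's WebSphere Portal setting
    browser: Optional[str] = None            # language sent by the viewer's browser
    portal_default: str = "en"               # portal global settings (placeholder value)


def select_target_language(src: LanguageSources, viewer_initiated: bool,
                           viewer_automated: bool) -> Optional[str]:
    """Return the target language, or None when the portlet renders untranslated."""
    # Viewer-initiated rule 1: a drop-down choice overrides everything else.
    if viewer_initiated and src.request_preferred:
        return src.request_preferred
    # Without viewer-automated translation (for example, on first render
    # with only viewer-initiated enabled), the portlet keeps its default language.
    if not viewer_automated:
        return None
    # Viewer-initiated rules 2-5 are the same chain as viewer-automated rules 1-4.
    for candidate in (src.portlet_preferred, src.viewer_preferred, src.browser):
        if candidate:
            return candidate
    return src.portal_default
```

For an anonymous viewer whose browser sends French, a portlet with only viewer-initiated translation renders untranslated until the viewer picks a language, while a viewer-automated portlet is translated into French immediately.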

 

Note: Translating WML and CHTML pages

Portlets for WML and CHTML devices support viewer-automated translation only.

 


WebSphere is a trademark of the IBM Corporation in the United States, other countries, or both.

 

IBM is a trademark of the IBM Corporation in the United States, other countries, or both.