UTF-8 support for uniform resource locators
There are a number of different encoding methods for transmitting characters outside the printable ASCII range. WebSEAL, acting as a web proxy, must be able to handle all these cases. The UTF-8 locale support addresses this need.
Browsers are limited to a defined character set that can legally be used in a uniform resource locator (URL). This range is defined to be the printable characters in the ASCII character set (between hex code 0x20 and 0x7e). For languages other than English, and other purposes, characters outside the printable ASCII character set are often required in URLs. These characters can be encoded by using printable characters for transmission and interpretation.
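As an illustration of how characters outside the printable ASCII range can be carried in a URL, the following minimal sketch percent-encodes a path with the Python standard library. The function name `encode_path` and the sample paths are hypothetical and are not part of WebSEAL. The sketch also shows that the same character yields different bytes under different encodings, which is why a proxy needs to know which encoding to expect.

```python
from urllib.parse import quote

def encode_path(path: str, encoding: str = "utf-8") -> str:
    """Percent-encode every character that is not URL-safe ASCII.

    The path separator "/" is left as-is. The `encoding` argument selects
    how non-ASCII characters are turned into bytes before they are escaped.
    """
    return quote(path, safe="/", encoding=encoding)

# The same character, "é", produces different byte sequences depending on
# the encoding that the client uses.
utf8_form = encode_path("/docs/café")               # "/docs/caf%C3%A9"
latin1_form = encode_path("/docs/café", "latin-1")  # "/docs/caf%E9"

print(utf8_form)
print(latin1_form)
```

Both results contain only printable ASCII characters, so either form can be transmitted in a URL. The receiver must then decide how to interpret the escaped bytes.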
The manner in which WebSEAL processes the URLs from browsers can be specified in the WebSEAL configuration file.
[server]
utf8-url-support-enabled = {yes|no|auto}
The three possible values are as follows:
- yes
In this mode, WebSEAL recognizes only UTF-8 data in URL strings, either raw UTF-8 or URI encoded UTF-8, and uses it without modification. These UTF-8 characters are then validated and taken into account when WebSEAL determines access rights to the URL. Other encoding techniques are not accepted in this mode.
This value is the default and is appropriate for most environments.
Servers that run in a 7-bit ASCII English locale must use this value.
- no
In this mode, WebSEAL does not recognize UTF-8 format data in URL strings and interprets URLs in the local code page only. If the string can be validated, it is converted to UTF-8 for internal use.
Servers that do not need to process multi-byte input and are running in a single-byte Latin locale, such as French, German, or Spanish, must use this setting.
Use this setting when applications and web servers do not function correctly with WebSEAL if UTF-8 support is enabled. These applications might use DBCS (such as Shift-JIS) or other encoding mechanisms in the URL. When you set this value to no, ensure that all junctioned servers do NOT accept UTF-8 format URLs. From a security perspective, it is important that WebSEAL interprets URLs in the same manner as the junctioned servers.
- auto
In this mode, WebSEAL attempts to distinguish between UTF-8 and other forms of language character encoding. WebSEAL correctly processes any correctly constructed UTF-8 encoding. If the encoding does not appear to be UTF-8, it is processed as DBCS or Unicode.
If a URL contains Unicode in the format "%uHHHH", WebSEAL converts it to UTF-8. The rest of the decoding proceeds as if the configuration setting were yes. If the double-byte-encoding option in the [server] stanza is set to yes, WebSEAL converts %HH%HH to UTF-8.
Servers running in a single-byte Latin locale that need to process multi-byte strings must use the auto setting.
Servers running in a multi-byte locale that need to support only one language (for example, Japanese) can use the auto setting.
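The distinction that the auto mode draws can be sketched roughly as follows. This is a simplified illustration in Python, not WebSEAL's actual algorithm: the function names `expand_percent_u` and `classify` are hypothetical, and the sketch assumes that "appears to be UTF-8" means "the decoded bytes form a valid UTF-8 sequence". It also shows the "%uHHHH" conversion described above.

```python
import re
from urllib.parse import quote, unquote_to_bytes

_PERCENT_U = re.compile(r"%u([0-9A-Fa-f]{4})")

def expand_percent_u(url: str) -> str:
    """Rewrite each %uHHHH escape as the percent-encoded UTF-8 bytes
    of that code point. Surrogate code points are left unchanged."""
    def to_utf8(match: re.Match) -> str:
        code_point = int(match.group(1), 16)
        if 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)  # lone surrogates cannot be encoded as UTF-8
        encoded = chr(code_point).encode("utf-8")
        return "".join(f"%{byte:02X}" for byte in encoded)
    return _PERCENT_U.sub(to_utf8, url)

def classify(url: str) -> str:
    """Return "utf-8" if the unescaped bytes are valid UTF-8, else "other"."""
    raw = unquote_to_bytes(expand_percent_u(url))
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"

print(expand_percent_u("/caf%u00E9"))                      # /caf%C3%A9
print(classify("/caf%C3%A9"))                              # utf-8
print(classify(quote("/日本", encoding="shift_jis")))       # other
```

A byte sequence that is not valid UTF-8 (here, a Shift-JIS encoded path) falls into the "other" branch, which corresponds to the case where the text above says the encoding is processed as DBCS or Unicode. Some non-UTF-8 byte sequences can happen to be valid UTF-8, so a check of this kind is a heuristic.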
The following list is a sample deployment strategy.
- Unless required for content purposes, immediately check the default-webseal ACL on existing production deployments and set it to NOT allow unauthenticated r (read) access. This setting limits security exposure to users who have a valid account in the ISAM domain.
- Ensure the utf8-url-support-enabled stanza entry is set to the default value of yes.
- Test your applications. If they function correctly, use this setting.
- If any applications fail with Bad Request errors, try the application with the utf8-url-support-enabled stanza entry set to no. If this step works, you can deploy with this setting. Ensure, however, that no junctioned web server is configured to accept UTF-8 encoded URLs.
- If the application continues to have problems, try setting utf8-url-support-enabled to auto.
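When working through the steps above, it can help to confirm which value is currently configured. The following is a minimal sketch that reads the entry from a configuration fragment with the Python standard library. The function name `read_utf8_url_mode` is hypothetical, and the sketch assumes that the fragment is simple enough for `configparser` to parse; a complete WebSEAL configuration file might contain constructs that this parser does not accept.

```python
import configparser

VALID_MODES = {"yes", "no", "auto"}

def read_utf8_url_mode(config_text: str) -> str:
    """Return the utf8-url-support-enabled value from the [server] stanza.

    Falls back to "yes", the documented default, when the entry is absent.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    mode = parser.get("server", "utf8-url-support-enabled", fallback="yes")
    mode = mode.strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unexpected utf8-url-support-enabled value: {mode!r}")
    return mode

# Illustrative fragment only, not a complete configuration file.
sample = """
[server]
utf8-url-support-enabled = auto
"""
print(read_utf8_url_mode(sample))  # auto
```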