System i: Double-byte character set fundamentals

Some languages, such as Chinese, Japanese, and Korean, have writing systems with more characters than a single-byte code can represent. To create coded character sets for such languages, the system uses 2 bytes to represent each character. Characters that are encoded in a 2-byte code are called double-byte characters.
Figure 1 shows alphanumeric characters coded in a single-byte code scheme and double-byte characters coded in a double-byte code scheme.
You can use double-byte characters as well as single-byte characters in one application. For instance, you might want to store double-byte data and single-byte data in your database, create your display screens with double-byte text and fields, or print reports with double-byte characters.
Figure 1. Single-byte and double-byte code schemes

DBCS code scheme
IBM® supports two DBCS code schemes: one for the host systems, the other for personal computers.

Shift-control double-byte characters
When the IBM-host code scheme is used, the system uses shift-control characters to identify the beginning and end of a string of double-byte characters.
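As a rough illustration of how shift-control characters delimit double-byte strings, the following Python sketch splits a mixed byte string into single-byte and double-byte runs. It assumes that the shift-out and shift-in controls are X'0E' and X'0F'; those values are not stated in this topic. The function name split_mixed is hypothetical and is not part of any system interface.

```python
# Assumed control codes for the IBM-host code scheme (not given in this topic).
SHIFT_OUT = 0x0E  # marks the beginning of a double-byte string
SHIFT_IN = 0x0F   # marks the end of a double-byte string


def split_mixed(data: bytes) -> list[tuple[str, bytes]]:
    """Split mixed data into ("SBCS", bytes) and ("DBCS", bytes) runs.

    The shift-control bytes themselves are dropped from the output.
    """
    runs = []
    i = 0
    n = len(data)
    while i < n:
        if data[i] == SHIFT_OUT:
            # Double-byte run: everything up to the next shift-in.
            end = data.find(bytes([SHIFT_IN]), i + 1)
            if end == -1:
                raise ValueError("shift-out without matching shift-in")
            body = data[i + 1:end]
            if len(body) % 2:
                raise ValueError("odd number of bytes between shift controls")
            runs.append(("DBCS", body))
            i = end + 1
        else:
            # Single-byte run: everything up to the next shift-out.
            end = data.find(bytes([SHIFT_OUT]), i)
            if end == -1:
                end = n
            runs.append(("SBCS", data[i:end]))
            i = end
    return runs


# Two single-byte codes, two double-byte codes, one single-byte code.
sample = b"\xC1\xC2" + b"\x0E" + b"\x45\x46\x47\x48" + b"\x0F" + b"\xC3"
runs = split_mixed(sample)
```

For the sample above, the sketch yields one single-byte run of 2 bytes, one double-byte run of 4 bytes (two characters), and a final single-byte run of 1 byte.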

Invalid double-byte code and undefined double-byte code
An invalid double-byte code is a double-byte code value that falls outside the valid double-byte code range.
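A range check of this kind could be sketched as follows. The bounds are an assumption, because this topic does not give them: the sketch treats an IBM-host double-byte code as valid when both bytes lie in X'41' through X'FE', or when the code is X'4040' (taken here to be the double-byte blank). The function name is_valid_host_dbcs is hypothetical.

```python
def is_valid_host_dbcs(code: bytes) -> bool:
    """Return True if a 2-byte code lies in the assumed valid range.

    Assumed range (not stated in this topic): both bytes in
    X'41'..X'FE', plus X'4040' as the double-byte blank.
    """
    if len(code) != 2:
        return False
    if code == b"\x40\x40":
        return True
    return all(0x41 <= b <= 0xFE for b in code)


checks = {
    "blank": is_valid_host_dbcs(b"\x40\x40"),
    "low": is_valid_host_dbcs(b"\x41\x41"),
    "high": is_valid_host_dbcs(b"\xFE\xFE"),
    "bad_first": is_valid_host_dbcs(b"\xFF\x41"),
    "bad_second": is_valid_host_dbcs(b"\x41\x40"),
}
```

Note that a code passing this check is only within the range; whether a graphic symbol is actually defined for it is a separate question that this sketch does not address.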

Usage of double-byte data
This section describes where you can use double-byte data and the limitations on its use.

Double-byte character size
When displayed or printed, double-byte characters typically are twice as wide as single-byte characters.
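The effect on field and report layout can be estimated with a small sketch. It assumes that each single-byte character occupies one column, that each double-byte character occupies two, and that the shift controls are X'0E' and X'0F'. The function name display_columns and the shift_columns parameter are hypothetical; shift_columns exists only so that you can model devices on which a shift-control character takes up a position of its own.

```python
SHIFT_OUT = 0x0E  # assumed start-of-DBCS control
SHIFT_IN = 0x0F   # assumed end-of-DBCS control


def display_columns(data: bytes, shift_columns: int = 0) -> int:
    """Estimate the columns needed to show mixed single/double-byte data."""
    columns = 0
    in_dbcs = False
    i = 0
    while i < len(data):
        b = data[i]
        if not in_dbcs and b == SHIFT_OUT:
            in_dbcs = True
            columns += shift_columns
            i += 1
        elif in_dbcs and b == SHIFT_IN:
            in_dbcs = False
            columns += shift_columns
            i += 1
        elif in_dbcs:
            columns += 2  # one double-byte character, twice as wide
            i += 2
        else:
            columns += 1  # one single-byte character
            i += 1
    return columns


sample = b"\xC1\xC2" + b"\x0E" + b"\x45\x46\x47\x48" + b"\x0F" + b"\xC3"
width = display_columns(sample)
width_with_shifts = display_columns(sample, shift_columns=1)
```

Under these assumptions, a double-byte character needs as many columns as it has bytes, so the column count of mixed data tracks its byte count apart from the shift controls.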
