Normalization

 

Normalization allows you to compare strings that contain combining characters.

Data tagged with a UTF-8 or UTF-16 CCSID can contain combining characters. Combining characters allow a resulting character to be composed of more than one character. After the first character of the compound character, one of many different non-spacing characters such as umlauts and accents can follow in the data string. If the resulting character is one that is already defined in the character set, normalization of the string results in multiple combining characters being replaced by the value of the defined character. For example, if your string contained the letter 'a' followed by an '..', the string is normalized to contain the single character 'ä'.

Normalization makes it possible to accurately compare strings. If data is not normalized, two strings that look identical on the display may not compare equal since the stored representation can be different. When UTF-8 and UTF-16 string data is not normalized, it is possible that a column in a table can have one row with the letter 'a' followed by the umlaut character and another row with the combined 'ä' character. These two values are not both compare equal in a comparison predicate: WHERE C1 = 'ä'. For this reason, it is recommended that all string columns in a table are stored in normalized form.

You can normalize the data yourself before inserting or updating it, or you can define a column in a table to be automatically normalized by the database. To have the database perform the normalization, specify NORMALIZED as part of the column definition. This option is only allowed for columns that are tagged with a CCSID of 1208 (UTF-8) or 1200 (UTF-16). The database assumes all columns in a table have been normalized.

The NORMALIZED clause can also be specified for function and procedure parameters. If it is specified for an input parameter, the normalization will be done by the database for the parameter value before invoking the function or procedure. If it is specified for an output parameter, the clause is not enforced; it is assumed that the user's routine code will return a normalized value.

The NORMALIZE_DATA option in the QAQQINI file is used to indicate whether the system is to perform normalization when working with UTF-8 and UTF-16 data. This option controls whether the system will normalize literals, host variables, parameter markers, and expressions that combine strings before using them in SQL. The option is initialized to not perform normalization. This is the correct value for you if the data in your tables and any literal values in your applications is always normalized already through some other mechanism or never contains characters which will need to be normalized. If this is the case, you will want to avoid the overhead of system normalization in your query. If your data is not already normalized, you will want to switch the value of this option to have the system perform normalization for you.

 

Parent topic:

Sort sequences and normalization in SQL

 

Related tasks


Controlling queries dynamically with the query options file QAQQINI