ICU sort sequence

 

When an International Components for Unicode (ICU) sort sequence table is used, the database uses the language-specific rules of the table's locale to determine the weight of the data.

For example, an ICU sort sequence table named en_us (United States locale) can sort data differently than another ICU table named fr_FR (French locale).

The ICU support (5722-SS1 Option 39) properly handles data that is not normalized, producing the same results as if the data were normalized. An ICU sort sequence table can sort all character, graphic, and Unicode (UTF-8, UTF-16, and UCS-2) data.

For example, a UTF-8 character column named NAME contains the following names (the hex values of the column are given as well).

NAME     HEX(NAME)
Gómez    47C3B36D657A
Gomer    476F6D6572
Gumby    47756D6279

A *HEX sort sequence orders the NAME values as follows.

NAME
Gomer
Gumby
Gómez

An ICU sort sequence table named en_us correctly orders the NAME values as follows.

NAME
Gomer
Gómez
Gumby
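
The difference between the two orderings can be reproduced outside the database. The following Python sketch is only an illustration: hex_key compares raw UTF-8 bytes, which mirrors what a *HEX sort sequence does for this column, while approx_collation_key is a hypothetical stand-in that strips accents after normalizing the text. It is not the ICU algorithm and not what the database calls; it only approximates language-aware weighting well enough for these three names.

```python
import unicodedata

NAMES = ["Gómez", "Gomer", "Gumby"]


def hex_key(name: str) -> bytes:
    """Compare by raw UTF-8 bytes, as a *HEX sort sequence would."""
    return name.encode("utf-8")


def approx_collation_key(name: str) -> tuple:
    """Rough, hypothetical stand-in for a language-aware weight (NOT ICU)."""
    # Normalize first so precomposed and decomposed input weigh the same.
    decomposed = unicodedata.normalize("NFD", name)
    # Primary level: base letters only, ignoring combining accents and case.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # The decomposed form breaks ties between names with equal base letters.
    return (base.casefold(), decomposed)


hex_order = sorted(NAMES, key=hex_key)
icu_like_order = sorted(NAMES, key=approx_collation_key)

for name in NAMES:
    # Prints each name with its UTF-8 hex value, as in the table above.
    print(name, name.encode("utf-8").hex().upper())
print("byte order:      ", hex_order)
print("accent-aware key:", icu_like_order)
```

In the byte ordering, Gómez sorts last because the first byte of ó (X'C3') is greater than the bytes for o (X'6F') and u (X'75'). The accent-aware key treats ó as o at the primary level, so Gómez falls between Gomer and Gumby. Because the key normalizes its input, a decomposed spelling of Gómez (o followed by a combining acute accent) receives the same key as the precomposed spelling.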

When an ICU sort sequence table is specified, SQL statements that use the table can run much slower than SQL statements that use a non-ICU sort sequence table or a *HEX sort sequence. The slower performance results from calling the ICU support to get the weighted value for each piece of data that needs to be sorted. An ICU sort sequence table provides more sorting function, but at the cost of slower-running SQL statements. However, indexes created with an ICU sort sequence table can be created over columns to reduce the need to call the ICU support. In this case, the index key already contains the ICU weighted value, so the ICU support does not need to be called.
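
The trade-off between computing weights at query time and storing them in an index can be sketched in Python. This is a conceptual model only, not how the database implements indexes: weight is a hypothetical stand-in for the call into the ICU support, counter tracks how often it runs, and build_index and the two query functions are illustrative names.

```python
import unicodedata

counter = {"calls": 0}  # how many times the weighting routine has run


def weight(name: str) -> tuple:
    """Hypothetical stand-in for the per-value call into the ICU support."""
    counter["calls"] += 1
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), decomposed)


def query_without_index(rows):
    """Sort at query time: every row is weighted again on every query."""
    return sorted(rows, key=weight)


def build_index(rows):
    """Weight each row once and keep (weighted key, row) pairs in key order."""
    return sorted((weight(row), row) for row in rows)


def query_with_index(index):
    """Read rows in key order; the stored keys make new weighting unnecessary."""
    return [row for _key, row in index]


rows = ["Gómez", "Gomer", "Gumby"]

# Two ordered queries without an index: weights are recomputed both times.
query_without_index(rows)
query_without_index(rows)
calls_without_index = counter["calls"]

# Build the index once, then run the same two queries against it.
counter["calls"] = 0
index = build_index(rows)
calls_at_build = counter["calls"]

counter["calls"] = 0
first_result = query_with_index(index)
second_result = query_with_index(index)
calls_with_index = counter["calls"]

print(calls_without_index, calls_at_build, calls_with_index)
print(first_result)
```

In this model the weighting cost is paid once when the index is built, and later ordered reads reuse the stored keys. Without the index, the cost recurs on every query that sorts the column.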

 
