ICU sort sequence
When an International Components for Unicode (ICU) sort sequence table is used, the database applies the language-specific rules for the table's locale to determine the weight of the data.
For example, an ICU sort sequence table named en_us (United States locale) can sort data differently from another ICU table named fr_FR (French locale).
The ICU support (5722-SS1 Option 39) properly handles data that is not normalized, producing the same results as if the data were normalized. The ICU sort sequence table can sort all character, graphic, and Unicode (UTF-8, UTF-16 and UCS-2) data.
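To illustrate what "not normalized" means here, the following Python sketch builds the name Gómez in two forms: precomposed (ó as the single code point U+00F3) and decomposed (o followed by the combining acute accent U+0301). It is not the ICU support itself, only a demonstration with the standard unicodedata module, and the variable names are illustrative. The byte strings differ, so a byte-wise comparison treats the two values as different, while a comparison after NFC normalization treats them as equal.

```python
import unicodedata

# Two encodings of the same visible name "Gómez".
precomposed = "G\u00f3mez"   # ó as a single code point (U+00F3)
decomposed = "Go\u0301mez"   # o followed by COMBINING ACUTE ACCENT (U+0301)

# The raw UTF-8 bytes differ, so a byte-wise comparison
# sees two different values.
bytes_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After normalizing both to NFC, the strings compare equal, which is
# the "same results as if the data were normalized" behavior.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(precomposed.encode("utf-8").hex().upper())  # 47C3B36D657A
print(decomposed.encode("utf-8").hex().upper())   # 476FCC816D657A
print(bytes_equal, nfc_equal)                     # False True
```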
For example, a UTF-8 character column named NAME contains the following names (the hex values of the column are given as well).
NAME     HEX (NAME)
Gómez    47C3B36D657A
Gomer    476F6D6572
Gumby    47756D6279

A *HEX sort sequence orders the NAME values as follows.

NAME
Gomer
Gumby
Gómez

An ICU sort sequence table named en_us correctly orders the NAME values as follows.

NAME
Gomer
Gómez
Gumby

When an ICU sort sequence table is specified, SQL statements that use the table can run much more slowly than SQL statements that use a non-ICU sort sequence table or a *HEX sort sequence. The slower performance results from calling the ICU support to get the weighted value for each piece of data that needs to be sorted. An ICU sort sequence table provides more sorting function, but at the cost of slower-running SQL statements. However, indexes can be created over columns with an ICU sort sequence table to reduce the need to call the ICU support. In this case, the index key already contains the ICU weighted value, so the ICU support does not need to be called.
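The two orderings of the NAME values, and the idea of keeping the weighted value in the index key, can be sketched in Python. This is only an approximation under stated assumptions: approx_weight is a hypothetical stand-in for the ICU support (it strips accents and case-folds to get a primary key, then breaks ties with the decomposed string), not the ICU collation rules, and the index list is an illustration rather than a description of how the database builds indexes.

```python
import unicodedata

names = ["Gómez", "Gomer", "Gumby"]

# *HEX-style ordering: compare the raw UTF-8 bytes.
hex_order = sorted(names, key=lambda s: s.encode("utf-8"))

calls = 0  # counts how often the (expensive) weight function runs

def approx_weight(s):
    """Hypothetical stand-in for an ICU weight: base letters first,
    then the decomposed original string as a tie-breaker."""
    global calls
    calls += 1
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), decomposed)

# Language-aware-style ordering: the sort calls the weight function
# once per value.
icu_like_order = sorted(names, key=approx_weight)

# "Index": compute each weight once and store it next to the value.
index = sorted((approx_weight(s), s) for s in names)
calls_after_build = calls

# Reading in order from the index needs no further weight calls.
from_index = [value for _, value in index]

print(hex_order)       # ['Gomer', 'Gumby', 'Gómez']
print(icu_like_order)  # ['Gomer', 'Gómez', 'Gumby']
print(from_index == icu_like_order, calls == calls_after_build)  # True True
```

The stored pairs play the role of index keys that already contain the weighted value: once they exist, ordered reads reuse them instead of recomputing the weight for every value.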