Work with collations
XSLT stylesheets and expressions in XQuery and XPath can refer to collations using collation URIs. A collation is a set of culture-specific rules that define how text should be sorted and which differences between two pieces of text are considered significant and which insignificant.
This article assumes some basic familiarity with the java.util.Locale and java.text.Collator classes.
The processor does not interpret the collation URI in any way -- it treats a collation URI merely as a sort of name for the instance of the Java Collator class associated with that URI. The XML API provides mechanisms for specifying what will be the default collation URI at preparation-time and for associating an instance of the Java Collator class with a collation URI at execution-time.
All collation URIs specified through the XML API must be absolute URI references. In an XSLT stylesheet or an XQuery or XPath expression, any relative URI reference used in a context where a collation URI is required will be resolved against the base URI from the static context for that expression -- that will ensure that even relative URI references in the stylesheet or expression can be matched with the absolute URI references specified through the XML API.
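The resolution of a relative collation URI against a base URI can be illustrated with the JDK's java.net.URI class. This is only a sketch of the general idea, not the processor's own code; the base URI and the relative reference "collations/german" are made-up values, and the class and method names are hypothetical.

```java
import java.net.URI;

public class CollationUriResolution {
    /** Resolves a possibly relative collation URI against a base URI. */
    public static String resolveCollationUri(String baseUri, String collationUri) {
        return URI.create(baseUri).resolve(collationUri).toString();
    }

    public static void main(String[] args) {
        // Hypothetical base URI taken from the static context of a stylesheet
        String base = "http://example.org/stylesheets/main.xsl";

        // A relative reference is resolved against the base URI
        // prints http://example.org/stylesheets/collations/german
        System.out.println(resolveCollationUri(base, "collations/german"));

        // An absolute reference is left unchanged
        // prints http://example.org/my-collation
        System.out.println(resolveCollationUri(base, "http://example.org/my-collation"));
    }
}
```

The absolute form is the one that would have to match the URI supplied through the XML API.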
Limitations:
- If a collation URI is bound with an instance of the Java Collator class that is not an instance of java.text.RuleBasedCollator, certain operations will not be permitted with that collation URI. In particular, the fn:starts-with, fn:ends-with, fn:contains, fn:substring-before and fn:substring-after functions are not supported with that collation URI.
- All instances of Collator that are currently included with the Java runtime environment are also instances of java.text.RuleBasedCollator, so this is for most purposes only a theoretical limitation. However, it is something to be aware of if an application defines its own instances of the Java Collator class or defines subclasses of the Collator class that are not also instances of java.text.RuleBasedCollator.
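An application that supplies its own collators can check up front whether a given instance is affected by this limitation. The following sketch uses only JDK classes; CaseInsensitiveCollator and isRuleBased are hypothetical names introduced for illustration, standing in for an application-defined Collator subclass.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class CollatorKindCheck {
    /** Hypothetical application-defined collator that is NOT a RuleBasedCollator. */
    public static class CaseInsensitiveCollator extends Collator {
        @Override
        public int compare(String source, String target) {
            return source.compareToIgnoreCase(target);
        }

        @Override
        public CollationKey getCollationKey(String source) {
            // Collation keys are outside the scope of this sketch
            throw new UnsupportedOperationException("collation keys not supported");
        }

        @Override
        public int hashCode() {
            return CaseInsensitiveCollator.class.hashCode();
        }
    }

    /** Returns true if the collator is a java.text.RuleBasedCollator. */
    public static boolean isRuleBased(Collator collator) {
        return collator instanceof RuleBasedCollator;
    }

    public static void main(String[] args) {
        // Collators obtained from the Java runtime are rule based
        System.out.println(isRuleBased(Collator.getInstance(Locale.ENGLISH)));
        // The custom subclass is not rule based; prints false
        System.out.println(isRuleBased(new CaseInsensitiveCollator()));
    }
}
```

A collator for which the check returns false could still be bound for comparison and sorting, but according to the limitation above it would not be usable with functions such as fn:contains.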
Tasks
- Declare the default collation URI.
You can specify the collation URI to use as the default for string comparison operations by calling the setDefaultCollation method on the XStaticContext interface. The default collation URI from the XStaticContext interface will be used as the collation URI in string comparison operations that do not explicitly specify a collation URI.
An XQuery expression can override the default collation URI specified on the XStaticContext interface with the declare default collation declaration. Similarly, an XSLT stylesheet can override the default collation URI with the [xsl:]default-collation attribute. XPath does not provide a means of overriding the default collation URI. However, any XPath or XQuery expression or XSLT stylesheet that performs string comparison operations can specify an explicit collation URI to override the default collation URI.
If you do not explicitly specify a default collation on the instance of the XStaticContext interface that you supply when you prepare your XSLT stylesheet or your XQuery or XPath expression, the default collation URI for the stylesheet or expression will be http://www.w3.org/2005/xpath-functions/collation/codepoint, the Unicode code-point collation URI.
You can use the Unicode code-point collation in situations where characters must be identical Unicode characters to be considered equal. The lexicographical ordering defined by this collation is determined by the Unicode code points of the characters -- that is, by their positions on the Unicode code charts. As such, the Unicode code-point collation will generally perform much better than collations that compare strings in a culture-specific manner, but it is unlikely to give very satisfactory results for sorting operations.
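The difference between code-point ordering and culture-specific ordering can be seen with the JDK alone. The sketch below is an illustration under stated assumptions, not the processor's implementation: compareByCodePoint and the other names are hypothetical helpers, and the English collator stands in for any culture-specific collation.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CodePointVersusCollator {
    /** Compares two strings code point by code point. */
    public static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string with characters left over sorts last
        return Integer.compare(a.length() - i, b.length() - j);
    }

    /** Returns a copy of the list sorted in code-point order. */
    public static List<String> sortByCodePoint(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(CodePointVersusCollator::compareByCodePoint);
        return copy;
    }

    /** Returns a copy of the list sorted with the given collator. */
    public static List<String> sortWith(Collator collator, List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Zebra", "apple");
        // Upper-case letters have lower code points than lower-case letters
        System.out.println(sortByCodePoint(words)); // [Zebra, apple, banana]
        // The English collator orders the words alphabetically
        System.out.println(sortWith(Collator.getInstance(Locale.ENGLISH), words)); // [apple, banana, Zebra]
    }
}
```

The code-point result places "Zebra" first because U+005A (Z) precedes U+0061 (a), which is rarely what a reader of a sorted list expects.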
The following is a simple example showing how to specify the default collation URI on an instance of the XStaticContext interface.
    // Setting of default collation URI is not changed - default remains
    // the Unicode code point collation URI
    XFactory factory = XFactory.newInstance();
    XPathExecutable maxPath1 = factory.prepareXPath("max($var)");

    // A new default collation URI is specified in the static context
    // That URI is used in any string comparison for which no other
    // explicit collation URI is specified
    XStaticContext sc = factory.newStaticContext();
    sc.setDefaultCollation("http://example.org/my-collation");
    XPathExecutable maxPath2 = factory.prepareXPath("max($var)", sc);
- Bind a collation URI.
The XML API provides two methods for binding a collation URI to an instance of the Java Collator class for an execution. The bindCollation methods on the XDynamicContext interface take two arguments: the first argument is a collation URI; the second is either an instance of the java.text.Collator class or an instance of the java.util.Locale class. If an instance of the Locale class is specified, the processor will use the instance of the Collator class that is appropriate for that locale.
XSLT, XPath and XQuery define the concept of Statically Known Collations. If a reference to a collation URI appears in an XSLT stylesheet or an XPath or XQuery expression, and the collation URI is not one of the Statically Known Collations, a static error is supposed to be reported in some circumstances. However, the processor treats all collation URIs as if they were in the set of Statically Known Collations. This is due to the fact that instances of the Java Collator class are not actually associated with collation URIs until execution time, so it is not possible for the processor to determine statically which collation URIs are not known. Instead, the processor will report a dynamic error if a collation URI that is not bound to an instance of the Collator class is used in a stylesheet or expression.
You cannot bind the Unicode code-point collation URI to any instance of the Java Collator class. It is always implicitly bound with the Unicode code-point collation.
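The binding rules described above can be modelled with a small lookup table. The following is a conceptual sketch only, written against the JDK rather than the XML API: the CollationBindings class, its methods, and the exception types are all hypothetical, and the real processor's error reporting will differ. It mirrors three points from the text: a Locale is turned into the matching Collator, the Unicode code-point collation URI cannot be rebound, and use of an unbound URI fails at execution time.

```java
import java.text.Collator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CollationBindings {
    public static final String CODEPOINT_URI =
            "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    private final Map<String, Collator> bindings = new HashMap<>();

    /** Associates a collation URI with a specific collator. */
    public void bindCollation(String uri, Collator collator) {
        if (CODEPOINT_URI.equals(uri)) {
            // Assumption: rebinding is modelled here as an exception
            throw new IllegalArgumentException("The code-point collation URI cannot be rebound");
        }
        bindings.put(uri, collator);
    }

    /** Associates a collation URI with the collator for a locale. */
    public void bindCollation(String uri, Locale locale) {
        bindCollation(uri, Collator.getInstance(locale));
    }

    /** Compares two strings using the collation named by the URI. */
    public int compare(String uri, String a, String b) {
        if (CODEPOINT_URI.equals(uri)) {
            // String.compareTo compares UTF-16 code units, which matches
            // code-point order for characters in the Basic Multilingual Plane
            return a.compareTo(b);
        }
        Collator collator = bindings.get(uri);
        if (collator == null) {
            // Stands in for the dynamic error described in the text
            throw new IllegalStateException("No collator bound to " + uri);
        }
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        CollationBindings dc = new CollationBindings();
        dc.bindCollation("http://example.org/my-collation", Locale.ENGLISH);
        System.out.println(dc.compare("http://example.org/my-collation", "apple", "Zebra") < 0); // true
        System.out.println(dc.compare(CODEPOINT_URI, "apple", "Zebra") > 0); // true
    }
}
```

Because the lookup happens only when a comparison is performed, an unknown URI cannot be detected earlier, which is the same reason the processor reports a dynamic rather than a static error.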
The following example demonstrates how we can bind a collation URI with a specific instance of the Java Collator class on an instance of the XDynamicContext interface.
    XFactory factory = XFactory.newInstance();
    XStaticContext sc = factory.newStaticContext();

    // Set up a default collation URI
    sc.setDefaultCollation("http://example.org/my-collation");

    // Prepare an XPath expression that computes fn:max() using the
    // collator associated with the default collation URI and again using
    // the Unicode code point collation
    String expr = "max($var),"
                + "max($var,'http://www.w3.org/2005/xpath-functions/collation/codepoint')";
    XPathExecutable maxPath = factory.prepareXPath(expr, sc);
    XDynamicContext dc = factory.newDynamicContext();

    // Set the value of the variable $var
    dc.bind(new QName("var"),
            new String[] {"encyclopaedia",
                          // U+00E6 is lower case latin ae ligature
                          "encyclop\u00E6dia",
                          "encyclopedia"});

    // Set up a Collator for English that does not distinguish between
    // capitals, lower-case letters and certain character variants
    Collator english = (Collator) Collator.getInstance(Locale.ENGLISH).clone();
    english.setStrength(Collator.SECONDARY);

    // Evaluate the expression with that English collator associated with
    // the default collation URI
    dc.bindCollation("http://example.org/my-collation", english);
    XSequenceCursor maxValues = maxPath.execute(dc);

    // Print maximum values - expected results are
    // encyclopedia for English collation and
    // encyclop\u00E6dia for Unicode code point collation
    if (maxValues != null) {
        do {
            System.out.println(maxValues.getStringValue());
        } while (maxValues.toNext());
    }
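The effect of the strength setting used in the example above can be checked in isolation with the JDK. This is a minimal sketch; the class name and the englishWithStrength helper are hypothetical, and only the treatment of letter case is demonstrated.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {
    /** Returns a private copy of the English collator with the given strength. */
    public static Collator englishWithStrength(int strength) {
        Collator collator = (Collator) Collator.getInstance(Locale.ENGLISH).clone();
        collator.setStrength(strength);
        return collator;
    }

    public static void main(String[] args) {
        Collator secondary = englishWithStrength(Collator.SECONDARY);
        Collator tertiary = englishWithStrength(Collator.TERTIARY);

        // At SECONDARY strength, case differences are ignored
        System.out.println(secondary.compare("Encyclopedia", "encyclopedia") == 0); // true
        // At TERTIARY strength, case differences are significant
        System.out.println(tertiary.compare("Encyclopedia", "encyclopedia") == 0);  // false
        // Differences in base letters are significant at every strength
        System.out.println(secondary.compare("encyclopaedia", "encyclopedia") < 0); // true
    }
}
```

The last comparison shows why the English collation selects "encyclopedia" as the maximum in the example above: the letter "e" sorts after "a" at the first position where the strings differ.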