XML Schema
Overview
There are multiple schema-definition languages, including RELAX NG, Schematron, and the one discussed here, the W3C XML Schema standard.
XML Schema can be used to validate a document that contains elements from multiple namespaces.
Validation Process
To be notified of validation errors in an XML document, two conditions must be met:
- The factory must be configured, and the appropriate error handler set.
- The document must be associated with at least one schema, and possibly more.
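As a minimal sketch of the error-handler half of that checklist, the class below records each problem the parser reports instead of letting it pass silently. The class name CollectingErrorHandler and its messages() accessor are illustrative names chosen for this sketch, not part of JAXP; only the org.xml.sax.ErrorHandler interface it implements is standard.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: collects everything the parser reports.
class CollectingErrorHandler implements ErrorHandler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add(format("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        // Validation errors (for example, schema violations) arrive here.
        messages.add(format("error", e));
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // The document is not well-formed: record the problem, then stop parsing.
        messages.add(format("fatal", e));
        throw e;
    }

    List<String> messages() {
        return messages;
    }

    private static String format(String level, SAXParseException e) {
        return level + " at line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

A handler like this would be attached with builder.setErrorHandler(handler) on the DocumentBuilder before calling parse, and inspected afterwards.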
DocumentBuilder Factory
It's helpful to start by defining the constants you'll use when configuring the factory. (These are the same constants you define when using XML Schema for SAX parsing.)
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA =
        "http://www.w3.org/2001/XMLSchema";

Next, you need to configure DocumentBuilderFactory to generate a namespace-aware, validating parser that uses XML Schema:
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    factory.setValidating(true);
    try {
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
    } catch (IllegalArgumentException x) {
        // Happens if the parser does not support JAXP 1.2
        ...
    }

Because JAXP-compliant parsers are not namespace-aware by default, you must call setNamespaceAware(true) for schema validation to work. You also set a factory attribute to specify the schema language to use. (For SAX parsing, on the other hand, you set a property on the parser generated by the factory.)
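For comparison, here is a minimal sketch of the SAX counterpart mentioned in the parenthetical, where the schema language is set on the parser rather than on the factory. The class and method names (SaxSchemaSetup, newSchemaValidatingParser) are made up for illustration; the JAXP calls themselves are standard.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper showing the SAX side of the same configuration.
class SaxSchemaSetup {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Builds a namespace-aware, validating SAX parser that uses XML Schema.
    static SAXParser newSchemaValidatingParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        // For SAX, the schema language is a property of the parser,
        // not an attribute of the factory.
        parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        return parser;
    }
}
```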
Associating a Document with a Schema
Now that the program is ready to validate with an XML Schema definition, it is only necessary to ensure that the XML document is associated with (at least) one. There are two ways to do that:
- With a schema declaration in the XML document.
- By specifying the schema(s) to use in the application.
When the application specifies the schema(s) to use, it overrides any schema declarations in the document.
To specify the schema definition in the document, you would create XML like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation='YourSchemaDefinition.xsd'
    >
    ...

The first attribute defines the XML namespace (xmlns) prefix, "xsi", where "xsi" stands for "XML Schema Instance". The second attribute specifies the schema to use for elements in the document that do not have a namespace prefix -- that is, for the elements you typically define in any simple, uncomplicated XML document. (You'll see how to deal with multiple namespaces in the next section.)
You can also specify the schema file in the application, like this:
    static final String schemaSource = "YourSchemaDefinition.xsd";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource));

Here, too, there are mechanisms at your disposal that will let you specify multiple schemas. We'll take a look at those next.
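Putting the pieces so far together, the following is a rough end-to-end sketch of validating a document against a schema supplied by the application as a File. The tiny <note> schema, the temporary file, and the names AppSchemaValidation, validate, and writeTempSchema are all invented for this example; it assumes a parser that supports the JAXP 1.2 properties, such as the one bundled with the JDK.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class AppSchemaValidation {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Made-up schema: a <note> element holding exactly one <to> child.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note'><xs:complexType><xs:sequence>"
        + "<xs:element name='to' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Parses xml against the schema file and returns the validation errors.
    static List<String> validate(String xml, File schemaFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        // The schema language must be set before the schema source.
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaFile);

        List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    // Writes the sample schema to a temporary .xsd file.
    static File writeTempSchema() throws Exception {
        File f = File.createTempFile("schema", ".xsd");
        f.deleteOnExit();
        Files.write(f.toPath(), SCHEMA.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    public static void main(String[] args) throws Exception {
        File xsd = writeTempSchema();
        // First document matches the schema; the second uses an undeclared child.
        System.out.println(validate("<note><to>Ann</to></note>", xsd));
        System.out.println(validate("<note><from>Bob</from></note>", xsd));
    }
}
```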
Validating with Multiple Namespaces
Namespaces let you combine elements that serve different purposes in the same document, without having to worry about overlapping names. The material discussed in this section also applies to validating when using the SAX parser. You're seeing it here, because at this point you've learned enough about namespaces for the discussion to make sense.
To contrive an example, consider an XML data set that keeps track of personnel data. The data set may include information from the w2 tax form, as well as information from the employee's hiring form, with both elements named <form> in their respective schemas.
If a prefix is defined for the "tax" namespace, and another prefix defined for the "hiring" namespace, then the personnel data could include segments like this:
    <employee id="...">
      <name>....</name>
      <tax:form>
        ...w2 tax form data...
      </tax:form>
      <hiring:form>
        ...employment history, etc....
      </hiring:form>
    </employee>

The contents of the tax:form element would obviously be different from the contents of the hiring:form, and would have to be validated differently.
Note, too, that there is a "default" namespace in this example, that the unqualified element names employee and name belong to. For the document to be properly validated, the schema for that namespace must be declared, as well as the schemas for the tax and hiring namespaces.
Note: The "default" namespace is actually a specific namespace. It is defined as the "namespace that has no name". So you can't simply use one namespace as your default this week, and another namespace as the default later on. This "unnamed namespace" or "null namespace" is like the number zero. It doesn't have any value, to speak of (no name), but it is still precisely defined. So a namespace that does have a name can never be used as the "default" namespace.
When parsed, each element in the data set will be validated against the appropriate schema, as long as those schemas have been declared. Again, the schemas can either be declared as part of the XML data set, or in the program. (It is also possible to mix the declarations. In general, though, it is a good idea to keep all of the declarations together in one place.)
Declaring the Schemas in the XML Data Set
To declare the schemas to use for the example above in the data set, the XML code would look something like this:
    <documentRoot
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="employeeDatabase.xsd"
        xsi:schemaLocation=
            "http://www.irs.gov/ fullpath/w2TaxForm.xsd
             http://www.ourcompany.com/ relpath/hiringForm.xsd"
        xmlns:tax="http://www.irs.gov/"
        xmlns:hiring="http://www.ourcompany.com/"
    >
    ...

The noNamespaceSchemaLocation declaration is something you've seen before, as are the last two entries, which define the namespace prefixes tax and hiring. What's new is the entry in the middle, which defines the locations of the schemas to use for each namespace referenced in the document.
The xsi:schemaLocation declaration consists of entry pairs, where the first entry in each pair is a fully qualified URI that specifies the namespace, and the second entry contains a full path or a relative path to the schema definition. (In general, fully qualified paths are recommended. That way, only one copy of the schema will tend to exist.)
Of particular note is the fact that the namespace prefixes cannot be used when defining the schema locations. The xsi:schemaLocation declaration only understands namespace names, not prefixes.
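To make the pair structure concrete, the sketch below splits an xsi:schemaLocation value into namespace-name and location entries. It only illustrates the format described above and is not how any particular parser implements it; the class SchemaLocationPairs and its parse method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the layout of an xsi:schemaLocation value.
class SchemaLocationPairs {
    // Splits the whitespace-separated value into namespace-name -> location entries.
    static Map<String, String> parse(String schemaLocation) {
        String[] tokens = schemaLocation.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException(
                "expected namespace/location pairs, got " + tokens.length + " entries");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            // tokens[i] is the namespace name (never a prefix), tokens[i + 1] its schema.
            pairs.put(tokens[i], tokens[i + 1]);
        }
        return pairs;
    }
}
```

Applied to the value in the example above, the map is keyed by http://www.irs.gov/ and http://www.ourcompany.com/; the prefixes tax and hiring never appear in it.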
Declaring the Schemas in the Application
To declare the equivalent schemas in the application, the code would look something like this:
    static final String employeeSchema = "employeeDatabase.xsd";
    static final String taxSchema = "w2TaxForm.xsd";
    static final String hiringSchema = "hiringForm.xsd";

    static final String[] schemas = {
        employeeSchema,
        taxSchema,
        hiringSchema,
    };

    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    ...
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    ...
    factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas);

Here, the array of strings that points to the schema definitions (.xsd files) is passed as the argument to the factory.setAttribute method. Note the differences from when you were declaring the schemas to use as part of the XML data set:
- There is no special declaration for the "default" (unnamed) schema.
- You don't specify the namespace name. Instead, you only give pointers to the .xsd files.
To make the namespace assignments, the parser reads the .xsd files, and finds in them the name of the target namespace they apply to. Since the files are specified with URIs, the parser can use an EntityResolver (if one has been defined) to find a local copy of the schema.
If the schema definition does not define a target namespace, then it applies to the "default" (unnamed, or null) namespace. So, in the example above, you would expect to see these target namespace declarations in the schemas:
- employeeDatabase.xsd -- none
- w2TaxForm.xsd -- http://www.irs.gov/
- hiringForm.xsd -- http://www.ourcompany.com/
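As an illustration of where that information lives, this sketch reads the targetNamespace attribute from the root element of a schema document. It is not the parser's own lookup code, just a plain DOM read, and the name TargetNamespaceReader is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: inspects a schema document for its target namespace.
class TargetNamespaceReader {
    // Returns the targetNamespace declared on the schema root,
    // or null when the schema applies to the unnamed namespace.
    static String targetNamespaceOf(String xsdText) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xsdText)))
            .getDocumentElement();
        return root.hasAttribute("targetNamespace")
            ? root.getAttribute("targetNamespace")
            : null;
    }
}
```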
At this point, you have seen two possible values for the schema source property when invoking the factory.setAttribute() method: a File object in factory.setAttribute(JAXP_SCHEMA_SOURCE, new File(schemaSource)), and an array of strings in factory.setAttribute(JAXP_SCHEMA_SOURCE, schemas). Here is a complete list of the possible values for that argument:
- String that points to the URI of the schema
- InputStream with the contents of the schema
- SAX InputSource
- File
- an array of Objects, each of which is one of the types defined above.
An array of Objects can be used only when the schema language (such as W3C XML Schema, http://www.w3.org/2001/XMLSchema) has the ability to assemble a schema at runtime. Also, when an array of Objects is passed, it is illegal to have two schemas that share the same namespace.
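The array form can be sketched with a cut-down version of the personnel example. Everything specific here is an assumption made for illustration: the two small schemas, the wages element, the temporary files, and the names MultiSchemaValidation, validate, and writeTemp. The employee schema has no target namespace and uses a strict wildcard so that the qualified tax:form child is checked against the schema whose target namespace is http://www.irs.gov/.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class MultiSchemaValidation {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Made-up schema for the unnamed namespace; the wildcard admits one
    // element from another namespace and requires that it be validated.
    static final String EMPLOYEE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='employee'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:any namespace='##other' processContents='strict'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Made-up schema whose target namespace is the "tax" namespace.
    static final String TAX_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.irs.gov/' elementFormDefault='qualified'>"
        + "<xs:element name='form'><xs:complexType><xs:sequence>"
        + "<xs:element name='wages' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Validates xml against all schema sources and returns the errors reported.
    static List<String> validate(String xml, Object[] schemaSources) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        // One entry per namespace; no two entries may share a target namespace.
        factory.setAttribute(JAXP_SCHEMA_SOURCE, schemaSources);

        List<String> errors = new ArrayList<>();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    // Writes schema text to a temporary .xsd file.
    static File writeTemp(String text) throws Exception {
        File f = File.createTempFile("schema", ".xsd");
        f.deleteOnExit();
        Files.write(f.toPath(), text.getBytes(StandardCharsets.UTF_8));
        return f;
    }

    public static void main(String[] args) throws Exception {
        Object[] schemas = { writeTemp(EMPLOYEE_XSD), writeTemp(TAX_XSD) };
        String xml = "<employee><name>Ann</name>"
            + "<tax:form xmlns:tax='http://www.irs.gov/'>"
            + "<tax:wages>100.50</tax:wages></tax:form></employee>";
        System.out.println(validate(xml, schemas));
    }
}
```

Neither array entry names a namespace; each schema is matched to its elements through the target namespace it declares (or omits).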
Further Information
For further information on the TreeModel, see:
- Understanding the TreeModel: http://java.sun.com/products/jfc/tsc/articles/jtree/index.html
For further information on the W3C Document Object Model (DOM), see:
- The DOM standard page: http://www.w3.org/DOM/
For more information on schema-based validation mechanisms, see:
- The W3C standard validation mechanism, XML Schema: http://www.w3c.org/XML/Schema
- RELAX NG's regular-expression based validation mechanism: http://www.oasis-open.org/committees/relax-ng/
- Schematron's assertion-based validation mechanism: http://www.ascc.net/xml/resource/schematron/schematron.html