JAXP
Overview
The Java API for XML Processing (JAXP) is used for processing XML data. JAXP can parse data as a stream of events (SAX) or as an object tree (DOM).
JAXP supports XSLT for conversion to other formats, such as HTML.
JAXP provides namespace support, allowing one to work with DTDs that might otherwise have naming conflicts.
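As a minimal sketch of what namespace support means in practice, the following parses two small documents that both use an element named title but in different namespaces. The class name NamespaceDemo and the urn:example:... namespace URIs are made up for illustration; the JAXP calls are the standard ones. Note that JAXP factories are not namespace-aware unless you ask for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Returns "namespaceURI|localName" for the document element.
    static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // namespace processing is off by default
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return root.getNamespaceURI() + "|" + root.getLocalName();
    }

    public static void main(String[] args) throws Exception {
        // Both documents use the name "title"; the namespace URI tells them apart.
        System.out.println(describeRoot("<b:title xmlns:b='urn:example:book'>JAXP</b:title>"));
        System.out.println(describeRoot("<p:title xmlns:p='urn:example:person'>Dr.</p:title>"));
    }
}
```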
The main JAXP APIs are defined in the javax.xml.parsers package. That package contains two vendor-neutral factory classes, SAXParserFactory and DocumentBuilderFactory, that give you a SAXParser and a DocumentBuilder, respectively. The DocumentBuilder, in turn, creates a DOM-compliant Document object.
The factory APIs give you the ability to plug in an XML implementation offered by another vendor without changing your source code. The implementation you get depends on the setting of the javax.xml.parsers.SAXParserFactory and javax.xml.parsers.DocumentBuilderFactory system properties. The default values (unless overridden at runtime) point to the reference implementation.
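The factory lookup described above can be sketched as follows. The class name FactoryLookup is an assumption for this example; the code only asks the two factories for a parser and a builder and prints whichever implementation classes the current runtime supplies, so the output depends on your installation.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class FactoryLookup {
    // Application code names only the vendor-neutral JAXP classes.
    static SAXParser newSaxParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser();
    }

    static DocumentBuilder newDomBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        // The concrete classes printed here come from the plugged-in implementation.
        System.out.println("SAX parser:  " + newSaxParser().getClass().getName());
        System.out.println("DOM builder: " + newDomBuilder().getClass().getName());

        // These system properties override the default choice; they print null when unset.
        System.out.println(System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println(System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
    }
}
```

To try another vendor's parser, you would set one of the two system properties on the command line (for example with -D) rather than change this code.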
Packages
The SAX and DOM APIs are defined by the XML-DEV group and by the W3C, respectively. The libraries that define those APIs are:
- javax.xml.parsers
- The JAXP APIs, which provide a common interface for different vendors' SAX and DOM parsers.
- org.w3c.dom
- Defines the Document interface (a DOM), as well as interfaces for all of the components of a DOM.
- org.xml.sax
- Defines the basic SAX APIs.
- javax.xml.transform
- Defines the XSLT APIs that let you transform XML into other forms.
The "Simple API" for XML (SAX) is the event-driven, serial-access mechanism that does element-by-element processing. The API for this level reads and writes XML to a data repository or the Web. For server-side and high-performance apps, you will want to fully understand this level. But for many applications, a minimal understanding will suffice.
The DOM API is generally an easier API to use. It provides a relatively familiar tree structure of objects. You can use the DOM API to manipulate the hierarchy of application objects it encapsulates. The DOM API is ideal for interactive applications because the entire object model is present in memory, where it can be accessed and manipulated by the user.
On the other hand, constructing the DOM requires reading the entire XML structure and holding the object tree in memory, so it is much more CPU and memory intensive. For that reason, the SAX API will tend to be preferred for server-side applications and data filters that do not require an in-memory representation of the data.
Finally, the XSLT APIs defined in javax.xml.transform let you write XML data to a file or convert it into other forms. And, as you'll see in the XSLT section of this tutorial, you can even use them in conjunction with the SAX APIs to convert legacy data to XML.
Simple API for XML (SAX)
Here is a summary of the key SAX APIs:
- SAXParserFactory
- A SAXParserFactory object creates an instance of the parser determined by the system property, javax.xml.parsers.SAXParserFactory.
- SAXParser
- The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
- XMLReader
- The SAXParser wraps an org.xml.sax.XMLReader. Typically, you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader(), so you can configure it. It is the XMLReader which carries on the conversation with the SAX event handlers you define.
- DefaultHandler
- A DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with null methods), so you can override only the ones you're interested in.
- ContentHandler
- Methods like startDocument, endDocument, startElement, and endElement are invoked when an XML tag is recognized. This interface also defines methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
- ErrorHandler
- Methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM. Sometimes, the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure the correct handling, you'll need to supply your own error handler to the parser.
- DTDHandler
- Defines methods you will generally never be called upon to use. Used when processing a DTD to recognize and act on declarations for an unparsed entity.
- EntityResolver
- The resolveEntity method is invoked when the parser must locate data identified by a URI. In most cases, a URI is simply a URL, which specifies the location of a document, but in some cases the document may be identified by a URN: a public identifier, or name, that is unique in the Web space. The public identifier may be specified in addition to the URL. The EntityResolver can then use the public identifier instead of the URL to find the document, for example to access a local copy of the document if one exists.
A typical application implements most of the ContentHandler methods, at a minimum. Since the default implementations of the interfaces ignore all inputs except for fatal errors, a robust implementation may want to implement the ErrorHandler methods, as well.
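To make the handler interfaces summarized above concrete, here is a minimal sketch that extends DefaultHandler, overrides a few ContentHandler callbacks, and replaces the default "ignore it" behavior for non-fatal errors. The class names SaxEventLog and LoggingHandler, and the event strings they record, are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {
    // Overrides only the callbacks of interest; everything else stays a no-op.
    static class LoggingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start:" + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may deliver text in several chunks; whitespace-only chunks are skipped.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e; // treat recoverable errors as failures instead of ignoring them
        }

        @Override
        public void warning(SAXParseException e) {
            events.add("warning:" + e.getMessage());
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LoggingHandler handler = new LoggingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<greeting><to>World</to></greeting>")) {
            System.out.println(event);
        }
    }
}
```

Whether error() should throw, log, or recover is an application decision; throwing is just one choice made here for illustration.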
The SAX Packages
The SAX parser is defined in the packages listed in Table 5-1.
- org.xml.sax
- Defines the SAX interfaces. The name org.xml is the package prefix that was settled on by the group that defined the SAX API.
- org.xml.sax.ext
- Defines SAX extensions that are used when doing more sophisticated SAX processing, for example, to process a document type definition (DTD) or to see the detailed syntax for a file.
- org.xml.sax.helpers
- Contains helper classes that make it easier to use SAX, for example, by defining a default handler that has null methods for all of the interfaces, so you only need to override the ones you actually want to implement.
- javax.xml.parsers
- Defines the SAXParserFactory class, which returns the SAXParser. Also defines exception classes for reporting errors.
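The following sketch shows classes from three of these packages working together: javax.xml.parsers supplies the parser, org.xml.sax supplies the XMLReader and EntityResolver types, and org.xml.sax.helpers supplies the DefaultHandler. It resolves a DTD by public identifier and serves a local copy instead of fetching the URL. The class name LocalDtdResolver, the public identifier, and the example.invalid URL are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolver {
    // Hypothetical public identifier and URL; the URL is never contacted.
    static final String XML =
        "<!DOCTYPE doc PUBLIC '-//EXAMPLE//DTD Doc//EN' 'http://example.invalid/doc.dtd'>"
        + "<doc/>";

    // Parses XML and returns the public identifier the resolver was asked about.
    static String resolvedPublicId() throws Exception {
        final String[] seen = new String[1];

        // Get hold of the XMLReader wrapped by the SAXParser so it can be configured.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());
        reader.setEntityResolver((publicId, systemId) -> {
            seen[0] = publicId;
            // Stand-in for a local copy of the DTD, looked up by public identifier.
            return new InputSource(new StringReader("<!ELEMENT doc EMPTY>"));
        });

        reader.parse(new InputSource(new StringReader(XML)));
        return seen[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Resolved locally: " + resolvedPublicId());
    }
}
```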
Document Object Model (DOM)
A DOM is a tree-structure representation of an XML file. The DocumentBuilderFactory class is used to get a DocumentBuilder instance, which is used to produce Documents (DOMs). When combined with Swing, clickable tree-structures can be built.
The newDocument() method in DocumentBuilder creates an empty Document object, that is, an instance of a class that implements the Document interface.
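As a small sketch of that path, the code below obtains a DocumentBuilder, creates an empty Document, and fills it with two elements. The class name BuildDom and the inventory/item vocabulary are invented for the example.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildDom {
    static Document buildInventory() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument(); // empty tree, no document element yet

        // Nodes are always created through the owning Document, then attached.
        Element root = doc.createElement("inventory");
        doc.appendChild(root);

        Element item = doc.createElement("item");
        item.setAttribute("sku", "A-100");
        item.setTextContent("Widget");
        root.appendChild(item);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildInventory();
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("item").getLength());
    }
}
```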
Although they are called objects, the entries in the DOM tree are actually fairly low-level data structures. For a truly object-oriented tree, see the JDOM API at http://www.jdom.org.
XML Stylesheet Language for Transformation (XSLT)
A TransformerFactory object is instantiated and used to create a Transformer. The source object is the input to the transformation process. A source object can be created from a SAX reader, from a DOM, or from an input stream.
Similarly, the result object is the result of the transformation process. That object can be a SAX event handler, a DOM, or an output stream.
When the transformer is created, it may be created from a set of transformation instructions, in which case the specified transformations are carried out. If it is created without any specific instructions, then the transformer object simply copies the source to the result.
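The no-instructions case can be sketched as follows: a Transformer created without a stylesheet simply copies its source to its result, which is a common way to serialize XML. The class name IdentityCopy is an assumption; the exact formatting of the output is left to the XSLT implementation in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class IdentityCopy {
    static String copy(String xml) throws Exception {
        // No stylesheet is supplied, so this transformer copies source to result.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<a><b>text</b></a>"));
    }
}
```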
The XSLT Packages
The XSLT APIs are defined in the following packages:
- javax.xml.transform
- Defines the TransformerFactory and Transformer classes, which you use to get an object capable of doing transformations. After creating a transformer object, you invoke its transform() method, providing it with an input (source) and output (result).
- javax.xml.transform.dom
- Classes to create input (source) and output (result) objects from a DOM.
- javax.xml.transform.sax
- Classes to create input (source) objects from a SAX parser and output (result) objects from a SAX event handler.
- javax.xml.transform.stream
- Classes to create input (source) and output (result) objects from an I/O stream.
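To show how these packages combine, here is a sketch that reads a stylesheet through javax.xml.transform.stream, takes its input from a DOM through javax.xml.transform.dom, and writes HTML to a stream. The class name StylesheetTransform, the inline stylesheet, and the greeting element are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StylesheetTransform {
    // Illustrative stylesheet: wraps the text of <greeting> in an HTML paragraph.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/greeting'><p><xsl:value-of select='.'/></p></xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtml(String xml) throws Exception {
        // Source: a DOM tree built by a namespace-aware parser.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        // Transformation instructions: read from a character stream.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));

        // Result: written to a character stream.
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toHtml("<greeting>Hello</greeting>"));
    }
}
```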
JAXP Version 1.2 JAR Files
JAXP API Version 1.2 consists of the following JAR files:
- jaxp-api.jar
- The javax.xml.parsers and javax.xml.transform components. As of Java 2 Version 1.4, those classes are built into the platform.
- sax.jar
- SAX APIs and helper classes.
- dom.jar
- DOM APIs and helper classes.
- xercesImpl.jar
- SAX and DOM parsers. Also contains Xerces-specific implementations of the JAXP APIs.
- xalan.jar
- The "classic" Xalan XSLT processor.
- xsltc.jar
- The Xalan Compiling XSLT processor.
If you are using the Java WSDP, the JAXP libraries are distributed in the directory $JWSDP_HOME/common/lib.
General Installation
If you used the Web Services installation process to do the update, then you generally do not have to do any additional work to configure the JAXP JAR files. If you want to update individual JAR files, copy them to:
$JWSDP_HOME/common/endorsed
Alternatively, you could place the JAR files in the platform extensions directory.
Installation with Tomcat version 4.x
- Remove:
$TOMCAT_HOME/common/lib/xerces.jar
- If you have version 1.4 of the Java 2 SDK, copy all of the JAR files, except jaxp-api.jar, to:
$TOMCAT_HOME/common/lib/
and, in Tomcat, set java.endorsed.dirs to:
$TOMCAT_HOME/common/lib/
- If you have version 1.3 of the Java 2 SDK, copy all 6 JAR files, including jaxp-api.jar, to:
$TOMCAT_HOME/common/lib/
- Set the lib directory in your CLASSPATH:
CLASSPATH=$TOMCAT_HOME/common/lib/:$CLASSPATH
XML Parsing
Starting with JAXP RI 1.2.0, the parser implementation changed from the Apache Crimson parser to Apache Xerces Java 2.
To be notified of validation errors in an XML document, these items must be true:
- The document must be associated with a schema.
- Validation must be turned on using javax.xml.parsers.DocumentBuilderFactory or javax.xml.parsers.SAXParserFactory. To validate with a W3C XML Schema, also set the schema-language property on the SAX parser or the corresponding attribute on the DocumentBuilderFactory.
- An application-defined ErrorHandler must be set using the setErrorHandler methods of javax.xml.parsers.DocumentBuilder or org.xml.sax.XMLReader.
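The three conditions above can be sketched with a DOM parser and a DTD. The document carries its own DTD (condition 1), validation is switched on at the factory (condition 2), and an ErrorHandler collects what the parser reports (condition 3). The class name ValidateWithDtd and the note vocabulary are invented; W3C XML Schema validation would need the extra property mentioned above and is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidateWithDtd {
    // Illustrative DTD: a <note> must contain exactly one <to>.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    // Returns the messages of all validation errors reported for the document.
    static List<String> validationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation is off by default
        DocumentBuilder builder = factory.newDocumentBuilder();

        final List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) {
                // warnings are ignored in this sketch
            }
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validation errors arrive here
            }
            public void fatalError(SAXParseException e) throws SAXException {
                throw e; // a malformed document cannot be processed further
            }
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validationErrors(DTD + "<note><to>Ann</to></note>"));
        System.out.println(validationErrors(DTD + "<note><from>Bob</from></note>"));
    }
}
```

Without the setErrorHandler call, the same invalid document would typically parse without the application ever learning about the validation errors.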
XSLT Processing
The JAXP RI contains 2 XSLT engines that are part of the Xalan implementation.
- The classic Xalan XSLT Processor, xalan-j 2.3.1_01, is the default XSLT processing engine for the JAXP transform package.
- The Xalan Compiling Processor (XSLTC) generates a transformation engine, or translet, from an XSL stylesheet. This approach separates the interpretation of stylesheet instructions from their runtime application to XML data.
XSLTC works by compiling a stylesheet into Java byte code (translets), which can then be used to perform XSLT transformations. This approach greatly improves the performance of XSLT transformations where a given stylesheet is compiled once and used many times. It also generates an extremely lightweight translet, because only the XSLT instructions that are actually used by the stylesheet are included. To direct an application to use the XSLT engine in XSLTC, set the TransformerFactory property as follows:
javax.xml.transform.TransformerFactory=org.apache.xalan.xsltc.trax.TransformerFactoryImpl
See also: javax.xml.transform
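The property can also be set from code before the factory is requested. The sketch below does that and, as a defensive assumption of this example rather than anything JAXP requires, falls back to the platform default when the XSLTC class is not on the class path. The class name SelectXsltc is hypothetical.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

public class SelectXsltc {
    static final String KEY = "javax.xml.transform.TransformerFactory";
    static final String XSLTC = "org.apache.xalan.xsltc.trax.TransformerFactoryImpl";

    // Asks for the XSLTC factory; returns the platform default if it cannot be loaded.
    static TransformerFactory xsltcOrDefault() {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, XSLTC);
        try {
            return TransformerFactory.newInstance();
        } catch (TransformerFactoryConfigurationError e) {
            // The named class could not be loaded (for example, xsltc.jar is missing).
            System.clearProperty(KEY);
            return TransformerFactory.newInstance();
        } finally {
            // Leave the system property as it was found.
            if (previous != null) {
                System.setProperty(KEY, previous);
            } else {
                System.clearProperty(KEY);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(xsltcOrDefault().getClass().getName());
    }
}
```

Setting the property on the command line with -D has the same effect and keeps the choice out of the source code, which is the point of the factory mechanism.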
Smart Transformer Switch
The JAXP transformation API includes a "Smart Transformer Switch" which enables automatic switching between Xalan and XSLTC processors within your application. It uses Xalan to create your Transformer objects, and uses XSLTC to create your Templates objects.
To use the switch, set:
javax.xml.transform.TransformerFactory=org.apache.xalan.xsltc.trax.SmartTransformerFactoryImpl
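The difference between Transformer and Templates objects is visible in the standard JAXP API regardless of which factory is configured. As a sketch, the code below compiles a stylesheet once into a Templates object and then creates a fresh Transformer from it for each input. The class name ReuseTemplates and the inline stylesheet are invented, and the code uses whatever default factory the runtime provides rather than the switch itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReuseTemplates {
    // Illustrative stylesheet producing plain text from a <name> element.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/name'>Hello, <xsl:value-of select='.'/>!</xsl:template>"
        + "</xsl:stylesheet>";

    // Names are assumed to be plain text with no XML markup characters.
    static String[] greetAll(String... names) throws Exception {
        // Compile the stylesheet once; a Templates object can be reused.
        Templates compiled = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSL)));

        String[] results = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            // A Transformer is cheap to create from Templates and is used once here.
            Transformer transformer = compiled.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader("<name>" + names[i] + "</name>")),
                new StreamResult(out));
            results[i] = out.toString();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        for (String line : greetAll("Ann", "Bob")) {
            System.out.println(line);
        }
    }
}
```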