

Java and XML, 2nd Edition

9.3. JAXP 1.1

Late in 2000, the expert group for JAXP 1.1 formed, and work got underway to move JAXP 1.0 to a better, more effective solution for parsing and handling XML documents. As I write this chapter, JAXP 1.1 has just become downloadable in a final form from Sun's web site at http://java.sun.com/xml. Many of the changes to the API center around parsing, which makes sense, given that the "P" in JAXP stands for "parsing." But the most significant changes in JAXP 1.1 center around XML transformations, which I cover in the last part of this chapter. In terms of additions to 1.0 functionality, the changes are fairly minor. The biggest addition is support for SAX 2.0, which went final in May of 2000, and DOM Level 2, which was finalized in November of 2000. Remember that JAXP 1.0 supported only SAX 1.0 and DOM Level 1. This lack of updated standards has been one of the biggest criticisms of JAXP 1.0, and is probably why the 1.1 version has appeared so quickly.

In addition to updating JAXP to the newest versions of SAX and DOM, several small changes have been made in the API feature list. Almost all of these changes are the result of feedback from the various companies and individuals on the JAXP expert group. These changes also all deal with configuring the parsers returned from JAXP's two factories, SAXParserFactory and DocumentBuilderFactory. I cover these now, as well as the update in standards support for SAX and DOM, and then we look at the new TrAX API that is part of JAXP 1.1.

9.3.1. Updating the Standards

The most anticipated change from JAXP 1.0 to 1.1 is the updated support for the SAX and DOM standards. Of critical note is that SAX 2.0 handles namespaces, while SAX 1.0 did not.[14] This namespace support enables the use of numerous other XML vocabularies, such as XML Schema, XLink, and XPointer. While it was possible to use these vocabularies in SAX 1.0, the burden was on the developer to split each element's qualified name into its prefix and local name, and to keep track of which namespace each prefix mapped to throughout the document. SAX 2.0 provides this information to the developer, dramatically simplifying these programming tasks. The same goes for DOM Level 2: namespace support, as well as a wealth of other methods on the DOM classes, is available.

[14]Careful readers will note that JAXP 1.0 offered namespace processing through the setNamespaceAware( ) methods on SAXParserFactory and DocumentBuilderFactory. The JAXP code had to do this task "by hand" instead of relying on the SAX or DOM APIs. With SAX 2.0 and DOM Level 2, this process is standardized, and therefore much more reliable, as well as cleaner, than the JAXP 1.0 implementation. It's a good thing.

The good news is that these changes are generally transparent to the developer using JAXP. In other words, standards updates happen somewhat "automatically," without user intervention. Simply specifying a SAX 2.0-compliant parser to the SAXParserFactory and a DOM Level 2-compliant parser to the DocumentBuilderFactory class takes care of the update in functionality.
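To make the namespace point concrete, here is a minimal sketch of a SAX 2.0 handler receiving the namespace URI and local name of each element directly from the parser. The class name NamespaceDemo, the sample document, and the example.com namespace URI are my own illustrations, and the code uses generics, which arrived in Java well after JAXP 1.1.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JAXP
public class NamespaceDemo {

    /** Parses the XML and returns one "{namespaceURI}localName" entry per element. */
    public static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Ask for namespace processing; the SAX 2.0 parser does the real work
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // No prefix splitting or prefix-to-URI bookkeeping needed here
                names.add("{" + uri + "}" + localName);
            }
        };
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<bk:book xmlns:bk='http://www.example.com/book'>"
                   + "<bk:title>Java and XML</bk:title></bk:book>";
        System.out.println(elementNames(xml));
    }
}
```

The handler never sees the bk prefix unless it asks for the qualified name; the URI and local name arrive already separated.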

9.3.1.1. The road to SAX 2.0

There are a few significant changes related to these standards updates, particularly with regard to SAX. In SAX 1.0, the parser interface implemented by vendors and XML parser projects was org.xml.sax.Parser. The JAXP class SAXParser, then, provided a method to get this underlying implementation class through the getParser( ) method. The signature for that method looks like this:

public abstract class SAXParser {

    public abstract org.xml.sax.Parser getParser( )
        throws SAXException;

    // Other methods
}

However, in the change from SAX 1.0 to 2.0, the Parser interface was deprecated and replaced with a new interface, org.xml.sax.XMLReader (the one that you are familiar with from earlier chapters). This made the getParser( ) method useless for obtaining an instance of a SAX 2.0 XMLReader implementation. To support this new interface, a new method has been added to the JAXP SAXParser class. Not surprisingly, this method is named getXMLReader( ) and looks like:

public abstract class SAXParser {

    public abstract org.xml.sax.XMLReader getXMLReader( )
        throws SAXException;

    public abstract org.xml.sax.Parser getParser( )
        throws SAXException;

    // Other methods
}

In the same way, JAXP 1.0 used the parse( ) method by supplying an instance of the HandlerBase class (or a subclass, to be more accurate). In SAX 2.0, of course, the HandlerBase class has been replaced by DefaultHandler. To accommodate this change, all of the parse( ) methods on the SAXParser class have been complemented with versions of the same method that take an instance of the DefaultHandler class to support SAX 2.0. To help you see this difference, take a look at Example 9-5, which shows a good chunk of the SAXParser class.

Example 9-5. The parse( ) methods of the SAXParser class

public abstract class SAXParser {

    // Signatures only; method bodies and throws clauses are omitted
    // The SAX 1.0 parse methods
    public void parse(File file, HandlerBase handlerBase);
    public void parse(InputSource inputSource, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase);
    public void parse(InputStream inputStream, HandlerBase handlerBase, 
                      String systemID);
    public void parse(String uri, HandlerBase handlerBase);

    // The SAX 2.0 parse methods
    public void parse(File file, DefaultHandler defaultHandler);
    public void parse(InputSource inputSource, 
                      DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, 
                      DefaultHandler defaultHandler);
    public void parse(InputStream inputStream, 
                      DefaultHandler defaultHandler, 
                      String systemID);
    public void parse(String uri, DefaultHandler defaultHandler);

    // Other methods

}

All these methods for parsing may seem a bit confusing, but it's only tricky if you're working with both versions of SAX. If you are using SAX 1.0, you'll be working with the Parser interface and HandlerBase class, and it will be obvious which methods to use. Similarly, when using SAX 2.0, it will be obvious that the methods that accept DefaultHandler instances and return XMLReader instances should be used. So take all this as a reference and don't worry too much about it! There are some other changes to the SAX portion of the API, as well.
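As a rough illustration of the two SAX 2.0 routes just described, the sketch below counts elements once through SAXParser.parse( ) with a DefaultHandler, and once by asking the SAXParser for its underlying XMLReader and driving that directly. The class and method names (Sax2ParseDemo, countWithParse, countWithXMLReader) are invented for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JAXP
public class Sax2ParseDemo {

    /** A SAX 2.0 handler that simply counts start tags. */
    static class CountingHandler extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            count++;
        }
    }

    /** Route 1: let the JAXP SAXParser wire up the DefaultHandler. */
    public static int countWithParse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.count;
    }

    /** Route 2: get the SAX 2.0 XMLReader and use the plain SAX API. */
    public static int countWithXMLReader(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XMLReader reader = parser.getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c/></a>";
        System.out.println(countWithParse(xml));
        System.out.println(countWithXMLReader(xml));
    }
}
```

Both routes report the same count; the second is handy when existing code already expects an XMLReader.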

9.3.1.2. Changes in SAX classes

To complete the discussion of the changes to existing JAXP functionality, I need to go over a few new methods that are available to JAXP SAX users. First, the SAXParserFactory class has a new method, setFeature( ). As you recall from JAXP 1.0, the SAXParserFactory class allows configuration of SAXParser instances returned from the factory. In addition to the methods already available in 1.0 (setValidating( ) and setNamespaceAware( )), this new method allows SAX 2.0 features to be requested for new parser instances. For example, a user may request the http://apache.org/xml/features/validation/schema feature (a feature specific to the Apache Xerces parser), which allows XML Schema validation to be turned on or off. This can now be performed directly on a SAXParserFactory, as shown here:

    SAXParserFactory myFactory = SAXParserFactory.newInstance( );

    // Turn on XML Schema validation
    myFactory.setFeature(
        "http://apache.org/xml/features/validation/schema", true);

    // Now get an instance of the parser with schema validation enabled
    SAXParser parser = myFactory.newSAXParser( );

A getFeature( ) method is provided to complement the setFeature( ) method and allow querying of particular features. This method returns a simple boolean value.
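Here is a small sketch of setting a feature and then reading it back. Rather than the Xerces-specific schema feature, it uses http://xml.org/sax/features/external-general-entities, one of the standard SAX 2.0 feature names; I am assuming the parser behind the factory recognizes it, and FeatureDemo is a made-up class name.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class; not part of JAXP
public class FeatureDemo {

    // A standard SAX 2.0 feature name (assumed to be supported by the parser in use)
    static final String EXTERNAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";

    /** Turns the feature off on the factory and returns the value read back. */
    public static boolean disableExternalEntities(SAXParserFactory factory)
            throws Exception {
        // Applies to every SAXParser created by this factory from now on
        factory.setFeature(EXTERNAL_ENTITIES, false);

        // getFeature( ) reports the current setting as a simple boolean
        return factory.getFeature(EXTERNAL_ENTITIES);
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(disableExternalEntities(factory));
    }
}
```

If the underlying parser does not know a feature name, setFeature( ) throws an exception instead of silently ignoring the request, so real code should be prepared to catch it.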

In addition to providing a means to set SAX features (with true or false values), JAXP 1.1 supports the setting of SAX properties (with object values). For example, using an instance of a SAX parser, you could set the property http://xml.org/sax/properties/lexical-handler, assigning that property an implementation of a SAX LexicalHandler interface. Because properties like this lexical one are parser-specific instead of factory-specific (as features were), a setProperty( ) method is provided on the JAXP SAXParser class rather than on the SAXParserFactory class. And as with features, a getProperty( ) complement is provided to return the value associated with a specific property, also on the SAXParser class.
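The lexical handler property can be sketched as follows. This example leans on org.xml.sax.ext.DefaultHandler2, a convenience class from a later SAX extensions release that implements LexicalHandler (it is not part of the SAX 2.0 version discussed here), and LexicalHandlerDemo is my own name for the class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class; not part of JAXP
public class LexicalHandlerDemo {

    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    /** Returns the text of every comment in the document. */
    public static List<String> comments(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        final List<String> found = new ArrayList<String>();
        // DefaultHandler2 is both a DefaultHandler and a LexicalHandler
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length));
            }
        };

        // Properties are set on the parser, not on the factory
        parser.setProperty(LEXICAL_HANDLER, handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(comments("<a><!-- note --></a>"));
    }
}
```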

9.3.2. The TrAX API

So far, I've covered the changes to XML parsing in JAXP. Now I can turn to XML transformations in JAXP 1.1. Perhaps the most exciting development in the newest version of Sun's API is that JAXP 1.1 allows vendor-neutral XML document transformations. While this vendor-neutrality may cloud the definition of JAXP as simply a parsing API, it is a much-needed facility since XSL processors currently employ different methods and means for enabling user and developer interaction. In fact, XSL processors have even greater variance across providers than their XML parser counterparts.

Originally, the JAXP expert group sought to provide a simple Transform class with a few methods to allow specification of a stylesheet and subsequent document transformations. This first effort turned out to be rather shaky, but I'm happy to report that we (the JAXP expert group) are going much further in our continued efforts. Scott Boag and Michael Kay, two of the XSL processor gurus (working on Apache Xalan and SAXON, respectively), have worked with many others to develop TrAX, which supports a much wider array of options and features, and provides complete support for almost all XML transformations -- all under the JAXP umbrella. The result is the addition of the javax.xml.transform package, and a few subpackages, to the JAXP API.

Like the parsing portion of JAXP, performing XML transformations requires three basic steps:

  • Obtain a Transformer factory

  • Retrieve a Transformer

  • Perform operations (transformations)

9.3.2.1. Working with the factory

For the transformation portion of JAXP, the factory you will work with is represented by the class javax.xml.transform.TransformerFactory . This class is analogous to the SAXParserFactory and DocumentBuilderFactory classes that I already covered in both the JAXP 1.0 and 1.1 sections. Of course, simply obtaining a factory instance to work with is a piece of cake:

TransformerFactory factory = TransformerFactory.newInstance( );

Nothing special here, just basic factory design principles at work. Note that each call to newInstance( ) returns a new factory rather than a shared singleton.

Once the factory is available, various options can be set upon the factory. Those options will affect all instances of Transformer (which is covered in a minute) created by that factory. You can also obtain instances of javax.xml.transform.Templates through the TransformerFactory. Templates are an advanced JAXP/TrAX concept, and are covered at the end of the chapter.

The first of the options you can work with are attributes. These are not XML attributes, but are similar to the properties used in SAX. Attributes allow options to be passed to the underlying XSL processor, which may be Apache Xalan, SAXON, or Oracle's XSL processor (or, theoretically, any TrAX-compliant processor). They are largely vendor-dependent, though. As on the parsing side of JAXP, a mutator method, setAttribute( ), is provided along with its counterpart, getAttribute( ). Like setProperty( ), setAttribute( ) takes an attribute name and an Object value. And like getProperty( ), getAttribute( ) takes an attribute name and returns the associated Object value.

Setting an ErrorListener is the second option available. Defined in the javax.xml.transform.ErrorListener interface, an ErrorListener allows problems in transformation to be caught and handled programmatically. If this sounds like org.xml.sax.ErrorHandler, it is very similar. Example 9-6 shows this interface.

Example 9-6. The ErrorListener interface

package javax.xml.transform;

public interface ErrorListener {
    public void warning(TransformerException exception)
        throws TransformerException;
    public void error(TransformerException exception)
        throws TransformerException;
    public void fatalError(TransformerException exception)
        throws TransformerException;
}

Creating an implementation of this interface, filling the three callback methods, and using the setErrorListener( ) method on the TransformerFactory instance you are working with sets you up to deal with any errors that occur during transformation.
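One possible shape for such an implementation is sketched below: it records warnings and recoverable errors so the transformation can continue, and rethrows fatal errors to stop it. The class name CollectingErrorListener and this particular policy are assumptions of mine, not something TrAX prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;

// Hypothetical listener; the record-or-rethrow policy is just one choice
public class CollectingErrorListener implements ErrorListener {

    private final List<String> messages = new ArrayList<String>();

    public void warning(TransformerException exception) {
        // Record and carry on
        messages.add("warning: " + exception.getMessage());
    }

    public void error(TransformerException exception) {
        // Recoverable: record and let the processor continue
        messages.add("error: " + exception.getMessage());
    }

    public void fatalError(TransformerException exception)
            throws TransformerException {
        // Not recoverable: record, then abort the transformation
        messages.add("fatal: " + exception.getMessage());
        throw exception;
    }

    public List<String> getMessages() {
        return messages;
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        CollectingErrorListener listener = new CollectingErrorListener();
        // Register the listener with the factory
        factory.setErrorListener(listener);
        System.out.println(listener.getMessages());
    }
}
```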

Finally, a method is provided to set and retrieve the URI resolver for the instances generated by the factory. The interface defined in javax.xml.transform.URIResolver also behaves similarly to a SAX counterpart, org.xml.sax.EntityResolver. The interface has a single method, shown in Example 9-7.

Example 9-7. The URIResolver interface

package javax.xml.transform;

public interface URIResolver {
    public Source resolve(String href, String base)
        throws TransformerException;
}

This interface, when implemented, allows URIs found in XSL constructs like xsl:import and xsl:include to be handled. Returning a Source (which I'll cover in a moment), you can instruct your transformer to search for the specified document in various locations when a particular URI is encountered. For example, when an include of the URI http://www.oreilly.com/oreilly.xsl is encountered, you might instead return the local document alternateOreilly.xsl and prevent the need for network access. Implementations of the URIResolver interface can be set using the TransformerFactory's setURIResolver( ) method, and retrieved using the getURIResolver( ) method.
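The substitution idea can be sketched with a resolver that keeps local copies of stylesheets keyed by the URI they stand in for. To stay self-contained, this sketch holds the local copies as in-memory strings rather than files such as alternateOreilly.xsl; LocalStylesheetResolver and its register( ) method are hypothetical names.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver that serves local copies of remote stylesheets
public class LocalStylesheetResolver implements URIResolver {

    private final Map<String, String> localCopies = new HashMap<String, String>();

    /** Associates a URI (as it appears in xsl:include or xsl:import) with local text. */
    public void register(String href, String stylesheetText) {
        localCopies.put(href, stylesheetText);
    }

    public Source resolve(String href, String base) throws TransformerException {
        String text = localCopies.get(href);
        if (text == null) {
            // Returning null asks the processor to resolve the URI itself
            return null;
        }
        StreamSource source = new StreamSource(new StringReader(text));
        // Keep the original URI as the system ID so relative references still work
        source.setSystemId(href);
        return source;
    }

    public static void main(String[] args) {
        LocalStylesheetResolver resolver = new LocalStylesheetResolver();
        resolver.register("http://www.oreilly.com/oreilly.xsl",
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'/>");

        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver);
    }
}
```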

Finally, once you set the options of your choice, you can obtain an instance, or instances, of a Transformer through the newTransformer( ) method of the factory, as shown here:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance( );

    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);

    // Get a Transformer to work with, with the options specified
    Transformer transformer = 
        factory.newTransformer(new StreamSource("foundation.xsl"));

As you can see, this method takes the stylesheet as input to use in all transformations for that Transformer instance. In other words, if you wanted to transform a document using stylesheet A and stylesheet B, you would need two Transformer instances, one for each stylesheet. If you wanted to transform multiple documents with the same stylesheet (call it stylesheet C), however, you would need only a single Transformer instance, associated with stylesheet C. Don't worry about the StreamSource class; that's coming next.
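The "stylesheet C" case, one Transformer reused for several documents, might look like the sketch below. The stylesheet and documents are inline strings so that the example runs without any files on disk; ReuseTransformerDemo and titles( ) are names I made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of JAXP
public class ReuseTransformerDemo {

    // A tiny stylesheet that outputs the text of /book/title
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/book/title'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms every document with the same Transformer instance. */
    public static List<String> titles(String... documents) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();

        // One stylesheet, so one Transformer is enough
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));

        List<String> results = new ArrayList<String>();
        for (String document : documents) {
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(document)),
                                  new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles("<book><title>A</title></book>",
                                  "<book><title>B</title></book>"));
    }
}
```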

9.3.2.2. Transforming XML

Once you have an instance of a Transformer, you can go about actually performing XML transformations. This consists of two basic steps:

  • Set the XSL stylesheet to use

  • Perform the transformation, specifying the XML document and result target

As I have demonstrated, the first step is really the easiest. A stylesheet can be supplied when obtaining a Transformer instance from the factory. The location of this stylesheet must be specified by providing a javax.xml.transform.Source instance (actually an instance of an implementation of the Source interface) for its location. The Source interface, which you've seen in a few code samples, is the means of locating an input, be it a stylesheet, document, or other information set. TrAX provides the Source interface and three concrete implementations:

  • javax.xml.transform.stream.StreamSource

  • javax.xml.transform.dom.DOMSource

  • javax.xml.transform.sax.SAXSource

The first of these, StreamSource, reads input from some type of I/O device. Constructors are provided for accepting an InputStream, a Reader, a File, or a String system ID as input. Once created, the StreamSource can be passed to the Transformer for use. This will probably be the Source implementation you use most commonly in programs. It's great for reading a stylesheet or document from a file, a network connection, user input, or any other stream of XML text.

The next Source implementation, DOMSource, provides for reading from an existing DOM tree. It provides a constructor for taking in a DOM org.w3c.dom.Node, and will read from that Node when used. This is ideal for supplying an existing DOM tree to a transformation, perhaps if parsing has already occurred and an XML document is already in memory as a DOM structure, or if you've built a DOM tree programmatically.

SAXSource provides for reading input from SAX producers. This Source implementation takes a SAX org.xml.sax.InputSource, optionally paired with the org.xml.sax.XMLReader that should parse it, and uses the events from these sources. This is ideal for situations in which a SAX content handler is already in use, and callbacks are set up and need to be triggered prior to transformations.
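To compare the three Source implementations side by side, this sketch feeds the same document to a transformer as a StreamSource, a DOMSource, and a SAXSource. It uses the no-argument newTransformer( ) method, which simply copies its input to the output, so no stylesheet is needed; SourceKindsDemo and its methods are illustrative names only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of JAXP
public class SourceKindsDemo {

    /** Copies whatever the Source supplies into a String. */
    static String copy(Source source) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    /** Input read from a character stream. */
    public static String fromStream(String xml) throws Exception {
        return copy(new StreamSource(new StringReader(xml)));
    }

    /** Input taken from a DOM tree that is already in memory. */
    public static String fromDom(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        return copy(new DOMSource(document));
    }

    /** Input delivered as SAX events parsed from an InputSource. */
    public static String fromSax(String xml) throws Exception {
        return copy(new SAXSource(new InputSource(new StringReader(xml))));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b>x</b></a>";
        System.out.println(fromStream(xml));
        System.out.println(fromDom(xml));
        System.out.println(fromSax(xml));
    }
}
```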

Once you've obtained an instance of a Transformer (by providing the stylesheet to use through an appropriate Source), you're ready to perform a transformation. The transform( ) method is used as shown here:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance( );

    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);

    // Get a Transformer to work with, with the options specified
    Transformer transformer = 
        factory.newTransformer(new StreamSource("foundation.xsl"));

    // Perform transformation on asimov.xml, and write result to results.xml
    transformer.transform(new StreamSource("asimov.xml"),
                         new StreamResult("results.xml"));

The transform( ) method takes two arguments: a Source implementation, and a javax.xml.transform.Result implementation. You should already be seeing the symmetry in how this works and have an idea about the functionality within the Result interface. The Source provides the XML document to be transformed, and the Result provides an output target for the transformation. Like Source, there are three concrete implementations of the Result interface provided with TrAX and JAXP:

  • javax.xml.transform.stream.StreamResult

  • javax.xml.transform.dom.DOMResult

  • javax.xml.transform.sax.SAXResult

The StreamResult class takes as a construction mechanism either an OutputStream (like System.out for easy debugging!), a Java File, a String system ID, or a Writer. DOMResult takes a DOM Node to output the transformation to (presumably as a DOM org.w3c.dom.Document), and SAXResult takes a SAX ContentHandler instance to fire callbacks to, resulting from the transformed XML. All are analogous to their Source counterparts.
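The Result side can be sketched in the same spirit, sending one document to a StreamResult, a DOMResult, and a SAXResult. Again the no-argument newTransformer( ) is used so that the input is copied unchanged, and the class and method names (ResultKindsDemo and friends) are my own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of JAXP
public class ResultKindsDemo {

    private static Transformer copier() throws Exception {
        return TransformerFactory.newInstance().newTransformer();
    }

    /** Output written as text to a Writer. */
    public static String toStream(String xml) throws Exception {
        StringWriter out = new StringWriter();
        copier().transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
        return out.toString();
    }

    /** Output built as a DOM tree; returns the root element's name. */
    public static String toDomRootName(String xml) throws Exception {
        DOMResult result = new DOMResult();
        copier().transform(new StreamSource(new StringReader(xml)), result);
        Document document = (Document) result.getNode();
        return document.getDocumentElement().getNodeName();
    }

    /** Output fired as SAX callbacks; returns how many elements were seen. */
    public static int toSaxElementCount(String xml) throws Exception {
        final int[] count = new int[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        copier().transform(new StreamSource(new StringReader(xml)),
                           new SAXResult(handler));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b>x</b></a>";
        System.out.println(toStream(xml));
        System.out.println(toDomRootName(xml));
        System.out.println(toSaxElementCount(xml));
    }
}
```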

While the previous example shows transforming from a stream to a stream, any combination of sources and results is possible. Here are a few examples:

    // Perform transformation on jordan.xml, and print out result
    transformer.transform(new StreamSource("jordan.xml"),
                          new StreamResult(System.out));

    // Transform from SAX and output results to a DOM Node
    // (builder is a DocumentBuilder obtained from DocumentBuilderFactory)
    transformer.transform(new SAXSource(
                              new InputSource(
                                  "http://www.oreilly.com/catalog.xml")),
                          new DOMResult(builder.newDocument( )));

    // Transform from DOM and output to a File
    transformer.transform(new DOMSource(domTree),
                          new StreamResult(
                              new FileOutputStream("results.xml")));

    // Use a custom source and result (JDOM)
    transformer.transform(new org.jdom.trax.JDOMSource(myJdomDocument),
                          new org.jdom.trax.JDOMResult(
                              new org.jdom.Document( )));

TrAX provides tremendous flexibility in moving from various input types to various output types, and in using XSL stylesheets in a variety of formats, such as files, in-memory DOM trees, SAX readers, and so on.

9.3.2.3. Odds and ends

Before closing shop on JAXP, there are a few bits and pieces of TrAX I haven't yet talked about. I won't treat these completely, as they are less commonly used, but I will touch on them briefly. First, TrAX introduces an interface called SourceLocator, also in the javax.xml.transform package. This interface functions for transformations exactly as the Locator interface did for SAX parsing: it supplies information about where action is occurring. Most commonly used for error reporting, the interface looks like this:

package javax.xml.transform;

public interface SourceLocator {
    public int getColumnNumber( );
    public int getLineNumber( );
    public String getPublicId( );
    public String getSystemId( );
}

I won't comment much on this interface, as it's pretty self-explanatory. However, you should know that in the javax.xml.transform.dom package, there is a subinterface called DOMLocator. This interface adds the getOriginatingNode( ) method, which returns the DOM node being processed. This makes error handling quite easy when working with a DOMSource, and is useful in applications that work with DOM trees.
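For error reporting, a SourceLocator usually reaches your code through TransformerException.getLocator( ). The sketch below turns that into a readable message; LocationFormatter, its describe( ) method, and the FixedLocator stand-in (used only so the example can run without a real transformation failure) are all hypothetical.

```java
import javax.xml.transform.SourceLocator;
import javax.xml.transform.TransformerException;

// Hypothetical helper; not part of JAXP
public class LocationFormatter {

    /** Builds a one-line description of where a transformation problem occurred. */
    public static String describe(TransformerException exception) {
        SourceLocator locator = exception.getLocator();
        if (locator == null) {
            // Processors are not obliged to supply location information
            return exception.getMessage() + " (location unknown)";
        }
        return exception.getMessage()
            + " at " + locator.getSystemId()
            + ", line " + locator.getLineNumber()
            + ", column " + locator.getColumnNumber();
    }

    /** A stand-in locator with fixed values, for demonstration only. */
    public static class FixedLocator implements SourceLocator {
        private final String systemId;
        private final int line;
        private final int column;

        public FixedLocator(String systemId, int line, int column) {
            this.systemId = systemId;
            this.line = line;
            this.column = column;
        }

        public int getColumnNumber() { return column; }
        public int getLineNumber() { return line; }
        public String getPublicId() { return null; }
        public String getSystemId() { return systemId; }
    }

    public static void main(String[] args) {
        TransformerException problem = new TransformerException(
            "Unknown element", new FixedLocator("order.xsl", 12, 5));
        System.out.println(describe(problem));
    }
}
```

A method like describe( ) fits naturally inside the callbacks of an ErrorListener implementation.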

TrAX also provides a concrete class, javax.xml.transform.OutputKeys, which defines several constants for use in output properties for transformations. These constants can be used to set output properties on a Transformer, and to look up the properties that a Templates object reports. That leads me to the last subject dealing with TrAX.
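Before moving on, here is a brief sketch of the OutputKeys constants in use with Transformer.setOutputProperty( ). It suppresses the XML declaration on a straight copy of the input; OutputKeysDemo and serialize( ) are invented names, and the particular properties chosen are only examples.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of JAXP
public class OutputKeysDemo {

    /** Copies the document to a String without an XML declaration. */
    public static String serialize(String xml) throws Exception {
        // No stylesheet: this Transformer copies input to output
        Transformer transformer = TransformerFactory.newInstance().newTransformer();

        // Output properties are named by the constants in OutputKeys
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<a><b>x</b></a>"));
    }
}
```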

The Templates interface in TrAX is used when a set of output properties is desired across multiple transformations, or when a set of transformation instructions can be used repeatedly. By supplying a Source to a TransformerFactory's newTemplates( ) method, you get an instance of the Templates object:

// Get a factory
TransformerFactory factory = TransformerFactory.newInstance( );

// Get a Templates object
Templates template = factory.newTemplates(new StreamSource("html.xsl"));

At this point, the template object would be a compiled representation of the transformation detailed in html.xsl (in this example, a stylesheet that converts XML to HTML). By using a Templates object, transformations can be performed from this template across threads, and you also get some optimizations, because instructions are precompiled. Once you have gone that far, you need to generate a Transformer, but from the Templates object, rather than the factory:

// Get a transformer
Transformer transformer = template.newTransformer( );

// Transform
transformer.transform(new DOMSource(orderForm), 
                      new StreamResult(res.getOutputStream( )));

Here, there is no need to supply a Source to the newTransformer( ) method, as the transformer is simply a set of (already) compiled instructions. From there, it's business as usual. In this example, a DOM tree that represents an order form is supplied to the transformation, processed using the html.xsl stylesheet, and then sent to a servlet's output stream for display. Pretty slick, huh? As a general rule, if you are going to use a stylesheet more than twice, use a Templates object; it will pay off in performance. Additionally, anytime you are dealing with threads, Templates are the only way to go.
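The threading advice can be sketched like this: the stylesheet is compiled once into a Templates object, which is shared, while every task creates its own Transformer from it. The thread pool, the inline stylesheet, and the names TemplatesThreadsDemo and transformInParallel( ) are assumptions for the sake of a runnable example (java.util.concurrent did not exist when JAXP 1.1 appeared).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of JAXP
public class TemplatesThreadsDemo {

    // A tiny stylesheet that outputs the id attribute of the order element
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='/order/@id'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms each document on a pool thread, sharing one Templates object. */
    public static List<String> transformInParallel(List<String> documents)
            throws Exception {
        // Compile the stylesheet once; Templates is safe to share across threads
        final Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(STYLESHEET)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<Future<String>>();
            for (final String document : documents) {
                pending.add(pool.submit(new Callable<String>() {
                    public String call() throws Exception {
                        // A Transformer is not shared: one per task
                        Transformer transformer = templates.newTransformer();
                        StringWriter out = new StringWriter();
                        transformer.transform(
                            new StreamSource(new StringReader(document)),
                            new StreamResult(out));
                        return out.toString();
                    }
                }));
            }
            // Collect results in the same order the documents were given
            List<String> results = new ArrayList<String>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> documents = new ArrayList<String>();
        documents.add("<order id='1'/>");
        documents.add("<order id='2'/>");
        System.out.println(transformInParallel(documents));
    }
}
```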




Copyright © 2002 O'Reilly & Associates. All rights reserved.