JAXP 1.1 (Java & XML, 2nd Edition)

9.3.2. The TrAX API

So far, I've covered the changes to XML parsing in JAXP. Now I can turn to XML transformations in JAXP 1.1. Perhaps the most exciting development in the newest version of Sun's API is that JAXP 1.1 allows vendor-neutral XML document transformations. While this vendor-neutrality may cloud the definition of JAXP as simply a parsing API, it is a much-needed facility since XSL processors currently employ different methods and means for enabling user and developer interaction. In fact, XSL processors have even greater variance across providers than their XML parser counterparts.

Originally, the JAXP expert group sought to provide a simple Transform class with a few methods to allow specification of a stylesheet and subsequent document transformations. This first effort turned out to be rather shaky, but I'm happy to report that we (the JAXP expert group) are going much further in our continued efforts. Scott Boag and Michael Kay, two of the XSL processor gurus (working on Apache Xalan and SAXON, respectively), have worked with many others to develop TrAX, which supports a much wider array of options and features, and provides complete support for almost all XML transformations -- all under the JAXP umbrella. The result is the addition of the javax.xml.transform package, and a few subpackages, to the JAXP API.

Like the parsing portion of JAXP, performing XML transformations requires three basic steps:

1. Obtain a Transformer factory
2. Retrieve a Transformer
3. Perform operations (transformations)

9.3.2.1. Working with the factory

For the transformation portion of JAXP, the factory you will work with is represented by the class javax.xml.transform.TransformerFactory. This class is analogous to the SAXParserFactory and DocumentBuilderFactory classes that I already covered in both the JAXP 1.0 and 1.1 sections. Of course, simply obtaining a factory instance to work with is a piece of cake:

TransformerFactory factory = TransformerFactory.newInstance( );

Nothing special here, just basic factory design principles at work: the static newInstance( ) method locates a TransformerFactory implementation and hands back an instance of it.

Once the factory is available, various options can be set on it. Those options will affect all instances of Transformer (which is covered in a minute) created by that factory. You can also obtain instances of javax.xml.transform.Templates through the TransformerFactory. Templates are an advanced JAXP/TrAX concept, and are covered at the end of the chapter.

The first of the options you can work with are attributes. These are not XML attributes, but are similar to the properties used in SAX. Attributes allow options to be passed to the underlying XSL processor, which may be Apache Xalan, SAXON, or Oracle's XSL processor (or, theoretically, any TrAX-compliant processor). They are largely vendor-dependent, though. As on the parsing side of JAXP, a mutator and an accessor are provided: setAttribute( ) and getAttribute( ). Like setProperty( ), setAttribute( ) takes an attribute name and an Object value; like getProperty( ), getAttribute( ) takes an attribute name and returns the associated Object value.
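Because attribute names are vendor-dependent, it can help to probe for one before relying on it. Here is a minimal sketch of that idea. The attribute name and the FactoryAttributeSketch class are hypothetical, invented purely for illustration; the only behavior assumed is the documented one, that setAttribute( ) throws an IllegalArgumentException when the processor does not recognize the name.

```java
import javax.xml.transform.TransformerFactory;

final class FactoryAttributeSketch {

    // Hypothetical attribute name, used only for illustration; real names
    // are defined in each XSL processor's own documentation.
    static final String HYPOTHETICAL_ATTRIBUTE =
        "http://www.example.com/xsl/attributes/hypothetical";

    /** Tries to set an attribute; returns false if the processor rejects it. */
    static boolean trySetAttribute(TransformerFactory factory,
                                   String name, Object value) {
        try {
            factory.setAttribute(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the underlying processor does not recognize the name
            return false;
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        boolean accepted =
            trySetAttribute(factory, HYPOTHETICAL_ATTRIBUTE, Boolean.TRUE);
        System.out.println("Attribute accepted: " + accepted);
    }
}
```

A helper like this lets the same code run against several processors, falling back gracefully when a vendor-specific option is missing.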

Setting an ErrorListener is the second option available. Defined in the javax.xml.transform.ErrorListener interface, an ErrorListener allows problems in transformation to be caught and handled programmatically. If this sounds like org.xml.sax.ErrorHandler, that's because the two are very similar. Example 9-6 shows this interface.

Example 9-6. The ErrorListener interface

package javax.xml.transform;

public interface ErrorListener {
    public void warning(TransformerException exception)
        throws TransformerException;
    public void error(TransformerException exception)
        throws TransformerException;
    public void fatalError(TransformerException exception)
        throws TransformerException;
}

Creating an implementation of this interface, filling in the three callback methods, and passing it to the setErrorListener( ) method on the TransformerFactory instance you are working with sets you up to deal with any errors that occur during transformation.
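To make that concrete, here is one possible implementation, offered as a sketch rather than a recommended design. The CollectingErrorListener class is my own invention for this illustration: it records warnings and recoverable errors, and rethrows fatal errors so that processing stops. Exactly which problems a given processor routes to which callback varies by vendor.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;

// Hypothetical listener: collects messages instead of printing them
final class CollectingErrorListener implements ErrorListener {

    private final List<String> messages = new ArrayList<>();

    @Override
    public void warning(TransformerException exception) {
        // Record the warning and let processing continue
        messages.add("WARNING: " + exception.getMessage());
    }

    @Override
    public void error(TransformerException exception) {
        // Record the recoverable error and let processing continue
        messages.add("ERROR: " + exception.getMessage());
    }

    @Override
    public void fatalError(TransformerException exception)
            throws TransformerException {
        // Record the problem, then rethrow so that processing stops
        messages.add("FATAL: " + exception.getMessage());
        throw exception;
    }

    List<String> getMessages() {
        return messages;
    }

    public static void main(String[] args) {
        CollectingErrorListener listener = new CollectingErrorListener();
        TransformerFactory factory = TransformerFactory.newInstance();

        // Problems reported through this factory now reach the listener
        factory.setErrorListener(listener);
    }
}
```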

The third option is the URI resolver: methods are provided to set and retrieve the resolver used by the instances the factory generates. The interface defined in javax.xml.transform.URIResolver also behaves similarly to a SAX counterpart, org.xml.sax.EntityResolver. The interface has a single method, shown in Example 9-7.

Example 9-7. The URIResolver interface

package javax.xml.transform;

public interface URIResolver {
    public Source resolve(String href, String base)
        throws TransformerException;
}

This interface, when implemented, allows URIs found in XSL constructs like xsl:import and xsl:include to be handled. Returning a Source (which I'll cover in a moment), you can instruct your transformer to search for the specified document in various locations when a particular URI is encountered. For example, when an include of the URI http://www.oreilly.com/oreilly.xsl is encountered, you might instead return the local document alternateOreilly.xsl and prevent the need for network access. Implementations of the URIResolver interface can be set using the TransformerFactory's setURIResolver( ) method, and retrieved using the getURIResolver( ) method.
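The oreilly.xsl redirection just described might be sketched like this. LocalCopyResolver is a hypothetical class of my own, and the sketch assumes the local copy exists on disk; the one API detail it relies on is that returning null from resolve( ) tells the processor to resolve the URI itself.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: swaps one remote stylesheet for a local copy
final class LocalCopyResolver implements URIResolver {

    private final String remoteUri;
    private final File localCopy;

    LocalCopyResolver(String remoteUri, File localCopy) {
        this.remoteUri = remoteUri;
        this.localCopy = localCopy;
    }

    @Override
    public Source resolve(String href, String base) {
        if (remoteUri.equals(href)) {
            // Serve the local document and avoid the network entirely
            return new StreamSource(localCopy);
        }
        // null asks the processor to resolve the URI on its own
        return null;
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(new LocalCopyResolver(
            "http://www.oreilly.com/oreilly.xsl",
            new File("alternateOreilly.xsl")));
    }
}
```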

Once you have set the options of your choice, you can obtain an instance, or instances, of a Transformer through the newTransformer( ) method of the factory, as shown here:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance( );

    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);

    // Get a Transformer to work with, with the options specified
    Transformer transformer = 
        factory.newTransformer(new StreamSource("foundation.xsl"));

As you can see, this method takes the stylesheet as input to use in all transformations for that Transformer instance. In other words, if you wanted to transform a document using stylesheet A and stylesheet B, you would need two Transformer instances, one for each stylesheet. If you wanted to transform multiple documents with the same stylesheet (call it stylesheet C), however, you would need only a single Transformer instance, associated with stylesheet C. Don't worry about the StreamSource class; that's coming next.
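Here is a small, self-contained sketch of the "one Transformer per stylesheet, many documents" idea. Everything in it is made up for illustration: the ReuseTransformerSketch class, the in-memory stylesheet (which simply prints a book's title), and the two tiny documents. Strings stand in for files so that the example runs without anything on disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ReuseTransformerSketch {

    // Hypothetical stylesheet: outputs the text of book/title
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='book/title'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Creates the single Transformer tied to the stylesheet above. */
    static Transformer newTitleTransformer() throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTransformer(
            new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Applies the transformer's stylesheet to one XML document. */
    static String apply(Transformer transformer, String xml)
            throws TransformerException {
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // One stylesheet, one Transformer, several documents
        Transformer transformer = newTitleTransformer();
        System.out.println(
            apply(transformer, "<book><title>Foundation</title></book>"));
        System.out.println(
            apply(transformer, "<book><title>I, Robot</title></book>"));
    }
}
```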

9.3.2.2. Transforming XML

Once you have an instance of a Transformer, you can go about actually performing XML transformations. This consists of two basic steps:

1. Set the XSL stylesheet to use
2. Perform the transformation, specifying the XML document and result target

As I have demonstrated, the first step is really the easiest. A stylesheet can be supplied when obtaining a Transformer instance from the factory. The stylesheet is specified by providing a javax.xml.transform.Source instance (actually an instance of an implementation of the Source interface) that locates it. The Source interface, which you've seen in a few code samples, is the means of locating an input, be it a stylesheet, document, or other information set. TrAX provides the Source interface and three concrete implementations:

javax.xml.transform.stream.StreamSource
javax.xml.transform.dom.DOMSource
javax.xml.transform.sax.SAXSource

The first of these, StreamSource, reads input from some type of I/O device. Constructors are provided for accepting an InputStream, a Reader, or a String system ID as input. Once created, the StreamSource can be passed to the Transformer for use. This will probably be the Source implementation you use most commonly in programs. It's great for reading a stylesheet or document from a network connection, an input stream, user input, or a static file.

The next Source implementation, DOMSource, provides for reading from an existing DOM tree. It provides a constructor for taking in a DOM org.w3c.dom.Node, and will read from that Node when used. This is ideal for supplying an existing DOM tree to a transformation, perhaps if parsing has already occurred and an XML document is already in memory as a DOM structure, or if you've built a DOM tree programmatically.
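As a rough illustration of the programmatic case, the following sketch builds a tiny DOM tree and hands it to a transformation through a DOMSource. The DomSourceSketch class and the order/item element names are hypothetical. To keep things short, it uses the identity transformer you get from calling newTransformer( ) with no stylesheet, which simply copies its input to its result.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomSourceSketch {

    /** Builds a small DOM tree in memory; the element names are made up. */
    static Document buildOrder() throws ParserConfigurationException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element order = doc.createElement("order");
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode("Foundation"));
        order.appendChild(item);
        doc.appendChild(order);
        return doc;
    }

    /** Copies the DOM tree to a String using an identity transformer. */
    static String serialize(Document doc) throws TransformerException {
        // newTransformer() with no arguments performs an identity copy
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(buildOrder()));
    }
}
```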

SAXSource provides for reading input from SAX producers. This Source implementation takes either a SAX org.xml.sax.InputSource alone, or an org.xml.sax.XMLReader together with the InputSource it should read, and uses the events from these sources. This is ideal for situations in which a SAX content handler is already in use, and callbacks are set up and need to be triggered prior to transformations.
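One way to picture "callbacks triggered prior to transformations" is a SAX filter sitting between the parser and the transformer. The sketch below is only an assumption about how you might wire that up: CountingFilter and SaxSourceSketch are hypothetical names, and the filter does nothing more than count elements before passing each event along. An identity transformer stands in for a real stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class SaxSourceSketch {

    /** Hypothetical filter: counts elements, then forwards each event. */
    static final class CountingFilter extends XMLFilterImpl {
        int elementCount;

        CountingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts)
                throws SAXException {
            elementCount++;
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Wraps a namespace-aware SAX parser in the counting filter. */
    static CountingFilter newFilter() throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        return new CountingFilter(
            parserFactory.newSAXParser().getXMLReader());
    }

    /** Feeds SAX events from the reader into an identity transformation. */
    static String copy(XMLReader reader, String xml) throws Exception {
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(
            new SAXSource(reader, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        CountingFilter filter = newFilter();
        String xml = "<book><title>Foundation</title></book>";
        System.out.println(copy(filter, xml));
        System.out.println("Elements seen: " + filter.elementCount);
    }
}
```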

Once you've obtained an instance of a Transformer (by providing the stylesheet to use through an appropriate Source), you're ready to perform a transformation. The transform( ) method is used as shown here:

    // Get the factory
    TransformerFactory factory = TransformerFactory.newInstance( );

    // Configure the factory
    factory.setErrorListener(myErrorListener);
    factory.setURIResolver(myURIResolver);

    // Get a Transformer to work with, with the options specified
    Transformer transformer = 
        factory.newTransformer(new StreamSource("foundation.xsl"));

    // Perform transformation on myDocument, and print out result
    transformer.transform(new StreamSource("asimov.xml"),
                         new StreamResult("results.xml"));

The transform( ) method takes two arguments: a Source implementation, and a javax.xml.transform.Result implementation. You should already be seeing the symmetry in how this works and have an idea about the functionality within the Result interface. The Source provides the XML document to be transformed, and the Result provides an output target for the transformation. As with Source, there are three concrete implementations of the Result interface provided with TrAX and JAXP:

javax.xml.transform.stream.StreamResult
javax.xml.transform.dom.DOMResult
javax.xml.transform.sax.SAXResult

A StreamResult can be constructed from an OutputStream (like System.out for easy debugging!), a Java File, a String system ID, or a Writer. DOMResult takes a DOM Node to output the transformation to (presumably a DOM org.w3c.dom.Document), and SAXResult takes a SAX ContentHandler instance to which it fires the callbacks that result from the transformed XML. All are analogous to their Source counterparts.
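To see two of these Result targets side by side, here is a minimal sketch that sends the same input once to a Writer and once to a DOM tree. ResultTargetsSketch and its sample document are hypothetical, and an identity transformer is used in place of a stylesheet. The sketch relies on DOMResult's no-argument constructor, which leaves it to the processor to create the Document that receives the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

final class ResultTargetsSketch {

    // Made-up sample document
    static final String XML = "<book><title>Foundation</title></book>";

    /** Sends the output to a Writer wrapped in a StreamResult. */
    static String toWriter(String xml) throws TransformerException {
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
        return out.toString();
    }

    /** Sends the output to a DOM tree through a DOMResult. */
    static Document toDom(String xml) throws TransformerException {
        Transformer identity =
            TransformerFactory.newInstance().newTransformer();
        // With no Node supplied, the processor creates a Document itself
        DOMResult result = new DOMResult();
        identity.transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toWriter(XML));
        System.out.println(
            "Root element: " + toDom(XML).getDocumentElement().getNodeName());
    }
}
```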

While the previous example shows transforming from a stream to a stream, any combination of sources and results is possible. Here are a few examples:

    // Perform transformation on jordan.xml, and print out result
    transformer.transform(new StreamSource("jordan.xml"),
                         new StreamResult(System.out));

    // Transform from SAX and output results to a DOM Node
    // (builder is an existing javax.xml.parsers.DocumentBuilder instance)
    transformer.transform(new SAXSource(
                              new InputSource(
                                  "http://www.oreilly.com/catalog.xml")),
                          new DOMResult(builder.newDocument( )));

    // Transform from DOM and output to a File
    transformer.transform(new DOMSource(domTree),
                          new StreamResult(
                              new FileOutputStream("results.xml")));

    // Use a custom source and result (JDOM)
    transformer.transform(new org.jdom.transform.JDOMSource(myJdomDocument),
                          new org.jdom.transform.JDOMResult( ));

TrAX provides tremendous flexibility in moving from various input types to various output types, and in using XSL stylesheets in a variety of formats, such as files, in-memory DOM trees, SAX readers, and so on.

9.3.2.3. Odds and ends

Before closing shop on JAXP, there are a few bits and pieces of TrAX I haven't yet talked about. I won't treat these completely, as they are less commonly used, but I will touch on them briefly. First, TrAX introduces an interface called SourceLocator, also in the javax.xml.transform package. This interface functions for transformations exactly as the Locator interface did for SAX parsing: it supplies information about where action is occurring. Most commonly used for error reporting, the interface looks like this:

package javax.xml.transform;

public interface SourceLocator {
    public int getColumnNumber( );
    public int getLineNumber( );
    public String getPublicId( );
    public String getSystemId( );
}

I won't comment much on this interface, as it's pretty self-explanatory. However, you should know that in the javax.xml.transform.dom package, there is a subinterface called DOMLocator. This interface adds the getOriginatingNode( ) method, which returns the DOM node being processed. This makes error handling quite easy when working with a DOMSource, and is useful in applications that work with DOM trees.
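For error reporting, the usual place to find a SourceLocator is on a TransformerException, through its getLocator( ) method. The helper below is a sketch of how you might format that information; LocatorSketch and describe( ) are hypothetical names. Note that a locator is not always available, so the null check matters.

```java
import javax.xml.transform.SourceLocator;
import javax.xml.transform.TransformerException;

final class LocatorSketch {

    /** Builds a one-line description of where a problem occurred. */
    static String describe(TransformerException e) {
        SourceLocator locator = e.getLocator();
        if (locator == null) {
            // Not every exception carries location information
            return e.getMessage() + " (location unknown)";
        }
        return e.getMessage()
            + " at " + locator.getSystemId()
            + ", line " + locator.getLineNumber()
            + ", column " + locator.getColumnNumber();
    }

    public static void main(String[] args) {
        // An exception created without a locator, purely for demonstration
        TransformerException e = new TransformerException("Example problem");
        System.out.println(describe(e));
    }
}
```

A method like describe( ) fits naturally inside the callbacks of an ErrorListener implementation.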

TrAX also provides a concrete class, javax.xml.transform.OutputKeys, which defines several constants for use in output properties for transformations. These constants can then be used as keys when setting output properties on a Transformer, or when reading the output properties exposed by a Templates object. That leads me to the last subject dealing with TrAX.
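As a quick sketch of the OutputKeys constants in action, the following code asks a Transformer to drop the XML declaration and indent its output. The OutputKeysSketch class and the sample document are hypothetical, and an identity transformer is used for brevity. How much whitespace indenting actually adds differs between processors, so treat the exact layout as unspecified.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OutputKeysSketch {

    /** Copies the input, omitting the XML declaration and indenting. */
    static String format(String xml) throws TransformerException {
        Transformer transformer =
            TransformerFactory.newInstance().newTransformer();

        // Output properties are plain String name/value pairs
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(
            format("<book><title>Foundation</title></book>"));
    }
}
```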

The Templates interface in TrAX is used when a set of output properties is desired across multiple transformations, or when a set of transformation instructions can be used repeatedly. By supplying a Source to a TransformerFactory's newTemplates( ) method, you get an instance of the Templates object:

// Get a factory
TransformerFactory factory = TransformerFactory.newInstance( );

// Get a Templates object
Templates template = factory.newTemplates(new StreamSource("html.xsl"));

At this point, the template object would be a compiled representation of the transformation detailed in html.xsl (in this example, a stylesheet that converts XML to HTML). By using a Templates object, transformations can be performed from this template across threads, and you also get some optimizations, because instructions are precompiled. Once you have gone that far, you need to generate a Transformer, but from the Templates object, rather than the factory:

// Get a transformer
Transformer transformer = template.newTransformer( );

// Transform
transformer.transform(new DOMSource(orderForm), 
                      new StreamResult(res.getOutputStream( )));

Here, there is no need to supply a Source to the newTransformer( ) method, as the transformer is simply a set of (already) compiled instructions. From there, it's business as usual. In this example, a DOM tree that represents an order form is supplied to the transformation, processed using the html.xsl stylesheet, and then sent to a servlet's output stream for display. Pretty slick, huh? As a general rule, if you are going to use a stylesheet more than twice, use a Templates object; it will pay off in performance. Additionally, anytime you are dealing with threads, Templates are the only way to go.
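The threading advice deserves a sketch of its own. The pattern I have in mind is to compile the stylesheet once into a Templates object, share that object freely, and have each thread ask it for its own Transformer (a Transformer itself should not be shared between threads at the same time). SharedTemplatesSketch, its stylesheet, and the pool size of four are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SharedTemplatesSketch {

    // Hypothetical stylesheet: outputs the text of book/title
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='book/title'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; the result can be shared. */
    static Templates compile() throws TransformerConfigurationException {
        return TransformerFactory.newInstance().newTemplates(
            new StreamSource(new StringReader(STYLESHEET)));
    }

    /** Transforms each document on a worker thread, preserving order. */
    static List<String> transformAll(Templates templates,
                                     List<String> documents)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : documents) {
                Callable<String> task = () -> {
                    // Each task gets its own Transformer from the shared
                    // Templates object
                    Transformer transformer = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    transformer.transform(
                        new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
                    return out.toString();
                };
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> documents = Arrays.asList(
            "<book><title>Foundation</title></book>",
            "<book><title>I, Robot</title></book>");
        System.out.println(transformAll(compile(), documents));
    }
}
```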
