XSLT Processing with Java (Java and XSLT)

Since many of the XSLT processors are written in Java, they can be directly invoked from a Java application or servlet. Embedding the processor into a Java application is generally a matter of including one or two JAR files on the CLASSPATH and then invoking the appropriate methods. This chapter shows how to do this, along with a whole host of other programming techniques.

When invoked from the command line, an XSLT processor such as Xalan expects the location of an XML file and an XSLT stylesheet to be passed as parameters. The two files are then parsed into memory using an XML parser such as Xerces or Crimson, and the transformation is performed. But when the XSLT processor is invoked programmatically, you are not limited to using static files. Instead, you can send a precompiled stylesheet and a dynamically generated DOM tree directly to the processor, or even fire SAX events as processor input. A major goal is to eliminate the overhead of parsing, which can dramatically improve performance.
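As a rough preview of what "no static files" can look like, the following sketch builds a small DOM tree in memory and hands it straight to the processor. It uses the standard javax.xml.transform API that ships with current JDKs rather than any Xalan- or SAXON-specific class, and the class name DomInputDemo and its method are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo class: transforms a DOM tree that never existed as a file. */
final class DomInputDemo {

    // a tiny stylesheet held in a string so the sketch is self-contained
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"message\"/></xsl:template>"
        + "</xsl:stylesheet>";

    static String transformGeneratedDom(String text) throws Exception {
        // build <message>text</message> in memory instead of parsing a file
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element message = doc.createElement("message");
        message.appendChild(doc.createTextNode(text));
        doc.appendChild(message);

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));

        // DOMSource wraps the in-memory tree; no XML parsing step is needed
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transformGeneratedDom("Yep, it worked!"));
    }
}
```

The XML side of the transformation skips the parser entirely here; only the stylesheet is still read from text.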

This chapter is devoted to Java and XSLT programming techniques that work for both standalone applications and servlets, with particular emphasis on Sun's Java API for XML Processing (JAXP). In Chapter 6, "Servlet Basics and XSLT", we will apply these techniques to servlets, taking into account issues such as concurrency, deployment, and performance.

5.1. A Simple Example

Let's start with perhaps the simplest program that can be written. For this task, we will write a simple Java program that transforms a static XML data file into HTML using an XSLT stylesheet. The key benefit of beginning with a simple program is that it isolates problems with your development environment, particularly CLASSPATH issues, before you move on to more complex tasks.

We will write two versions of the program, one for Xalan and another for SAXON. A JAXP implementation follows in the next section, showing how the same code can be used with many different processors.

CLASSPATH Problems

CLASSPATH problems are a common culprit when your code is not working, particularly with XML-related APIs. Since so many tools now use XML, it is very likely that a few different DOM and SAX implementations reside on your system. Before trying any of the examples in this chapter, you may want to verify that older parsers are not listed on your CLASSPATH.

More subtle problems can occur if an older library resides in the Java 2 optional packages directory. Any JAR file found in the jre/lib/ext directory is automatically available to the JVM without being added to the CLASSPATH. You should look for files such as jaxp.jar and parser.jar, which could contain older, incompatible XML APIs. If you experience problems, remove all JAR files from the optional packages directory.

Unfortunately, you will have to do some detective work to figure out where the JAR files came from. Although Java 2 Version 1.3 introduced enhanced JAR features that included versioning information, most of the JAR files you encounter probably will not utilize this capability.
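One way to start that detective work is to ask the JVM where it actually loaded a class from. The following is a minimal sketch, assuming a reasonably modern JDK; the class name WhereIsClass and its locate method are invented for this illustration. Classes that come from the JVM's own runtime typically report no location at all, which is itself a useful clue.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports where the JVM loaded a given class from. */
final class WhereIsClass {

    /** Returns the code source location of a class, or a note if none is reported. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            // classes supplied by the JVM itself usually have no code source
            return "(no code source reported; probably part of the Java runtime)";
        }
        return source.getLocation().toExternalForm();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // defaults to a SAX interface; pass any fully qualified class name instead
        String className = args.length > 0 ? args[0] : "org.xml.sax.XMLReader";
        System.out.println(className + " -> " + locate(Class.forName(className)));
    }
}
```

Running it with a parser or processor class name as the argument shows which JAR file, if any, is supplying that class.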

5.1.1. The Design

The design of this application is pretty simple. A single class contains a main( ) method that performs the transformation. The application requires two arguments: the XML file name followed by the XSLT file name. The results of the transformation are simply written to System.out. We will use the following XML data for our example:

<?xml version="1.0" encoding="UTF-8"?>
<message>Yep, it worked!</message>

The following XSLT stylesheet will be used. Its output method is text, and it simply prints out the contents of the <message> element. In this case, the text will be Yep, it worked!.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- simply copy the message to the result tree -->
  <xsl:template match="/">
    <xsl:value-of select="message"/>
  </xsl:template>
</xsl:stylesheet>

Since the filenames are passed as command-line parameters, the application can be used with other XML and XSLT files. You might want to try this out with one of the president examples from Chapter 2, "XSLT Part 1 -- The Basics", or Chapter 3.

5.1.2. Xalan 1 Implementation

The complete code for the Xalan implementation is listed in Example 5-1. As comments in the code indicate, this code was developed and tested using Xalan 1.2.2, which is not the most recent XSLT processor from Apache. Fully qualified Java class names, such as org.apache.xalan.xslt.XSLTProcessor, are used for all Xalan-specific code.

NOTE: A Xalan 2 example is not shown here because Xalan 2 is compatible with Sun's JAXP. The JAXP version of this program works with Xalan 2, as well as any other JAXP compatible processor.

Example 5-1. SimpleXalan1.java

package chap5;

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import org.xml.sax.SAXException;


/**
 * A simple demo of Xalan 1. This code was originally written using
 * Xalan 1.2.2.  It will not work with Xalan 2.
 */
public class SimpleXalan1 {

    /**
     * Accept two command line arguments: the name of an XML file, and
     * the name of an XSLT stylesheet. The result of the transformation
     * is written to stdout.
     */
    public static void main(String[] args)
            throws MalformedURLException, SAXException {
        if (args.length != 2) {
            System.err.println("Usage:");
            System.err.println("  java " + SimpleXalan1.class.getName( )
                    + " xmlFileName xsltFileName");
            System.exit(1);
        }

        String xmlFileName = args[0];
        String xsltFileName = args[1];

        String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
        String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

        org.apache.xalan.xslt.XSLTProcessor processor =
                org.apache.xalan.xslt.XSLTProcessorFactory.getProcessor( );

        org.apache.xalan.xslt.XSLTInputSource xmlInputSource =
                new org.apache.xalan.xslt.XSLTInputSource(xmlSystemId);

        org.apache.xalan.xslt.XSLTInputSource xsltInputSource =
                new org.apache.xalan.xslt.XSLTInputSource(xsltSystemId);

        org.apache.xalan.xslt.XSLTResultTarget resultTree =
                new org.apache.xalan.xslt.XSLTResultTarget(System.out);

        processor.process(xmlInputSource, xsltInputSource, resultTree);
    }
}

The code begins with the usual list of imports and the class declaration, followed by a simple check to ensure that two command line arguments are provided. If all is OK, then the XML file name and XSLT file name are converted into system identifier values:

String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

System identifiers are part of the XML specification and really mean the same thing as a Uniform Resource Identifier (URI). A Uniform Resource Locator (URL) is a specific type of URI and can be used for methods that require system identifiers as parameters. From a Java programming perspective, this means that a platform-specific filename such as C:/data/simple.xml needs to be converted to file:///C:/data/simple.xml before it can be used by most XML APIs. The code shown here does the conversion and will work on Unix, Windows, and other platforms supported by Java. Although you could try to manually prepend the filename with the literal string file:///, that may not result in portable code. The documentation for java.io.File clearly states that its toURL( ) method generates a system-dependent URL, so the results will vary when the same code is executed on a non-Windows platform. In fact, on Windows the code actually produces a nonstandard URL (with a single slash), although it does work within Java programs: file:/C:/data/simple.xml.
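On JDKs newer than the one used for these examples, File.toURL( ) is deprecated (since Java 6) because it does not escape characters that are illegal in URLs. A hedged alternative is to go through File.toURI( ), as in the sketch below; the class name SystemIdDemo and the method toSystemId are illustrative and not part of any library.

```java
import java.io.File;

/** Hypothetical helper: converts platform-specific filenames to system identifiers. */
final class SystemIdDemo {

    /** Returns a file: URI string for the given filename. */
    static String toSystemId(String fileName) {
        // toURI() resolves relative names against the current directory
        // and escapes characters such as spaces
        return new File(fileName).toURI().toString();
    }

    public static void main(String[] args) {
        for (String fileName : args) {
            System.out.println(fileName + " -> " + toSystemId(fileName));
        }
    }
}
```

The exact string still depends on the platform and the current directory, so treat the result as an opaque system identifier rather than comparing it against a hardcoded value.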

Now that we have system identifiers for our two input files, an instance of the XSLT processor is created:

org.apache.xalan.xslt.XSLTProcessor processor =
        org.apache.xalan.xslt.XSLTProcessorFactory.getProcessor( );

XSLTProcessor is an interface, and XSLTProcessorFactory is a factory for creating new instances of classes that implement it. Because Xalan is open source software, it is easy enough to determine that XSLTEngineImpl is the class that implements the XSLTProcessor interface, although you should try to avoid code that depends on the specific implementation.

The next few lines of code create XSLTInputSource objects, one for the XML file and another for the XSLT file:

org.apache.xalan.xslt.XSLTInputSource xmlInputSource =
        new org.apache.xalan.xslt.XSLTInputSource(xmlSystemId);

org.apache.xalan.xslt.XSLTInputSource xsltInputSource =
        new org.apache.xalan.xslt.XSLTInputSource(xsltSystemId);

XSLTInputSource is a subclass of org.xml.sax.InputSource, adding the ability to read directly from a DOM Node. XSLTInputSource has the ability to read XML or XSLT data from a system ID, java.io.InputStream, java.io.Reader, org.w3c.dom.Node, or an existing InputSource. As shown in the code, the source of the data is specified in the constructor. XSLTInputSource also has a no-arg constructor, along with get/set methods for each of the supported data source types.
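The no-arg constructor and set methods come from the base class, org.xml.sax.InputSource, so the pattern can be tried without Xalan on the CLASSPATH. The sketch below uses only the base class and the JDK's DOM parser; the class name InputSourceDemo and the system ID file:///data/simple.xml are placeholders chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical demo: configures a SAX InputSource with setters instead of a constructor. */
final class InputSourceDemo {

    static InputSource fromString(String xml, String systemId) {
        InputSource source = new InputSource();            // no-arg constructor
        source.setCharacterStream(new StringReader(xml));  // data is read from a Reader
        source.setSystemId(systemId);                      // base for resolving relative URIs
        return source;
    }

    public static void main(String[] args) throws Exception {
        // the system ID is a placeholder; nothing is read from that location
        InputSource source = fromString(
                "<message>Yep, it worked!</message>", "file:///data/simple.xml");
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

When a character stream is set, the parser reads from it and uses the system ID only as a base for resolving relative references.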

An instance of XSLTResultTarget is created next, sending the result of the transformation to System.out:

org.apache.xalan.xslt.XSLTResultTarget resultTree =
        new org.apache.xalan.xslt.XSLTResultTarget(System.out);

In a manner similar to XSLTInputSource, the XSLTResultTarget can also be wrapped around an instance of org.w3c.dom.Node, an OutputStream or Writer, a filename (not a system ID!), or an instance of org.xml.sax.DocumentHandler.

The final line of code simply instructs the processor to perform the transformation:

processor.process(xmlInputSource, xsltInputSource, resultTree);

5.1.3. SAXON Implementation

For comparison, a SAXON 5.5.1 implementation is presented in Example 5-2. As you scan through the code, you will notice the word "trax" appearing in the Java package names. This indicates that Version 5.5.1 of SAXON was moving toward the Transformation API for XML (TrAX). More information on TrAX is coming up in the JAXP discussion. In a nutshell, TrAX provides a uniform API that should work with any XSLT processor.

Example 5-2. SimpleSaxon.java

package chap5;

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import org.xml.sax.SAXException;

/**
 * A simple demo of SAXON. This code was originally written using
 * SAXON 5.5.1.
 */
public class SimpleSaxon {

    /**
     * Accept two command line arguments: the name of an XML file, and
     * the name of an XSLT stylesheet. The result of the transformation
     * is written to stdout.
     */
    public static void main(String[] args)
            throws MalformedURLException, IOException, SAXException {
        if (args.length != 2) {
            System.err.println("Usage:");
            System.err.println("  java " + SimpleSaxon.class.getName( )
                    + " xmlFileName xsltFileName");
            System.exit(1);
        }

        String xmlFileName = args[0];
        String xsltFileName = args[1];

        String xmlSystemId = new File(xmlFileName).toURL().toExternalForm( );
        String xsltSystemId = new File(xsltFileName).toURL().toExternalForm( );

        com.icl.saxon.trax.Processor processor =
                com.icl.saxon.trax.Processor.newInstance("xslt");

        // unlike Xalan, SAXON uses the SAX InputSource.  Xalan
        // uses its own class, XSLTInputSource
        org.xml.sax.InputSource xmlInputSource =
                new org.xml.sax.InputSource(xmlSystemId);
        org.xml.sax.InputSource xsltInputSource =
                new org.xml.sax.InputSource(xsltSystemId);

        com.icl.saxon.trax.Result result =
                new com.icl.saxon.trax.Result(System.out);

        // create a new compiled stylesheet
        com.icl.saxon.trax.Templates templates =
                processor.process(xsltInputSource);

        // create a transformer that can be used for a single transformation
        com.icl.saxon.trax.Transformer trans = templates.newTransformer( );
        trans.transform(xmlInputSource, result);
    }
}

The SAXON implementation starts exactly as the Xalan implementation does. Following the class declaration, the command-line parameters are validated and then converted to system IDs. The XML and XSLT system IDs are then wrapped in org.xml.sax.InputSource objects as follows:

org.xml.sax.InputSource xmlInputSource =
        new org.xml.sax.InputSource(xmlSystemId);
org.xml.sax.InputSource xsltInputSource =
        new org.xml.sax.InputSource(xsltSystemId);

This code is virtually indistinguishable from the Xalan code, except Xalan uses XSLTInputSource instead of InputSource. As mentioned before, XSLTInputSource is merely a subclass of InputSource that adds support for reading from a DOM Node. SAXON also has the ability to read from a DOM node, although its approach is slightly different.

Creating a Result object sets up the destination for the XSLT result tree, which is directed to System.out in this example:

com.icl.saxon.trax.Result result =
        new com.icl.saxon.trax.Result(System.out);

The XSLT stylesheet is then compiled, resulting in an object that can be used repeatedly from many concurrent threads:

com.icl.saxon.trax.Templates templates =
        processor.process(xsltInputSource);

In a typical XML and XSLT web site, the XML data is generated dynamically, but the same stylesheets are used repeatedly. For instance, stylesheets generating common headers, footers, and navigation bars will be used by many pages. To maximize performance, you will want to process the stylesheets once and reuse the instances for many clients at the same time. For this reason, the thread safety that Templates offers is critical.

An instance of the Transformer class is then created to perform the actual transformation. Unlike the compiled stylesheet, a transformer is not thread-safe and cannot be shared by many clients. If this were a servlet, a new Transformer instance would have to be created on each invocation of doGet or doPost. In our example, the code is as follows:

com.icl.saxon.trax.Transformer trans = templates.newTransformer( );
trans.transform(xmlInputSource, result);
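The same split between a shared compiled stylesheet and a per-use transformer exists in the standard javax.xml.transform API. The following sketch uses that API, not the SAXON 5.5.1 classes shown above, and assumes a JDK with a built-in XSLT processor; the class name SharedTemplatesDemo and its helper methods are invented for illustration. The stylesheet is compiled once, and each thread asks the shared Templates object for its own Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: one compiled stylesheet, one Transformer per use. */
final class SharedTemplatesDemo {

    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\"><xsl:value-of select=\"message\"/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet; the result may be shared between threads. */
    static Templates compile(String xslt) throws TransformerException {
        return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
    }

    /** Creates a fresh Transformer for this call only and runs it. */
    static String transform(Templates templates, String xml) throws TransformerException {
        Transformer transformer = templates.newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile(XSLT); // compiled once, up front
        Runnable task = () -> {
            try {
                System.out.println(
                        transform(templates, "<message>Yep, it worked!</message>"));
            } catch (TransformerException e) {
                throw new IllegalStateException(e);
            }
        };
        Thread first = new Thread(task);
        Thread second = new Thread(task);
        first.start();
        second.start();
        first.join();
        second.join();
    }
}
```

In a servlet, the Templates object would typically be created once and stored, while the call to newTransformer( ) would happen inside each request.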

5.1.4. SAXON, Xalan, or TrAX?

As the previous examples show, SAXON and Xalan have many similarities. While similarities make learning the various APIs easy, they do not result in portable code. If you write code directly against either of these interfaces, you lock yourself into that particular implementation unless you want to rewrite your application.

The other option is to write a facade around both processors, presenting a consistent interface that works with either processor behind the scenes. The only problem with this approach is that as new processors are introduced, you must update the implementation of your facade. It would be very difficult for one individual or organization to keep up with the rapidly changing world of XSLT processors.

But if the facade were an open standard supported by a large enough user base, the people and organizations that write XSLT processors would feel pressure to adhere to the common API, rather than the other way around. TrAX was initiated in early 2000 as an effort to define a consistent API to any XSLT processor. Since some of the key people behind TrAX were also responsible for implementing some of the major XSLT processors, it was quickly accepted that TrAX would be a de facto standard, much in the way that SAX is.
