Introduction to JAXP 1.1 (Java and XSLT)

5.2. Introduction to JAXP 1.1

First released in March 2000, Sun's JAXP 1.0 utilized XML 1.0, XML Namespaces 1.0, SAX 1.0, and DOM Level 1. JAXP is a standard extension to Java, meaning that Sun provides a specification through its Java Community Process (JCP) as well as a reference implementation. JAXP 1.1 follows the same basic design philosophy as JAXP 1.0, adding support for DOM Level 2, SAX 2, and XSLT 1.0. A tool like JAXP is necessary because the XSLT specification defines only a transformation language; it says nothing about how to write a Java XSLT processor. Although all processors perform the same basic tasks, each one uses a different API and has its own set of programming conventions.

JAXP is not an XML parser, nor is it an XSLT processor. Instead, it provides a common Java interface that masks differences between various implementations of the supported standards. When using JAXP, your code can avoid dependencies on specific vendor tools, allowing flexibility to upgrade to newer tools when they become available.

Figure 5-1. JAXP 1.1 architecture

As shown, application code does not deal directly with specific parser or processor implementations, such as SAXON or Xalan. Instead, you write code against abstract classes that JAXP provides. This level of indirection allows you to pick and choose among different implementations without even recompiling your application.
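One way to see this indirection at work is to ask the factory which concrete class it loaded. The following is a minimal sketch rather than code from the original text: the class name WhichProcessor is hypothetical, and the sketch relies on the JAXP lookup convention that the javax.xml.transform.TransformerFactory system property, when set, names the factory implementation to load.

```java
import javax.xml.transform.TransformerFactory;

/** Hypothetical helper: reports which XSLT processor JAXP selected. */
class WhichProcessor {

    /** Returns the class name of the concrete TransformerFactory in use. */
    static String implementationName() {
        // newInstance() consults the javax.xml.transform.TransformerFactory
        // system property (among other lookup steps) before falling back to
        // the platform default, so no vendor class is named in this code.
        TransformerFactory transFact = TransformerFactory.newInstance();
        return transFact.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Using: " + implementationName());
    }
}
```

Running it with java -Djavax.xml.transform.TransformerFactory=<factory class name> WhichProcessor should switch processors without recompiling, provided the named processor is on the CLASSPATH.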

The main drawback to an API such as JAXP is the "least common denominator" effect, which is all too familiar to AWT programmers. In order to maximize portability, JAXP mostly provides functionality that all XSLT processors support. This means, for instance, that Xalan's custom XPath APIs are not included in JAXP. In order to use value-added features of a particular processor, you must revert to nonportable code, negating the benefits of a pluggability layer. Fortunately, most common tasks are supported by JAXP, so reverting to implementation-specific code is the exception, not the rule.

Example 5-3. SimpleJaxp.java

package chap5;

import java.io.*;

/**
 * A simple demo of JAXP 1.1
 */
public class SimpleJaxp {

    /**
     * Accept two command line arguments: the name of an XML file, and
     * the name of an XSLT stylesheet. The result of the transformation
     * is written to stdout.
     */
    public static void main(String[] args)
            throws javax.xml.transform.TransformerException {
        if (args.length != 2) {
            System.err.println("Usage:");
            System.err.println("  java " + SimpleJaxp.class.getName( )
                    + " xmlFileName xsltFileName");
            System.exit(1);
        }

        File xmlFile = new File(args[0]);
        File xsltFile = new File(args[1]);

        javax.xml.transform.Source xmlSource =
                new javax.xml.transform.stream.StreamSource(xmlFile);
        javax.xml.transform.Source xsltSource =
                new javax.xml.transform.stream.StreamSource(xsltFile);
        javax.xml.transform.Result result =
                new javax.xml.transform.stream.StreamResult(System.out);

        // create an instance of TransformerFactory
        javax.xml.transform.TransformerFactory transFact =
                javax.xml.transform.TransformerFactory.newInstance( );

        javax.xml.transform.Transformer trans =
                transFact.newTransformer(xsltSource);

        trans.transform(xmlSource, result);
    }
}

File xmlFile = new File(args[0]);
File xsltFile = new File(args[1]);
javax.xml.transform.Source xmlSource =
        new javax.xml.transform.stream.StreamSource(xmlFile);
javax.xml.transform.Source xsltSource =
        new javax.xml.transform.stream.StreamSource(xsltFile);

5.2.3. The Transformer Class

As shown in Example 5-3, a Transformer object can be obtained from the TransformerFactory as follows:

javax.xml.transform.TransformerFactory transFact =
        javax.xml.transform.TransformerFactory.newInstance( );
javax.xml.transform.Transformer trans =
        transFact.newTransformer(xsltSource);

The Transformer instance is wrapped around an XSLT stylesheet and allows you to perform as many transformations as you wish. The main caveat is thread safety: a single Transformer instance cannot be used by multiple threads concurrently. For each transformation, invoke the transform method:
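As an illustration of reusing one Transformer for several transformations, here is a minimal sketch. The class name ReuseTransformer, the inline stylesheet, and the sample documents are all assumptions made for the example; the Transformer is confined to the calling thread, in keeping with the caveat above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: one Transformer, several sequential transformations. */
class ReuseTransformer {

    // Illustrative stylesheet that emits a plain-text greeting.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "Hello, <xsl:value-of select='name'/>!"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Transforms each document in turn with the same Transformer. */
    static String[] greetAll(String... xmlDocs) throws TransformerException {
        // The stylesheet is processed once, when the Transformer is created.
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));

        String[] results = new String[xmlDocs.length];
        for (int i = 0; i < xmlDocs.length; i++) {
            StringWriter out = new StringWriter();
            // Sequential reuse on a single thread; never shared across threads.
            trans.transform(new StreamSource(new StringReader(xmlDocs[i])),
                            new StreamResult(out));
            results[i] = out.toString();
        }
        return results;
    }

    public static void main(String[] args) throws TransformerException {
        for (String s : greetAll("<name>Ann</name>", "<name>Bob</name>")) {
            System.out.println(s);
        }
    }
}
```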

abstract void transform(Source xmlSource, Result outputTarget)
    throws TransformerException

This method is abstract because the TransformerFactory returns a subclass of Transformer that does the actual work. The Source interface defines where the XML data comes from and the Result interface specifies where the transformation result is sent. The TransformerException will be thrown if anything goes wrong during the transformation process and may contain the location of the error and a reference to the original exception. The ability to properly report the location of the error is entirely dependent upon the quality of the underlying XSLT transformer implementation's error reporting. We will talk about specific classes that implement the Source and Result interfaces later in this chapter.
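The error-reporting behavior described above can be sketched as follows. This is an illustration only: the class ReportTransformError, its describeFailure method, and the inline stylesheet are hypothetical names introduced here. How much location detail appears depends on the underlying processor, and some processors also print their own diagnostics to stderr.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: inspecting a TransformerException. */
class ReportTransformError {

    // Illustrative stylesheet that simply copies its input.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns null on success, or a description of what went wrong. */
    static String describeFailure(String xml) {
        try {
            Transformer trans = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            trans.transform(new StreamSource(new StringReader(xml)),
                            new StreamResult(new StringWriter()));
            return null;
        } catch (TransformerException te) {
            // getMessageAndLocation() appends location details when the
            // processor supplied them; getCause() exposes the original
            // exception, if one was wrapped.
            Throwable cause = te.getCause();
            String description = te.getMessageAndLocation();
            if (cause != null) {
                description += " (caused by " + cause.getClass().getName() + ")";
            }
            return description;
        }
    }

    public static void main(String[] args) {
        // The input below is deliberately malformed: <b> is never closed.
        System.out.println(describeFailure("<a><b></a>"));
    }
}
```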

Aside from actually performing the transformation, the Transformer implementation allows you to set output properties and stylesheet parameters. In XSLT, a stylesheet parameter is declared and used as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:param name="image_dir" select="'images'"/>
  
  <xsl:template match="/">
    <html>
      <body>
        <h1>Stylesheet Parameter Example</h1>
        <img src="{$image_dir}/sample.gif"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

The <xsl:param> element declares the parameter name and takes an optional select attribute. This attribute specifies the default value used when the stylesheet parameter is not provided. In this case, the string 'images' is the default value; it is enclosed in apostrophes so that it is treated as a string literal rather than evaluated as an XPath location path. Later, the image_dir parameter is referred to with the attribute value template syntax: {$image_dir}.

Passing a variable for the location of your images is a common technique because your development environment might use a different directory name than your production web server. Another common use for a stylesheet parameter is to pass in data that a servlet generates dynamically, such as a unique ID for session tracking.

From JAXP, pass this parameter via the Transformer instance. The code is simple enough:

javax.xml.transform.Transformer trans =
        transFact.newTransformer(xsltSource);
trans.setParameter("image_dir", "graphics");

You can set as many parameters as you like, and these parameters will be saved and reused for every transformation you make with this Transformer instance. If you wish to remove a parameter, you must call clearParameters( ), which clears all parameters for this Transformer instance. Parameters work similarly to a java.util.Map; if you set the same parameter twice, the second value overwrites the first value.
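The parameter behavior just described (defaults, overwriting, and clearing) can be sketched in a few lines. The class name StylesheetParams and the text-only stylesheet are assumptions made for this example; the stylesheet reuses the image_dir parameter from above but emits plain text so the result is easy to inspect.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: setting, overwriting, and clearing parameters. */
class StylesheetParams {

    // Illustrative stylesheet: prints the image path as plain text.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='image_dir' select=\"'images'\"/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='$image_dir'/>/sample.gif"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private static String run(Transformer trans) throws TransformerException {
        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<doc/>")),
                        new StreamResult(out));
        return out.toString();
    }

    /** Returns the output before setting, after setting, and after clearing. */
    static String[] demo() throws TransformerException {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));

        String withDefault = run(trans);            // stylesheet default applies

        trans.setParameter("image_dir", "graphics");
        trans.setParameter("image_dir", "pictures"); // second value overwrites
        String withOverride = run(trans);

        trans.clearParameters();                     // removes all parameters
        String afterClear = run(trans);

        return new String[] { withDefault, withOverride, afterClear };
    }

    public static void main(String[] args) throws TransformerException {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```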

Another use for the Transformer class is to get and set output properties through one of the following methods:

void setOutputProperties(java.util.Properties props)
void setOutputProperty(String name, String value)
java.util.Properties getOutputProperties( )
String getOutputProperty(String name)

As you can see, properties are specified as name/value pairs of Strings and can be set and retrieved individually or as a group. Unlike stylesheet parameters, you can un-set an individual property by simply passing in null for the value. The permitted property names are defined in the javax.xml.transform.OutputKeys class and are explained in Table 5-1.

Table 5-1. Constants defined in javax.xml.transform.OutputKeys

Constant	Meaning
CDATA_SECTION_ELEMENTS	Specifies a whitespace-separated list of element names whose content should be output as CDATA sections. See the XSLT specification from the W3C for examples.
DOCTYPE_PUBLIC	Used only if `DOCTYPE_SYSTEM` is also set. Instructs the processor to output a PUBLIC document type declaration. For example: `<!DOCTYPE rootElem PUBLIC "public id" "system id">`.
DOCTYPE_SYSTEM	Instructs the processor to output a document type declaration. For example: `<!DOCTYPE rootElem SYSTEM "system id">`.
ENCODING	Specifies the character encoding of the result tree, such as UTF-8 or UTF-16.
INDENT	Specifies whether the processor may add whitespace to the result tree to make the output more readable. Acceptable values are `yes` and `no`. Indentation increases the size of the output, which can hurt performance.
MEDIA_TYPE	The MIME type of the result tree.
METHOD	The output method, either `xml`, `html`, or `text`. Although other values are possible, such as `xhtml`, these are implementation-defined and may be rejected by your processor.
OMIT_XML_DECLARATION	Acceptable values are `yes` and `no`, specifying whether or not to include the XML declaration on the first line of the result tree.
STANDALONE	Acceptable values are `yes` and `no`, specifying whether or not the XML declaration indicates that the document is standalone. For example: `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`.
VERSION	Specifies the version of the output method, typically `1.0` for XML output. This shows up in the XML declaration as follows: `<?xml version="1.0" encoding="UTF-8"?>`.

It is no coincidence that these output properties are the same as the properties you can set on the <xsl:output> element in your stylesheets. For example:

<xsl:output method="xml" indent="yes" encoding="UTF-8"/>

Using JAXP, you can either specify additional output properties or override those set in the stylesheet. To change the encoding, write this code:

// this will take precedence over any encoding specified in the stylesheet
trans.setOutputProperty(OutputKeys.ENCODING, "UTF-16");

Keep in mind that this will, in addition to adding encoding="UTF-16" to the XML declaration, actually cause the processor to use that encoding in the result tree. For a value of UTF-16, this means that 16-bit Unicode characters will be generated, so you may have trouble viewing the result tree in many ASCII-only text editors.
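To make the override concrete, the following sketch sets two output properties from Java and returns the serialized result. The class name OutputProps and the copy-through stylesheet are hypothetical. The result is written to a character Writer for easy inspection, so no bytes are encoded here; the ENCODING property matters most when the Result wraps an OutputStream.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo: overriding xsl:output settings through JAXP. */
class OutputProps {

    // Illustrative stylesheet: copies the input and requests UTF-8 output.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' indent='yes' encoding='UTF-8'/>"
      + "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    /** Serializes a small document; omitDeclaration must be "yes" or "no". */
    static String serialize(String omitDeclaration) throws TransformerException {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));

        // These settings take precedence over the xsl:output element above.
        trans.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration);

        StringWriter out = new StringWriter();
        trans.transform(
                new StreamSource(new StringReader("<greeting>hello</greeting>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(serialize("no"));
        System.out.println(serialize("yes"));
    }
}
```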

Table 5-2. JAXP transformation packages

Package	Description
javax.xml.transform	Defines a general-purpose API for XML transformations without any dependencies on SAX or DOM. The Transformer class is obtained from the TransformerFactory class. The Transformer transforms from a Source to a Result.
javax.xml.transform.dom	Defines how transformations can be performed using DOM. Provides implementations of Source and Result: DOMSource and DOMResult.
javax.xml.transform.sax	Supports SAX2 transformations. Defines SAX versions of Source and Result: SAXSource and SAXResult. Also defines a subclass of TransformerFactory that allows SAX2 events to be fed into an XSLT processor.
javax.xml.transform.stream	Defines I/O stream implementations of Source and Result: StreamSource and StreamResult.
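Because every Source and Result implementation plugs into the same transform method, the packages can be mixed. The sketch below, which is an illustration and not code from the original text, reads from a StreamSource into a DOMResult and then writes a DOMSource back out to a StreamResult. The class name StreamToDom is hypothetical, and the no-argument newTransformer( ) call is used here to obtain a Transformer that copies its input unchanged.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

/** Hypothetical demo: mixing stream and DOM implementations. */
class StreamToDom {

    /** Parses XML text into a DOM tree using a copy-only Transformer. */
    static Document toDom(String xml) throws TransformerException {
        // newTransformer() with no stylesheet copies the Source to the Result.
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult();
        copier.transform(new StreamSource(new StringReader(xml)), result);
        // With an empty DOMResult, the processor creates the Document node.
        return (Document) result.getNode();
    }

    /** Serializes a DOM tree back to text, without the XML declaration. */
    static String toText(Document doc) throws TransformerException {
        Transformer copier = TransformerFactory.newInstance().newTransformer();
        copier.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        copier.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        Document doc = toDom("<greeting>hello</greeting>");
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(toText(doc));
    }
}
```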
