5.2. Introduction to JAXP 1.1
TrAX was a great idea, and the original
work and concepts behind it were absorbed into JAXP Version 1.1. If
you search for TrAX on the Web and get the feeling that the effort is
waning, this is only because focus has shifted from TrAX to JAXP.
Although the name has changed, the concept has not: JAXP provides a
standard Java interface to many XSLT processors, allowing you to
choose your favorite underlying implementation while retaining
portability.
First released in March 2000, Sun's JAXP 1.0 utilized XML 1.0,
XML Namespaces 1.0, SAX 1.0, and DOM Level 1. JAXP is a standard
extension to Java, meaning that Sun provides a specification through
its Java Community Process (JCP) as well as a reference
implementation. JAXP 1.1 follows the same basic design philosophies
of JAXP 1.0, adding support for DOM Level 2, SAX 2, and XSLT 1.0. A
tool like JAXP is necessary because the XSLT specification defines
only a transformation language; it says nothing about how to write a
Java XSLT processor. Although they all perform the same basic tasks,
every processor uses a different API and has its own set of
programming conventions.
JAXP is not an XML parser, nor is it an XSLT processor. Instead, it
provides a common Java interface that masks differences between
various implementations of the supported standards. When using JAXP,
your code can avoid dependencies on specific vendor tools, allowing
flexibility to upgrade to newer tools when they become available.
The key to JAXP's design is the concept of plugability
layers. These layers provide consistent Java
interfaces to the underlying SAX, DOM, and XSLT implementations. In
order to utilize one of these APIs, you must obtain a factory class
without hardcoding Xalan or SAXON code into your application. This is
accomplished via a lookup mechanism that relies on Java system
properties. Since three separate plugability layers are used, you can
use a DOM parser from one vendor, a SAX parser from another vendor,
and yet another XSLT processor from someone else. In reality, you
will probably need to use a DOM parser compatible with your XSLT
processor if you try to transform the DOM tree directly. Figure 5-1 illustrates the high-level architecture of
JAXP 1.1.
Figure 5-1. JAXP 1.1 architecture
As shown, application code does not deal directly with specific
parser or processor implementations, such as SAXON or Xalan. Instead,
you write code against abstract classes that JAXP provides. This
level of indirection allows you to pick and choose among different
implementations without even recompiling your application.
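As a minimal sketch of the three plugability layers, the following program asks each JAXP factory for an instance and reports the concrete class that was selected. The class name PlugabilityLayers is hypothetical, and the names printed depend entirely on which implementations are installed on your system.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical demo class; not part of the book's example code.
class PlugabilityLayers {

    /** Returns the implementation class names chosen for SAX, DOM, and XSLT. */
    static String[] implementations() {
        // Each newInstance() call runs its own lookup, so the three
        // implementations may come from different vendors.
        SAXParserFactory sax = SAXParserFactory.newInstance();
        DocumentBuilderFactory dom = DocumentBuilderFactory.newInstance();
        TransformerFactory xslt = TransformerFactory.newInstance();
        return new String[] {
            sax.getClass().getName(),
            dom.getClass().getName(),
            xslt.getClass().getName()
        };
    }

    public static void main(String[] args) {
        String[] names = implementations();
        System.out.println("SAX:  " + names[0]);
        System.out.println("DOM:  " + names[1]);
        System.out.println("XSLT: " + names[2]);
    }
}
```

Note that the application code refers only to the abstract factory classes; no vendor package appears in an import statement.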
The main drawback to an API such as JAXP is the "least common
denominator" effect, which is all too familiar to AWT
programmers. In order to maximize portability, JAXP mostly provides
functionality that all XSLT processors support. This means, for
instance, that Xalan's custom XPath APIs are not included in
JAXP. In order to use value-added features of a particular processor,
you must revert to nonportable code, negating the benefits of a
plugability layer. Fortunately, most common tasks are supported by
JAXP, so reverting to implementation-specific code is the exception,
not the rule.
Although the JAXP specification does not define an XML parser or XSLT
processor, reference implementations do include these tools. These
reference implementations are open source Apache XML tools,[18] so complete source code
is available.
5.2.1. JAXP 1.1 Implementation
You
guessed it -- we will now reimplement the simple example using
Sun's JAXP 1.1. Behind the scenes, this could use any JAXP
1.1-compliant XSLT processor; this code was developed and tested
using Apache's Xalan 2 processor. Example 5-3
contains the complete source code.
Example 5-3. SimpleJaxp.java
package chap5;

import java.io.*;

/**
 * A simple demo of JAXP 1.1
 */
public class SimpleJaxp {

    /**
     * Accept two command line arguments: the name of an XML file, and
     * the name of an XSLT stylesheet. The result of the transformation
     * is written to stdout.
     */
    public static void main(String[] args)
            throws javax.xml.transform.TransformerException {
        if (args.length != 2) {
            System.err.println("Usage:");
            System.err.println(" java " + SimpleJaxp.class.getName( )
                    + " xmlFileName xsltFileName");
            System.exit(1);
        }

        File xmlFile = new File(args[0]);
        File xsltFile = new File(args[1]);

        javax.xml.transform.Source xmlSource =
                new javax.xml.transform.stream.StreamSource(xmlFile);
        javax.xml.transform.Source xsltSource =
                new javax.xml.transform.stream.StreamSource(xsltFile);
        javax.xml.transform.Result result =
                new javax.xml.transform.stream.StreamResult(System.out);

        // create an instance of TransformerFactory
        javax.xml.transform.TransformerFactory transFact =
                javax.xml.transform.TransformerFactory.newInstance( );

        javax.xml.transform.Transformer trans =
                transFact.newTransformer(xsltSource);

        trans.transform(xmlSource, result);
    }
}
As in the earlier examples, explicit package names are used in the
code to point out which classes are parts of JAXP. In future
examples, import statements will be favored
because they result in less typing and more readable code. Our new
program begins by declaring that it may throw
TransformerException:
public static void main(String[] args)
throws javax.xml.transform.TransformerException {
This is a general-purpose exception representing anything that might
go wrong during the transformation process. In other processors,
SAX-specific exceptions are typically propagated to the caller. In
JAXP, TransformerException can be wrapped around
any type of Exception object that various XSLT
processors may throw.
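To make the wrapping behavior concrete, here is a small sketch that feeds malformed XML to an identity transformer and inspects the resulting TransformerException. The class and method names are hypothetical, and the exact message text and wrapped exception type depend on the underlying processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class TransformFailure {

    /**
     * Copies the XML through an identity transformer. Returns null on
     * success, or a description of the TransformerException on failure.
     */
    static String describeFailure(String xml) {
        try {
            // newTransformer() with no stylesheet performs an identity copy
            Transformer identity =
                    TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(new StringWriter()));
            return null;
        } catch (TransformerException ex) {
            // getException() returns the wrapped exception, if there is one
            Throwable wrapped = ex.getException();
            return ex.getMessageAndLocation()
                    + " [wrapped: "
                    + (wrapped == null ? "none" : wrapped.getClass().getName())
                    + "]";
        }
    }

    public static void main(String[] args) {
        // the <b> element is never closed, so parsing should fail
        System.out.println(describeFailure("<a><b></a>"));
    }
}
```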
Next, the command-line arguments are converted into
File objects. In the SAXON and Xalan examples, we
created a system ID for each of these files. Since JAXP can read
directly from a File object, the extra conversion
to a URI is not needed:
File xmlFile = new File(args[0]);
File xsltFile = new File(args[1]);
javax.xml.transform.Source xmlSource =
new javax.xml.transform.stream.StreamSource(xmlFile);
javax.xml.transform.Source xsltSource =
new javax.xml.transform.stream.StreamSource(xsltFile);
The Source interface is used to read both the
XML file and the XSLT file. Unlike the SAX
InputSource class or Xalan's
XSLTInputSource class, Source
is an interface that can have many implementations. In this simple
example we are using
StreamSource, which has the ability to read from a
File object, an InputStream, a
Reader, or a system ID. Later we will examine
additional Source implementations that use SAX and
DOM as input. Just like Source,
Result is an interface that can have several
implementations. In this example, a StreamResult
sends the output of the transformations to
System.out:
javax.xml.transform.Result result =
new javax.xml.transform.stream.StreamResult(System.out);
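Because StreamSource and StreamResult also accept a Reader and a Writer, the same transformation can run entirely in memory. The following sketch assumes a tiny stylesheet and document invented for illustration; the class name InMemoryTransform is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class InMemoryTransform {

    // Illustrative stylesheet: writes the text of the <greeting> element.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"greeting\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms XML text with XSLT text and returns the result as a String. */
    static String transform(String xml, String xslt)
            throws TransformerException {
        // StreamSource wraps a Reader here instead of a File
        Source xmlSource = new StreamSource(new StringReader(xml));
        Source xsltSource = new StreamSource(new StringReader(xslt));

        // StreamResult wraps a Writer instead of System.out
        StringWriter buffer = new StringWriter();
        Result result = new StreamResult(buffer);

        Transformer trans =
                TransformerFactory.newInstance().newTransformer(xsltSource);
        trans.transform(xmlSource, result);
        return buffer.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform("<greeting>Hello</greeting>", SAMPLE_XSLT));
    }
}
```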
Next, an instance of
TransformerFactory is created:
javax.xml.transform.TransformerFactory transFact =
javax.xml.transform.TransformerFactory.newInstance( );
The TransformerFactory is responsible for creating
Transformer and Templates
objects. In our simple example, we create a
Transformer object:
javax.xml.transform.Transformer trans =
transFact.newTransformer(xsltSource);
Transformer objects are not thread-safe, although
they can be used multiple times. For a simple example like this, we
will not encounter any problems. In a threaded servlet environment,
however, multiple users cannot concurrently access the same
Transformer instance. JAXP also provides a
Templates interface, which represents a
stylesheet that can be accessed by many concurrent threads.
The transformer instance is then used to perform the actual
transformation:
trans.transform(xmlSource, result);
This applies the XSLT stylesheet to the XML data, sending the result
to System.out.
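The thread-safety point made above can be sketched as follows: the stylesheet is compiled once into a Templates object, and each task obtains its own Transformer from it. The stylesheet, the thread pool size, and the class name SharedTemplates are assumptions made for this illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class SharedTemplates {

    // Illustrative stylesheet: writes "Hello, " plus the <name> text.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">Hello, "
        + "<xsl:value-of select=\"name\"/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms every document on a pool thread, sharing one Templates. */
    static List<String> greetAll(List<String> docs) throws Exception {
        // Compile the stylesheet once; Templates may be shared by threads.
        final Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSLT)));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (final String doc : docs) {
                Callable<String> task = () -> {
                    // Each task gets a private Transformer instance.
                    Transformer trans = templates.newTransformer();
                    StringWriter out = new StringWriter();
                    trans.transform(new StreamSource(new StringReader(doc)),
                            new StreamResult(out));
                    return out.toString();
                };
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // preserves submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In a servlet, the Templates object would typically be created once and cached, with templates.newTransformer( ) called for each request.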
5.2.2. XSLT Plugability Layer
JAXP 1.1 defines a specific lookup procedure to locate an appropriate
XSLT processor. This must be accomplished without hardcoding
vendor-specific code into applications, so Java system properties and
JAR file service providers are used. Within your code, first locate
an instance of the TransformerFactory class as
follows:
javax.xml.transform.TransformerFactory transFact =
javax.xml.transform.TransformerFactory.newInstance( );
Since TransformerFactory is abstract, its
newInstance( ) factory method is used to
instantiate an instance of a specific subclass. The algorithm for
locating this subclass begins by looking at the
javax.xml.transform.TransformerFactory system
property. Let us suppose that
com.foobar.AcmeTransformer is a new XSLT processor
compliant with JAXP 1.1. To utilize this processor instead of
JAXP's default processor, you can specify the system property
on the command line[19] when you start your
Java application:
java -Djavax.xml.transform.TransformerFactory=com.foobar.AcmeTransformer MyApp
Provided that JAXP is able to instantiate an instance of
AcmeTransformer, this is the XSLT processor that
will be used. Of course, AcmeTransformer must be a
subclass of TransformerFactory for this to work,
so it is up to vendors to offer support for JAXP.
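The same property can also be set from code, which makes it easy to see what happens when the named class cannot be loaded. This is only a sketch: the class name FactoryProperty is hypothetical, and com.foobar.AcmeTransformer is the fictitious processor from the text, so the lookup is expected to fail.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

// Hypothetical demo class; not part of the book's example code.
class FactoryProperty {

    static final String KEY = "javax.xml.transform.TransformerFactory";

    /**
     * Temporarily points the system property at className. Returns the
     * class of the factory that was created, or a string starting with
     * "error: " if the factory could not be instantiated.
     */
    static String tryFactory(String className) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, className);
        try {
            return TransformerFactory.newInstance().getClass().getName();
        } catch (TransformerFactoryConfigurationError err) {
            return "error: " + err.getMessage();
        } finally {
            // restore the earlier setting so other code is unaffected
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFactory("com.foobar.AcmeTransformer"));
    }
}
```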
If the system property is not specified, JAXP next looks for a
property file named lib/jaxp.properties in the
JRE directory. A property file consists of
name=value pairs, and JAXP looks for a line like
this:
javax.xml.transform.TransformerFactory=com.foobar.AcmeTransformer
You can obtain the location of the JRE with the following code:
String javaHomeDir = System.getProperty("java.home");
NOTE:
Some popular development tools change the value of
java.home when they are installed, which could
prevent JAXP from locating jaxp.properties.
JBuilder, for instance, installs its own version of Java 2 that it
uses by default.
The advantage of creating jaxp.properties in
this directory is that you can use your preferred processor for all
of your applications that use JAXP without having to specify the
system property on the command line. You can still override this file
with the -D command-line syntax, however.
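If you are unsure whether a jaxp.properties file is in effect, a few lines of code can check. The sketch below assumes the lib/jaxp.properties location described above (other Java releases may look in a different directory), and the class name JaxpPropertiesFile is hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical demo class; not part of the book's example code.
class JaxpPropertiesFile {

    static final String KEY = "javax.xml.transform.TransformerFactory";

    /** Location described in the text: JRE/lib/jaxp.properties. */
    static File location() {
        File jreDir = new File(System.getProperty("java.home"));
        return new File(new File(jreDir, "lib"), "jaxp.properties");
    }

    /**
     * Returns the TransformerFactory entry from the given property file,
     * or null if the file or the entry does not exist.
     */
    static String configuredFactory(File propsFile) throws IOException {
        if (!propsFile.isFile()) {
            return null;
        }
        Properties props = new Properties();
        try (InputStream in = new FileInputStream(propsFile)) {
            props.load(in); // parses name=value pairs
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) throws IOException {
        File file = location();
        System.out.println(file + " -> " + configuredFactory(file));
    }
}
```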
If jaxp.properties is not found, JAXP uses the
JAR file service provider mechanism to locate an
appropriate subclass of TransformerFactory. The
service provider mechanism is outlined in the JAR file specification
from Sun and simply means that you must create a file in the
META-INF/services directory of a JAR file. In
JAXP, this file is called
javax.xml.transform.TransformerFactory. It
contains a single line that specifies the implementation of
TransformerFactory:
com.foobar.AcmeTransformer in our fictitious
example. If you look inside of xalan.jar in JAXP
1.1, you will find this file. In order to utilize a different processor
that follows the JAXP 1.1 convention, simply make sure its JAR file
is located first on your CLASSPATH.
Finally, if JAXP cannot find an implementation class from any of the
three locations, it uses its default implementation of
TransformerFactory. To summarize, here are the
steps that JAXP performs when attempting to
locate a factory:
- Use the value of the javax.xml.transform.TransformerFactory system property if it exists.
- If JRE/lib/jaxp.properties exists, then look for a javax.xml.transform.TransformerFactory=ImplementationClass entry in that file.
- Use a JAR file service provider to look for a file called META-INF/services/javax.xml.transform.TransformerFactory in any JAR file on the CLASSPATH.
- Use the default TransformerFactory instance.
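The precedence of these steps can be modeled in a few lines. This is a simplified illustration and not JAXP's actual source code: the class FactoryLookupOrder, its resolve method, and the idea of passing each configuration source in as a parameter are all assumptions made so that the ordering is easy to see.

```java
import java.util.Properties;

// Hypothetical model of the lookup order; not JAXP's real implementation.
class FactoryLookupOrder {

    static final String KEY = "javax.xml.transform.TransformerFactory";

    /**
     * Picks a factory class name from the four sources, in order.
     *
     * @param systemProps  the Java system properties (step 1)
     * @param jaxpProps    contents of jaxp.properties, or null (step 2)
     * @param serviceEntry line from the META-INF/services file, or null (step 3)
     * @param defaultImpl  the built-in default implementation (step 4)
     */
    static String resolve(Properties systemProps, Properties jaxpProps,
                          String serviceEntry, String defaultImpl) {
        // Step 1: the system property always wins.
        String name = systemProps.getProperty(KEY);
        if (name != null) {
            return name;
        }
        // Step 2: an entry in the jaxp.properties file.
        if (jaxpProps != null) {
            name = jaxpProps.getProperty(KEY);
            if (name != null) {
                return name;
            }
        }
        // Step 3: the JAR file service provider entry.
        if (serviceEntry != null) {
            return serviceEntry.trim();
        }
        // Step 4: fall back to the default.
        return defaultImpl;
    }
}
```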
The JAXP 1.1 plugability layers for SAX and DOM follow the exact same
process as the XSLT layer, only they use the
javax.xml.parsers.SAXParserFactory and
javax.xml.parsers.DocumentBuilderFactory system
properties respectively. It should be noted that JAXP 1.0 uses a much
simpler algorithm where it checks only for the existence of the
system property. If that property is not set, the default
implementation is used.
5.2.3. The Transformer Class
As shown
in Example 5-3, a Transformer
object can be obtained from the TransformerFactory
as follows:
javax.xml.transform.TransformerFactory transFact =
javax.xml.transform.TransformerFactory.newInstance( );
javax.xml.transform.Transformer trans =
transFact.newTransformer(xsltSource);
The Transformer
instance is wrapped around an XSLT stylesheet and allows you to
perform as many transformations as you wish. The main caveat is
thread safety, because many threads cannot use a single
Transformer instance concurrently. For each
transformation, invoke the transform method:
abstract void transform(Source xmlSource, Result outputTarget)
throws TransformerException
This method is abstract because the
TransformerFactory returns a subclass of
Transformer that does the actual work. The
Source interface defines where the XML data comes
from and the Result interface specifies where the
transformation result is sent. The
TransformerException will be thrown if anything
goes wrong during the transformation process and may contain the
location of the error and a reference to the original exception. The
ability to properly report the location of the error is entirely
dependent upon the quality of the underlying XSLT transformer
implementation's error reporting. We will talk about specific
classes that implement the Source and
Result interfaces later in this chapter.
Aside from actually performing the transformation, the
Transformer implementation allows you to set
output properties and stylesheet parameters. In
XSLT, a stylesheet parameter is
declared and used as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:param name="image_dir" select="'images'"/>
<xsl:template match="/">
<html>
<body>
<h1>Stylesheet Parameter Example</h1>
<img src="{$image_dir}/sample.gif"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The <xsl:param>
element declares
the parameter name and an optional select
attribute. This attribute specifies the default value used when the
stylesheet parameter is not provided. In this case,
'images' is the default value. It is enclosed in
apostrophes so that it is treated as a string literal; without them,
the XPath expression images would try to select elements named
images. Later, the image_dir parameter is
referred to with the attribute value template syntax:
{$image_dir}.
Passing a variable for the location of your images is a common
technique because your development environment might use a different
directory name than your production web server. Another common use
for a stylesheet parameter is to pass in data that a servlet
generates dynamically, such as a unique ID for session tracking.
From JAXP, pass this parameter via the Transformer
instance. The code is simple enough:
javax.xml.transform.Transformer trans =
transFact.newTransformer(xsltSource);
trans.setParameter("image_dir", "graphics");
You can set as many parameters as you like, and these parameters will
be saved and reused for every transformation you make with this
Transformer instance. If you wish to remove a parameter, you must
call clearParameters( ), which clears all parameters for this
Transformer instance. Parameters work similarly to a java.util.Map;
if you set the same parameter twice, the second value overwrites the
first value.
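The following sketch puts the stylesheet shown earlier and the setParameter( ) call together in one runnable program. The class name StylesheetParams, the dummy input document, and the first "ignored" value (set only to show that a later value overwrites an earlier one) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class StylesheetParams {

    // The image_dir stylesheet from the text, held in a string.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:param name=\"image_dir\" select=\"'images'\"/>"
        + "<xsl:template match=\"/\">"
        + "<html><body>"
        + "<h1>Stylesheet Parameter Example</h1>"
        + "<img src=\"{$image_dir}/sample.gif\"/>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Renders the page; a null imageDir leaves the stylesheet default. */
    static String render(String imageDir) throws TransformerException {
        Transformer trans = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        if (imageDir != null) {
            trans.setParameter("image_dir", "ignored");
            // setting the same name again overwrites the first value
            trans.setParameter("image_dir", imageDir);
        }
        StringWriter out = new StringWriter();
        // the stylesheet ignores its input, so any document will do
        trans.transform(new StreamSource(new StringReader("<dummy/>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(render("graphics"));
        System.out.println(render(null));
    }
}
```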
Another use for the Transformer class is to get and set output
properties through one of the following methods:
void setOutputProperties(java.util.Properties props)
void setOutputProperty(String name, String value)
java.util.Properties getOutputProperties( )
String getOutputProperty(String name)
As you can see, properties are specified as name/value pairs of
Strings and can be set and retrieved individually or as a group.
Unlike stylesheet parameters, you can un-set an individual property
by simply passing in null for the value.
The permitted property names are
defined in the
javax.xml.transform.OutputKeys class and are explained in Table 5-1.
Table 5-1. Constants defined in javax.xml.transform.OutputKeys
Constant | Meaning
CDATA_SECTION_ELEMENTS | Specifies a whitespace-separated list of element names whose content should be output as CDATA sections. See the XSLT specification from the W3C for examples.
DOCTYPE_PUBLIC | Only used if DOCTYPE_SYSTEM is also used, this instructs the processor to output a PUBLIC document type declaration. For example: <!DOCTYPE rootElem PUBLIC "public id" "system id">.
DOCTYPE_SYSTEM | Instructs the processor to output a document-type declaration. For example: <!DOCTYPE rootElem SYSTEM "system id">.
ENCODING | Specifies the character encoding of the result tree, such as UTF-8 or UTF-16.
INDENT | Specifies whether or not whitespace may be added to the result tree, making the output more readable. Acceptable values are yes and no. Although indentation makes the output more readable, it does make the file size larger, thus harming performance.
MEDIA_TYPE | The MIME type of the result tree.
METHOD | The output method, either xml, html, or text. Although other values are possible, such as xhtml, these are implementation-defined and may be rejected by your processor.
OMIT_XML_DECLARATION | Acceptable values are yes and no, specifying whether or not to include the XML declaration on the first line of the result tree.
STANDALONE | Acceptable values are yes and no, specifying whether or not the XML declaration indicates that the document is standalone. For example: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>.
VERSION | Specifies the version of the output method, typically 1.0 for XML output. This shows up in the XML declaration as follows: <?xml version="1.0" encoding="UTF-8"?>.
It is no coincidence that these output properties are the same as the
properties you can set on the <xsl:output>
element in your stylesheets. For example:
<xsl:output method="xml" indent="yes" encoding="UTF-8"/>
Using JAXP, you can either specify additional output properties or
override those set in the stylesheet. To change the encoding, write
this code:
// this will take precedence over any encoding specified in the stylesheet
trans.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
Keep in mind that this will, in addition to adding
encoding="UTF-16" to the XML declaration, actually
cause the processor to use that encoding in the result tree. For a
value of UTF-16, this means that 16-bit Unicode
characters will be generated, so you may have trouble viewing the
result tree in many ASCII-only text editors.
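As a small sketch of output properties in action, the program below copies a document through an identity transformer while setting a single property. The class and method names are hypothetical. Because the result here is a character Writer, the ENCODING property is visible only in the XML declaration; with a byte-oriented result such as an OutputStream, it would also control how the bytes are written.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class OutputProps {

    /** Identity-copies the XML with one output property set. */
    static String copy(String xml, String key, String value)
            throws TransformerException {
        Transformer identity =
                TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(key, value);
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    /** Serializes without the leading XML declaration. */
    static String withoutDeclaration(String xml) throws TransformerException {
        return copy(xml, OutputKeys.OMIT_XML_DECLARATION, "yes");
    }

    /** Serializes with the given encoding named in the XML declaration. */
    static String withEncoding(String xml, String encoding)
            throws TransformerException {
        return copy(xml, OutputKeys.ENCODING, encoding);
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(withoutDeclaration("<a><b/></a>"));
        System.out.println(withEncoding("<a><b/></a>", "ISO-8859-1"));
    }
}
```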
5.2.4. JAXP XSLT Design
Now that we have seen some example code and have begun our
exploration of the Transformer class, let's
step back and look at the overall design of the XSLT plugability
layer. JAXP support for XSLT is broken down into the packages listed
in Table 5-2.
Table 5-2. JAXP transformation packages
Package | Description
javax.xml.transform | Defines a general-purpose API for XML transformations without any dependencies on SAX or DOM. The Transformer class is obtained from the TransformerFactory class. The Transformer transforms from a Source to a Result.
javax.xml.transform.dom | Defines how transformations can be performed using DOM. Provides implementations of Source and Result: DOMSource and DOMResult.
javax.xml.transform.sax | Supports SAX2 transformations. Defines SAX versions of Source and Result: SAXSource and SAXResult. Also defines a subclass of TransformerFactory that allows SAX2 events to be fed into an XSLT processor.
javax.xml.transform.stream | Defines I/O stream implementations of Source and Result: StreamSource and StreamResult.
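As a rough illustration of the javax.xml.transform.dom package listed in Table 5-2, the sketch below parses a document into a DOM tree and copies it into a second DOM tree with an identity transformer. The class name DomRoundTrip and the sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the book's example code.
class DomRoundTrip {

    /** Copies the parsed document DOM-to-DOM and returns the new root name. */
    static String copiedRootName(String xml) throws Exception {
        // Build the input tree with the DOM plugability layer.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document input = builder.parse(new InputSource(new StringReader(xml)));

        // An identity transformer copies Source to Result unchanged.
        Transformer identity =
                TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // processor creates the Document
        identity.transform(new DOMSource(input), result);

        Document output = (Document) result.getNode();
        return output.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copiedRootName("<greeting>Hello</greeting>"));
    }
}
```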
The heart of JAXP XSLT support lies in the
javax.xml.transform package, which lays out the
mechanics and overall process for any transformation that is
performed. This package mostly consists of interfaces and abstract
classes, except for OutputKeys and a few exception
and error classes. Figure 5-2 presents a UML class
diagram that shows all of the pieces in this important package.
Figure 5-2. javax.xml.transform class diagram
As you can see, this is a small package, indicative of the fact that
JAXP is merely a wrapper around the tools that actually perform
transformations. The entry point is
TransformerFactory, which creates instances of
Transformer, as we have already seen, as well as
instances of the Templates interface. A
Templates object represents a compiled stylesheet
and will be covered in detail later in this chapter.[20] The advantage of compilation
is performance: the same Templates object can be
used over and over by many threads without reparsing the XSLT file.
The URIResolver is responsible for resolving URIs
found within stylesheets and is generally something you will not need
to deal with directly. It is used when a stylesheet imports or
includes another document, and the processor needs to figure out
where to look for that document. For example:
<xsl:import href="commonFooter.xslt"/>
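Should you need to control this resolution yourself, a custom URIResolver can be registered on the TransformerFactory. The sketch below serves the imported stylesheet from an in-memory map instead of the file system. The class InMemoryResolver, its register method, and both stylesheets are hypothetical, and how a processor treats the system ID returned by the resolver may vary between implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class InMemoryResolver implements URIResolver {

    private static final String XSL_NS =
        " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"";

    private final Map<String, String> stylesheets = new HashMap<>();

    /** Associates an href with stylesheet text held in memory. */
    void register(String href, String text) {
        stylesheets.put(href, text);
    }

    @Override
    public Source resolve(String href, String base) {
        String text = stylesheets.get(href);
        if (text == null) {
            return null; // let the processor try to resolve the URI itself
        }
        // give the Source a system ID so the processor can identify it
        return new StreamSource(new StringReader(text), href);
    }

    /** Compiles a stylesheet that imports commonFooter.xslt from memory. */
    static String demo() throws TransformerException {
        InMemoryResolver resolver = new InMemoryResolver();
        resolver.register("commonFooter.xslt",
            "<xsl:stylesheet version=\"1.0\"" + XSL_NS + ">"
            + "<xsl:template name=\"footer\">Common footer</xsl:template>"
            + "</xsl:stylesheet>");

        String main =
            "<xsl:stylesheet version=\"1.0\"" + XSL_NS + ">"
            + "<xsl:import href=\"commonFooter.xslt\"/>"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:call-template name=\"footer\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver); // used while compiling imports
        Transformer trans =
                factory.newTransformer(new StreamSource(new StringReader(main)));

        StringWriter out = new StringWriter();
        trans.transform(new StreamSource(new StringReader("<dummy/>")),
                new StreamResult(out));
        return out.toString();
    }
}
```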
ErrorListener, as you may guess, is an interface
that allows your code to register as a listener for error conditions.
This interface defines the following three methods:
void error(TransformerException ex)
void fatalError(TransformerException ex)
void warning(TransformerException ex)
The TransformerException has the ability to wrap
around another Exception or
Throwable object and may return an instance of the
SourceLocator interface. If the underlying XSLT
implementation does not provide a SourceLocator,
null is returned. The
SourceLocator interface defines methods to locate
where a TransformerException originated. In the
case of error() and warning(),
the XSLT processor is required to continue processing the document
until the end. For fatalError(), on the other
hand, the XSLT processor is not required to continue. If you do not
register an ErrorListener object, then all errors,
fatal errors, and warnings are normally written to
System.err.
TransformerFactoryConfigurationError
and
TransformerConfigurationException
round out the error-handling APIs for JAXP, indicating problems
configuring the underlying XSLT processor implementation. The
TransformerFactoryConfigurationError class is
generally used when the implementation class cannot be found on the
CLASSPATH or cannot be instantiated at all.
TransformerConfigurationException simply indicates
a "serious configuration error" according to its
documentation.
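A minimal ErrorListener might simply collect whatever the processor reports. In the sketch below, the class CollectingErrorListener and its problemsIn method are hypothetical, and which of the three callbacks a given processor invokes for a particular problem is implementation-dependent, so the code records all of them along with any exception that is finally thrown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the book's example code.
class CollectingErrorListener implements ErrorListener {

    final List<String> events = new ArrayList<>();

    @Override
    public void warning(TransformerException ex) {
        events.add("warning: " + ex.getMessageAndLocation());
    }

    @Override
    public void error(TransformerException ex) {
        events.add("error: " + ex.getMessageAndLocation());
    }

    @Override
    public void fatalError(TransformerException ex)
            throws TransformerException {
        events.add("fatal: " + ex.getMessageAndLocation());
        throw ex; // the processor is not required to continue
    }

    /** Identity-copies the XML and returns everything that was reported. */
    static List<String> problemsIn(String xml) {
        CollectingErrorListener listener = new CollectingErrorListener();
        try {
            Transformer identity =
                    TransformerFactory.newInstance().newTransformer();
            identity.setErrorListener(listener);
            identity.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(new StringWriter()));
        } catch (TransformerException ex) {
            listener.events.add("thrown: " + ex.getMessage());
        }
        return listener.events;
    }

    public static void main(String[] args) {
        for (String event : problemsIn("<a><b></a>")) {
            System.out.println(event);
        }
    }
}
```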
Copyright © 2002 O'Reilly & Associates. All rights reserved.