XSLT Part 1 -- The Basics (Java and XSLT)

Extensible Stylesheet Language (XSL) is a specification from the World Wide Web Consortium (W3C) and is broken down into two complementary technologies: XSL Formatting Objects and XSL Transformations (XSLT). XSL Formatting Objects, a language for defining formatting such as fonts and page layout, is not covered in this book. XSLT, on the other hand, was primarily designed to transform a well-formed XML document into XSL Formatting Objects.

Even though XSLT was designed to support XSL Formatting Objects, it has emerged as the preferred technology for all sorts of transformations. Transformation from XML to HTML is the most common, but XSLT can also be used to transform well-formed XML into just about any text file format. This will give XML- and XSLT-based web sites a major leg up as wireless devices become more prevalent because XSLT can also be used to transform XML into Wireless Markup Language or some other stripped-down format that wireless devices will require.

2.1. XSLT Introduction

Why is transformation so important? XML provides a simple syntax for defining markup, but it is up to individuals and organizations to define specific markup languages. There is no guarantee that two organizations will use the exact same markup; in fact, you may struggle to agree on consistent formats within the same group or company. One group may use <employee>, while others may use <worker> or <associate>. In order to share data, the XML data has to be transformed into a common format. This is where XSLT shines -- it eliminates the need to write custom computer programs to transform data. Instead, you simply create one or more XSLT stylesheets.

An XSLT processor is an application that applies an XSLT stylesheet to an XML data source. Instead of modifying the original XML data, the result of the transformation is copied into something called a result tree, which can be directed to a static file, sent directly to an output stream, or even piped into another XSLT processor for further transformations. Figure 2-1 illustrates the transformation process, showing how the XML input, XSLT stylesheet, XSLT processor, and result tree relate to one another.

Figure 2-1. XSLT transformation

The XML input and XSLT stylesheet are normally two separate entities.[5] For the examples in this chapter, the XML will always reside in a text file. In future chapters, however, we will see how to improve performance by dealing with the XML as an in-memory object tree. This makes sense from a Java/XSLT perspective because most web applications will generate XML dynamically rather than deal with a series of static files. Since the XML data and XSLT stylesheet are clearly separated, it is very plausible to write several different stylesheets that convert the same XML into radically different formats.

[5] Section 2.7 of the XSLT specification covers embedded stylesheets.

XSLT transformation can occur on either the client or server, although server-side transformations are currently dominant. Since a vast majority of Internet users do not use XSLT-compliant browsers (at the time of this writing), the typical model is to transform XML into HTML on the web server so the browser sees only the resulting HTML. In a closed corporate environment where the browser feature set can be controlled, moving the XSLT transformation process to the browser can improve scalability and reduce network traffic.

It should be noted that XSLT stylesheets do not perform the same function as Cascading Style Sheets (CSS), which you may be familiar with. In the CSS model, style elements are applied to HTML or XML on the web browser, affecting formatting such as fonts and colors. CSS does not produce a separate result tree and cannot be applied in advance using a standalone processor as XSLT can. The CSS processing model operates on the underlying data in a top-down fashion in a single pass, while XSLT can iterate and perform conditional logic on the XML data. Although XSLT can produce style instructions, its true role is that of a transformation language rather than a style language. XSL Formatting Objects, on the other hand, is a style language that is much more comparable to CSS.

For wireless applications, HTML is not typically generated. Instead, Wireless Markup Language (WML) is the current standard for cell phones and other wireless devices. In the future, new standards such as XHTML Basic may be used. When using an XSLT approach, the same XML data can be transformed into many forms, all via different stylesheets. Regardless of how many stylesheets are used, the XML data will remain unchanged. A typical web site might have the following stylesheets for a single XML home page:

homeBasic.xslt: For older web browsers

homeIE5.xslt: Takes advantage of newer Internet Explorer features

homeMozilla.xslt: Takes advantage of newer Netscape features

homeWML.xslt: Transforms into Wireless Markup Language

homeB2B.xslt: Transforms the XML into another XML format, suitable for "B2B-style" XML data feeds to customers
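The idea of one XML source feeding several stylesheets can be sketched in a few lines of Java. This is only an illustration, not code from the discussion forum application: it assumes a JDK whose standard JAXP API (javax.xml.transform) supplies an XSLT 1.0 processor, and the class name and both inline stylesheets are hypothetical stand-ins for files such as homeBasic.xslt.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MultipleStylesheets {
    // The XML data; it stays the same no matter which stylesheet is applied.
    static final String XML =
        "<discussionForumHome>"
        + "<messageBoard id=\"1\" name=\"Java Programming\"/>"
        + "<messageBoard id=\"2\" name=\"XML Programming\"/>"
        + "</discussionForumHome>";

    // Hypothetical stylesheet: renders the boards as an HTML list.
    static final String HTML_XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\"",
        "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"html\"/>",
        "  <xsl:template match=\"/\">",
        "    <ul><xsl:apply-templates select=\"discussionForumHome/messageBoard\"/></ul>",
        "  </xsl:template>",
        "  <xsl:template match=\"messageBoard\">",
        "    <li><xsl:value-of select=\"@name\"/></li>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Hypothetical stylesheet: renders the same boards as plain text lines.
    static final String TEXT_XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\"",
        "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"text\"/>",
        "  <xsl:template match=\"/\">",
        "    <xsl:apply-templates select=\"discussionForumHome/messageBoard\"/>",
        "  </xsl:template>",
        "  <xsl:template match=\"messageBoard\">",
        "    <xsl:value-of select=\"@id\"/>",
        "    <xsl:text>: </xsl:text>",
        "    <xsl:value-of select=\"@name\"/>",
        "    <xsl:text>&#10;</xsl:text>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies one stylesheet to one XML document and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, HTML_XSLT));
        System.out.println(transform(XML, TEXT_XSLT));
    }
}
```

Only the stylesheet argument changes between the two calls; the XML string is passed through untouched both times.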

Schema evolution implies an upgrade to an existing data source where the structure of the data must be modified. When the data is stored in XML format, XSLT can be used to support schema evolution. For example, Version 1.0 of your application may store all of its files in XML format, but Version 2.0 might add new features that cannot be supported by the old 1.0 file format. A perfect solution is to write a single stylesheet to transform all of the old 1.0 XML files to the new 2.0 file format.
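A minimal sketch of such an upgrade stylesheet follows, driven from Java through the standard JAXP API. Everything about the file format here is an assumption made up for illustration: the <settings>, <font>, and <fontFamily> elements and the version attribute do not come from any real application. The stylesheet copies everything unchanged except for the pieces that differ between the two hypothetical versions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SchemaUpgrade {
    // Hypothetical Version 1.0 file format.
    static final String OLD_XML =
        "<settings><font>Arial</font><size>12</size></settings>";

    // Hypothetical upgrade: add a version attribute and rename <font> to <fontFamily>.
    static final String UPGRADE_XSLT = String.join("\n",
        "<xsl:stylesheet version=\"1.0\"",
        "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
        "  <!-- copy anything that has no more specific template -->",
        "  <xsl:template match=\"@*|node()\">",
        "    <xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>",
        "  </xsl:template>",
        "  <!-- mark the root element with the new format version -->",
        "  <xsl:template match=\"settings\">",
        "    <settings version=\"2.0\"><xsl:apply-templates select=\"node()\"/></settings>",
        "  </xsl:template>",
        "  <!-- rename <font> to <fontFamily>, keeping its content -->",
        "  <xsl:template match=\"font\">",
        "    <fontFamily><xsl:apply-templates select=\"node()\"/></fontFamily>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the XML and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(OLD_XML, UPGRADE_XSLT));
    }
}
```

Because unmatched content is copied as-is, a stylesheet written this way only has to describe what changed between versions.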

2.1.1. An XSLT Example

You need three components to perform XSLT transformations: an XML data source, an XSLT stylesheet, and an XSLT processor. The XSLT stylesheet is actually a well-formed XML document, so the XSLT processor will also include or use an XML parser. Apache's Xalan is used for most of the examples in this book; the previous chapter listed several other processors that you may want to investigate. You can download Xalan from http://xml.apache.org. It uses and includes Apache's Xerces parser, but can be configured to use other parsers. The ability to swap out parsers is important because this gives you the flexibility to use the latest innovations as competing (and perhaps faster) parsers are released.

Example 2-1 represents an early prototype of a discussion forum home page. The complete discussion forum application will be developed in Chapter 7, "Discussion Forum". This is the raw XML data, without any formatting instructions or HTML. As you can see, the home page simply lists the message boards that the user can choose to view.

Example 2-1. discussionForumHome.xml

<?xml version="1.0" encoding="UTF-8"?>
<discussionForumHome>
  <messageBoard id="1" name="Java Programming"/>
  <messageBoard id="2" name="XML Programming"/>
  <messageBoard id="3" name="XSLT Questions"/>
</discussionForumHome>

It is assumed that this data will be generated dynamically as the result of a database query, rather than hardcoded as a static XML file. Regardless of its origin, the XML data says nothing about how to actually display the web page. For clarity, we will keep the XSLT stylesheet fairly simple at this point. The beauty of an XML/XSLT approach is that you can beef up the stylesheet later on without compromising any of the underlying XML data structures. Even more importantly, the Java code that will generate the XML data does not have to be cluttered up with HTML and user interface logic; it just produces the basic XML data. Once the format of the data has been defined, a Java programmer can begin working on the database logic and XML generation code, while another team member begins writing the XSLT stylesheets.

Example 2-2 lists the XSLT stylesheet that produces the home page. Don't worry if not everything in this first example makes sense. XSLT is, after all, a completely new language. We will cover everything in detail throughout the remainder of this and the next chapter.

Example 2-2. discussionForumHome.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- match the document root -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Discussion Forum Home Page</title>
      </head>
      <body>
        <h1>Discussion Forum Home Page</h1>
        <h3>Please select a message board to view:</h3>
        <ul>
          <xsl:apply-templates select="discussionForumHome/messageBoard"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- match a <messageBoard> element -->
  <xsl:template match="messageBoard">
    <li>
      <a href="viewForum?id={@id}">
        <xsl:value-of select="@name"/>
      </a>
    </li>
  </xsl:template>
</xsl:stylesheet>

NOTE: The filename extension for XSLT stylesheets is irrelevant. In this book, .xslt is used. Many stylesheet authors prefer .xsl.

The first thing that should jump out immediately is the fact that the XSLT stylesheet is also a well-formed XML document. Do not let the xsl: namespace prefix fool you -- everything in this document adheres to the same basic rules that every other XML document must follow. Like other XML files, the first line of the stylesheet is an XML declaration:

<?xml version="1.0" encoding="UTF-8"?>

Unless you are dealing with internationalization issues, this will remain unchanged for every stylesheet you write. This line is immediately followed by the document root element, which contains the remainder of the stylesheet:

<xsl:stylesheet
    version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

The <xsl:stylesheet> element has two attributes in this case. The first, version="1.0", specifies the version of the XSLT specification. Although this is the current version at the time of this writing, the next version of the XSLT specification is well underway and may be finished by the time you read this. You can stay abreast of the latest XSLT developments by visiting the W3C home page at http://www.w3.org.

The next attribute declares the XML namespace, defining the meaning of the xsl: prefix you see on all of the XSLT elements. The prefix xsl is conventional, but could be anything you choose. This is useful if your document already uses the xsl prefix for other elements, and you do not want to introduce a naming conflict. This is really the entire point of namespaces: they help to avoid name conflicts. In XML, <a:book> and <b:book> can be discerned from one another because each prefix is bound to a different namespace. Since you pick the prefix yourself, it does not matter if two vendors happen to choose the same one; only the namespace value that the prefix is bound to has to be unique.

In the case of XSLT, the namespace prefix does not have to be xsl, but the value does have to be http://www.w3.org/1999/XSL/Transform. The value of a namespace is not necessarily a real web site, but the syntax is convenient because it helps ensure uniqueness. In the case of XSLT, 1999 represents the year that the URL was allocated for this purpose, and is not related to the version number. It is almost certain that future versions of XSLT will continue to use this same URL.

WARNING: Even the slightest typo in the namespace will render the stylesheet useless for most processors. The text must match http://www.w3.org/1999/XSL/Transform exactly, or your stylesheet will not be processed. Spelling or capitalization errors are a common mistake and should be the first thing you check when things are not working as you expect.
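To see that the prefix really is arbitrary, the following sketch binds a made-up prefix, t, to the XSLT namespace and runs the stylesheet through the standard JAXP API. The class name, the prefix, and the tiny <greeting> document are assumptions chosen for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CustomPrefix {
    // Hypothetical input document.
    static final String XML = "<greeting>Hello</greeting>";

    // The prefix is "t" instead of "xsl"; the namespace value is unchanged.
    static final String XSLT = String.join("\n",
        "<t:stylesheet version=\"1.0\"",
        "    xmlns:t=\"http://www.w3.org/1999/XSL/Transform\">",
        "  <t:output method=\"text\"/>",
        "  <t:template match=\"/\">",
        "    <t:value-of select=\"greeting\"/>",
        "  </t:template>",
        "</t:stylesheet>");

    // Applies the stylesheet to the XML and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSLT));
    }
}
```

The processor recognizes the elements by the namespace they belong to, so the stylesheet behaves exactly as it would with the conventional xsl prefix.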

The next line of the stylesheet simply indicates that the result tree should be treated as an HTML document instead of an XML document:

<xsl:output method="html"/>

In Version 1.0 of XSLT, processors are not required to fully support this element. Xalan does, however, so we will include this in all of our stylesheets. Since the XSLT stylesheet itself must be written as well-formed XML, some HTML tags are difficult to include. Instead of writing <hr>, you must write <hr/> in your stylesheet. When the output method is html, processors such as Xalan will remove the slash (/) character from the result tree, which produces HTML that typical web browsers expect.
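The effect of the output method on an empty element such as <hr/> can be checked with a short experiment. This sketch assumes the XSLT processor that ships with the JDK behind the standard JAXP API rather than a separately installed Xalan, and the class name and the <div> wrapper are hypothetical. The same template is serialized twice, once as html and once as xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputMethodDemo {
    // Builds a stylesheet that always emits <div><hr/></div>,
    // using the given output method ("html" or "xml").
    static String stylesheet(String method) {
        return String.join("\n",
            "<xsl:stylesheet version=\"1.0\"",
            "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
            "  <xsl:output method=\"" + method + "\"/>",
            "  <xsl:template match=\"/\">",
            "    <div><hr/></div>",
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    // Applies the stylesheet to the XML and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String input = "<anything/>"; // the input content is ignored by this template
        System.out.println("html: " + transform(input, stylesheet("html")));
        System.out.println("xml:  " + transform(input, stylesheet("xml")));
    }
}
```

In both cases the stylesheet itself contains the well-formed <hr/>; only the way the result is written out differs.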

The remainder of our stylesheet consists of two templates. Each matches some pattern in the XML input document and is responsible for producing output to the result tree. The first template is repeated as follows:


<xsl:template match="/">
  <html>
    <head>
      <title>Discussion Forum Home Page</title>
    </head>
    <body>
      <h1>Discussion Forum Home Page</h1>
      <h3>Please select a message board to view:</h3>
      <ul>
        <xsl:apply-templates select="discussionForumHome/messageBoard"/>
      </ul>
    </body>
  </html>
</xsl:template>

When the XSLT processor begins its transformation process, it looks in your stylesheet for a template that matches the "/" pattern. This pattern matches the source XML document that is being transformed. You may recall from Chapter 1, "Introduction", that DOM uses the Document interface to represent the document, which is what we are matching here. This is always the starting point for processing, so nearly every stylesheet you write will contain a template similar to this one. Since this is the first template to be instantiated, it is also where we create the framework for the resulting HTML document. The second template, which matches the "messageBoard" pattern, is currently ignored. This is because the processor is only looking at the root of the XML document, and the <messageBoard> element is nested beneath the <discussionForumHome> element.

Most of the tags in this template do not start with <xsl:, so they are simply copied to the result tree. In fact, the only dynamic content in this particular template is the following line, which tells the processor to continue the transformation process:

<xsl:apply-templates select="discussionForumHome/messageBoard"/>

Without this line, the transformation process would be complete because the "/" pattern was already located and a corresponding template was instantiated. The <xsl:apply-templates> element tells the XSLT processor to begin a new search for elements in the source XML document that match the "discussionForumHome/messageBoard" pattern and to instantiate an additional template that matches. As we will see shortly, the transformation process is recursive and must be driven by XSLT elements such as <xsl:apply-templates>. Simply including one or more <xsl:template> elements in a stylesheet does not mean that they will be instantiated.

In this example, the <xsl:apply-templates> element tells the XSLT processor to first select all <discussionForumHome> elements of the current node. The current node is "/", or the top of the document, so it only selects the <discussionForumHome> element that occurs at the document's root level. If another <discussionForumHome> element is deeply nested within the XML document, it will not be selected by this pattern. Assuming that the processor locates the <discussionForumHome> element, it then searches for all of its <messageBoard> children.

NOTE: The select attribute in <xsl:apply-templates> does not have to be the same as the match attribute in <xsl:template>. Although the stylesheet presented in Example 2-2 could have specified <xsl:template match="discussionForumHome/messageBoard"> for the second template, this would limit the reusability of the template. Specifically, it could only be applied to <messageBoard> elements that occur as direct children of <discussionForumHome> elements. Since our template matches only "messageBoard", it can be reused for <messageBoard> elements that appear anywhere in the XML document.

For each <messageBoard> child, the processor looks for the template in your stylesheet that provides the best match. Since our stylesheet contains a template that matches the "messageBoard" pattern exactly, it is instantiated for each of the <messageBoard> elements. The job of this template is to produce a single HTML list item tag for each <messageBoard> element:

<xsl:template match="messageBoard">
  <li>
    <a href="viewForum?id={@id}">
      <xsl:value-of select="@name"/>
    </a>
  </li>
</xsl:template>

As you can see, the list item must be properly terminated; HTML-style standalone <li> tags are not allowed because they break the requirement that XSLT stylesheets be well-formed XML. Terminating the element with </li> also works with HTML, so this is the approach you must take. The hyperlink is a best guess at this point in the design process because the servlet has not been defined yet. Later, when we develop a servlet to actually process this web page, we will update the link to point to the correct servlet.

In the stylesheet, @ is used to select the values of attributes. Curly braces ({}) are known as an attribute value template and will be discussed in Chapter 3, "XSLT Part 2 -- Beyond the Basics". If you look back at Example 2-1, you will see that each message board has two attributes, id and name:

<messageBoard id="1" name="Java Programming"/>
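Until attribute value templates are covered properly, it may help to see that the curly-brace form is shorthand for building the attribute explicitly with <xsl:attribute>. The sketch below runs both forms through the standard JAXP API and compares the results. The class name and the helper that wraps each fragment in a stylesheet are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeValueTemplateDemo {
    static final String XML = "<messageBoard id=\"1\" name=\"Java Programming\"/>";

    // The form used in Example 2-2: curly braces inside the attribute value.
    static final String WITH_BRACES =
        "<a href=\"viewForum?id={@id}\"><xsl:value-of select=\"@name\"/></a>";

    // The longhand form: the attribute is built with <xsl:attribute>.
    static final String WITH_XSL_ATTRIBUTE =
        "<a><xsl:attribute name=\"href\">viewForum?id="
        + "<xsl:value-of select=\"@id\"/></xsl:attribute>"
        + "<xsl:value-of select=\"@name\"/></a>";

    // Wraps a template body in a small stylesheet that matches <messageBoard>.
    static String stylesheet(String templateBody) {
        return String.join("\n",
            "<xsl:stylesheet version=\"1.0\"",
            "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">",
            "  <xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>",
            "  <xsl:template match=\"messageBoard\">",
            "    " + templateBody,
            "  </xsl:template>",
            "</xsl:stylesheet>");
    }

    // Applies the stylesheet to the XML and returns the output as a string.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, stylesheet(WITH_BRACES)));
        System.out.println(transform(XML, stylesheet(WITH_XSL_ATTRIBUTE)));
    }
}
```

Both variants read the id attribute of the current <messageBoard> element; the curly-brace form is simply more compact when the attribute is written as a literal in the stylesheet.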

When the stylesheet processor is executed and the result tree generated, we end up with the HTML shown in Example 2-3. The HTML is minimal at this point, which is exactly what you want. Fancy changes to the page layout can be added later; the important concept is that programmers can get started right away with the underlying application logic because of the clean separation between data and presentation that XML and XSLT provide.

Example 2-3. discussionForumHome.html

<html>
  <head>
    <title>Discussion Forum Home Page</title>
  </head>
  <body>
    <h1>Discussion Forum Home Page</h1>
    <h3>Please select a message board to view:</h3>
    <ul>
      <li>
        <a href="viewForum?id=1">Java Programming</a>
      </li>
      <li>
        <a href="viewForum?id=2">XML Programming</a>
      </li>
      <li>
        <a href="viewForum?id=3">XSLT Questions</a>
      </li>
    </ul>
  </body>
</html>

2.1.2. Trying It Out

To try things out, download the examples for this book and locate discussionForumHome.xml and discussionForumHome.xslt. They can be found in the chap1 directory. If you would rather type in the examples, you can use any text editor or a dedicated XML editor such as Altova's XML Spy (http://www.xmlspy.com). After downloading and unzipping the Xalan distribution from Apache, simply add xalan.jar and xerces.jar to your CLASSPATH. The transformation can then be initiated with the following command:

java org.apache.xalan.xslt.Process -IN discussionForumHome.xml -XSL discussionForumHome.xslt

This will apply the stylesheet, sending the resulting HTML content to standard output. Adding -OUT filename to the command will cause Xalan to send the result tree directly to a file. To see the complete list of Xalan options, just type java org.apache.xalan.xslt.Process. For example, the -TT option allows you to see (trace) which templates are being called.

NOTE: Xalan's -IN and -XSL parameters accept URLs as arguments rather than as file names. A simple filename will work if the files are in the current working directory, but you may need to use a full URL syntax, such as file:///path/file.ext, when the file is located elsewhere.

In Chapter 5, "XSLT Processing with Java", we will show how to invoke Xalan and other XSLT processors from Java code, which is far more efficient because a separate Java Virtual Machine (JVM) does not have to be invoked for each transformation. Although it can take several seconds to start the JVM, the actual XSLT transformations will usually occur in milliseconds.
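As a small preview, and not necessarily the approach developed in Chapter 5, the following sketch performs the same transformation as the command line above from within a Java program. It assumes only the standard JAXP API (javax.xml.transform) and that the two files from this chapter are in the current working directory. The class and method names are hypothetical.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InvokeFromJava {
    // Applies the stylesheet to the XML input and writes the output to the given result.
    static void transform(Source xml, Source xslt, Result result) {
        try {
            Transformer transformer =
                TransformerFactory.newInstance().newTransformer(xslt);
            transformer.transform(xml, result);
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumes both files are in the current working directory.
        transform(new StreamSource(new File("discussionForumHome.xml")),
                  new StreamSource(new File("discussionForumHome.xslt")),
                  new StreamResult(System.out));
    }
}
```

A long-running program such as a servlet can call a method like transform( ) repeatedly, paying the JVM startup cost only once.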

Another option is to find a web browser that supports XSLT, which allows you to edit your stylesheet and hit the "Reload" button to view the transformation.
