Transformations (Java & XML, 2nd Edition)

2.3.3. XPath

The final piece of the XML transformations puzzle, XPath provides a mechanism for referring to the wide variety of element and attribute names and values in an XML document. As I mentioned earlier, many XML specifications are now using XPath, but this discussion is concerned only with its use in XSLT. With the complex structure that an XML document can have, locating one specific element or set of elements can be difficult. It is made more difficult because access to a DTD or other set of constraints that outlines the document's structure cannot be assumed; documents that are not validated must be able to be transformed just as valid documents can. To accomplish this addressing of elements, XPath defines syntax in line with the tree structure of XML, and the XSLT processes and constructs that use it.

Referencing any element or attribute within an XML document is most easily accomplished by specifying the path to the element relative to the current element being processed. In other words, if element B is the current element and element C and element D are nested within it, a relative path most easily locates them. This is similar to the relative paths used in operating system directory structures. At the same time, XPath also defines addressing for elements relative to the root of a document. This covers the common case of needing to reference an element not within the current element's scope; in other words, an element that is not nested within the element being processed. Finally, XPath defines syntax for actual pattern matching: find an element whose parent is element E and which has a sibling element F. This fills in the gaps left between the absolute and relative paths. In all these expressions, attributes can be used as well, with similar matching abilities. Several examples are shown in Example 2-6.

Example 2-6. XPath expressions

<!-- Match the element named Book relative to the current element -->
<xsl:value-of select="Book" />

<!-- Match the element named Contents nested within the Book element -->
<xsl:value-of select="Book/Contents" />

<!-- Match the Contents element using an absolute path -->
<xsl:value-of select="/Book/Contents" />

<!-- Match the name attribute of the current element -->
<xsl:value-of select="@name" />

<!-- Match the title attribute of the Chapter element -->
<xsl:value-of select="Chapter/@title" />
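
To get a feel for how relative and absolute paths behave, it can help to evaluate a few expressions outside of a stylesheet. The following is a minimal sketch using the JDK's javax.xml.xpath API rather than XSLT; the input document, the class name XPathPathsSketch, and its helper methods are all invented for illustration and are not part of the book's examples.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathPathsSketch {

    // Hypothetical input document, invented for this sketch.
    static final String XML =
        "<Book name='Java and XML'>"
      + "<Contents><Chapter title='Introduction'>What Is It?</Chapter></Contents>"
      + "</Book>";

    // Parses XML and returns the root Book element, used here as the "current element".
    static Element book() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an XPath expression relative to the given context node
    // and returns the string value of the result ("" if nothing matched).
    static String eval(Node context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element book = book();
        Node chapter = book.getElementsByTagName("Chapter").item(0);

        // Attribute of the current element
        System.out.println(eval(book, "@name"));
        // Relative path, starting from the Book element
        System.out.println(eval(book, "Contents/Chapter/@title"));
        // Absolute path: starts at the document root, whatever the current element is
        System.out.println(eval(chapter, "/Book/@name"));
        // Relative path that matches nothing: Chapter has no nested Contents
        System.out.println(eval(chapter, "Contents").isEmpty());
    }
}
```

The point of the sketch is only that the same expression can produce different results depending on the current element, while an expression beginning with "/" does not.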

Because the input document is often not fixed, an XPath expression can result in the evaluation of no input data, one input element or attribute, or multiple input elements and attributes. This flexibility makes XPath very useful; it also introduces some additional terms. The result of evaluating an XPath expression is generally referred to as a node set. This name shouldn't be surprising, as it is in line with the idea of a hierarchical or tree structure, often dealt with in terms of its leaves or nodes. The resultant node set can then be transformed, copied, or ignored, or have any other legal operation performed on it. In addition to expressions to select node sets, XPath also defines several functions, such as not( ) and count( ). These functions take an argument (typically a node set, in the form of an XPath expression) and either compute a value from it, as count( ) does, or help pare the results further, as not( ) does inside a predicate. All of these expressions and functions are collectively part of the XPath specification and XPath implementations; however, XPath is also often used to signify any expression that conforms to the specification itself. As with XSL and XSLT, this makes it easier to talk about XSL and XPath, though it is not always technically correct.
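
As a rough illustration of node sets and of count( ) and not( ), here is a small sketch using the JDK's javax.xml.xpath API. The input document and the names NodeSetSketch, nodeCount, and number are assumptions made up for this example; the book's own examples use these expressions inside stylesheets instead.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeSetSketch {

    // Hypothetical input document, invented for this sketch.
    static final String XML =
        "<contents>"
      + "<chapter number='1'><topic name='What Is It?'/></chapter>"
      + "<chapter number='2'/>"
      + "<chapter number='3'><topic name='Getting Ready'/></chapter>"
      + "</contents>";

    // Returns the size of the node set that the expression selects.
    static int nodeCount(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression whose result is a number, such as count(...).
    static double number(String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath().evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A node set can be empty, hold one node, or hold several nodes.
        System.out.println(nodeCount("/contents/appendix"));
        System.out.println(nodeCount("/contents/chapter[@number='2']"));
        System.out.println(nodeCount("/contents/chapter"));

        // count() computes a number from a node set.
        System.out.println(number("count(/contents/chapter)"));

        // not() inside a predicate pares the node set down:
        // only chapters without a nested topic element remain.
        System.out.println(nodeCount("/contents/chapter[not(topic)]"));
    }
}
```

With this input, the three path expressions select zero, one, and three nodes, and the not( ) predicate leaves only the chapter that has no topic.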

With all that in mind, you're at least somewhat prepared to take a look at a simple XSL stylesheet, shown in Example 2-7. Although you may not understand all of this now, let's briefly look at some key aspects of the stylesheet.

Example 2-7. XSL stylesheet for Example 2-1

<?xml version="1.0"?>

<xsl:stylesheet xmlns:javaxml2="http://www.oreilly.com/javaxml2"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ora="http://www.oreilly.com"
                version="1.0"
>

  <xsl:template match="javaxml2:book">
    <html>
      <head>
        <title><xsl:value-of select="javaxml2:title" /></title>
      </head>
      <body>
        <xsl:apply-templates select="*[not(self::javaxml2:title)]" />
      </body>
    </html>
  </xsl:template>

  <xsl:template match="javaxml2:contents">
    <center>
     <h2>Table of Contents</h2>
    </center>
    <hr />
    <ul>
     <xsl:for-each select="javaxml2:chapter">
      <b>
       Chapter <xsl:value-of select="@number" />.
       <xsl:text> </xsl:text>
       <xsl:value-of select="@title" />
      </b>
      <xsl:for-each select="javaxml2:topic">      
       <ul>
        <li><xsl:value-of select="@name" /></li>
       </ul>
      </xsl:for-each>
     </xsl:for-each>
    </ul>
  </xsl:template>

  <xsl:template match="ora:copyright">
    <p align="center"><font size="-1">
     <xsl:copy-of select="*" />
    </font></p>
  </xsl:template>

</xsl:stylesheet>

2.3.3.1. Template matching

The basis of all XSL work is template matching. For any element that should produce some sort of output, you generally provide a template that matches the element. You signify a template with the template element, and provide the name of the element to match in its match attribute:

<xsl:template match="javaxml2:book">
  <html>
    <head>
      <title><xsl:value-of select="javaxml2:title" /></title>
    </head>
    <body>
      <xsl:apply-templates select="*[not(self::javaxml2:title)]" />
    </body>
  </html>
</xsl:template>

Here, the book element (in the javaxml2-associated namespace) is being matched. When an XSL processor encounters the book element, the instructions within this template are carried out. In the example, several HTML formatting tags are output (the html, head, title, and body tags). Be sure to distinguish your XSL elements from other elements (such as HTML elements) with proper use of namespaces.

Instead of applying a template, you can use the value-of construct to obtain the value of an element, and provide the element name to match through the select attribute. In the example, the character data within the title element is extracted and used as the title of the HTML page that is output.

On the other hand, when you want to cause the templates associated with an element's children to be applied, use apply-templates. Be sure to do this, or nested elements can be ignored! You can specify the elements to apply templates to using the select attribute; by specifying a value of "*" to that attribute, templates will be applied to all nested elements. In the example, though, I want to exclude the title element (since I already used it in the document heading). To accomplish this, I've used the not( ) function, and specified the title element on the self axis, which basically means "every nested element (*), except (not) one that is itself a title element (self::javaxml2:title)." That's a quick overview, but I'm just trying to give you enough information to move on to the Java code.
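
To see what the exclusion buys you, the following sketch runs a cut-down version of the book template through the JDK's javax.xml.transform API twice: once selecting "*" and once selecting "*[not(self::javaxml2:title)]". The input document, the class name TemplateMatchSketch, and the render method are assumptions invented for this illustration, and the stylesheet emits plain text instead of HTML only to keep the result easy to read.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateMatchSketch {

    // Hypothetical input document, invented for this sketch (not the book's Example 2-1).
    static final String XML =
        "<javaxml2:book xmlns:javaxml2='http://www.oreilly.com/javaxml2'>"
      + "<javaxml2:title>Java and XML</javaxml2:title>"
      + "<javaxml2:contents>Table of Contents</javaxml2:contents>"
      + "</javaxml2:book>";

    // Builds a small stylesheet whose apply-templates uses the given select
    // expression, runs it against XML, and returns the text output.
    static String render(String select) {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
          + " xmlns:javaxml2='http://www.oreilly.com/javaxml2'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='javaxml2:book'>"
          + "TITLE=<xsl:value-of select='javaxml2:title'/>;"
          + "<xsl:apply-templates select='" + select + "'/>"
          + "</xsl:template>"
          + "<xsl:template match='javaxml2:contents'>"
          + "CONTENTS=<xsl:value-of select='.'/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every nested element is processed, so the title text shows up a second time
        // (no template matches title, so XSLT's built-in rule outputs its text).
        System.out.println(render("*"));
        // The title element is skipped; only contents is processed.
        System.out.println(render("*[not(self::javaxml2:title)]"));
    }
}
```

With "*", the title's text appears twice in the result, because XSLT's built-in rules output the text of any element that has no matching template; with the not( ) predicate it appears only where value-of put it.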

2.3.3.2. Looping

You'll also often find a need for looping in XSL. Look at this fragment from Example 2-7:

<xsl:template match="javaxml2:contents">
  <center>
   <h2>Table of Contents</h2>
  </center>
  <hr />
  <ul>
   <xsl:for-each select="javaxml2:chapter">
    <b>
     Chapter <xsl:value-of select="@number" />.
     <xsl:text> </xsl:text>
     <xsl:value-of select="@title" />
    </b>
    <xsl:for-each select="javaxml2:topic">      
     <ul>
      <li><xsl:value-of select="@name" /></li>
     </ul>
    </xsl:for-each>
   </xsl:for-each>
  </ul>
</xsl:template>

Here, I'm looping through each element named chapter using the for-each construct. In Java, this would be:

for (Iterator i = chapters.iterator(); i.hasNext( ); ) {
    Object chapter = i.next( );   // advance to the next chapter; without this the loop never ends
    // take action on each chapter
}

Within the loop, the "current" element becomes the next chapter element encountered. For each, I output the chapter number; this is accomplished by getting the value (through value-of) of the number attribute. To indicate that I want an attribute (not the default, an element), I prefix the attribute name with the "@" sign. I do the same thing to get the title attribute's value, and then in a subloop I move through the topics for each chapter.

Notice the rather odd code fragment <xsl:text> </xsl:text>. The text construct provides a way to directly output characters to the result tree. This construct generates a space between the period that follows the chapter number and the chapter title (there is a single space between the opening and closing text tags).
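
The nested loops and the text construct can be tried out with a short sketch like the one below, which uses the JDK's javax.xml.transform API. The input document, the class name LoopSketch, and the render method are assumptions invented for this example; namespaces are left out and the output is plain text with ";" after each chapter, purely to keep the result compact.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LoopSketch {

    // Hypothetical input document, invented for this sketch.
    static final String XML =
        "<contents>"
      + "<chapter number='1' title='Introduction'>"
      + "<topic name='What Is It?'/><topic name='How Do I Use It?'/>"
      + "</chapter>"
      + "<chapter number='2' title='Nuts and Bolts'>"
      + "<topic name='The Basics'/>"
      + "</chapter>"
      + "</contents>";

    // Outer for-each visits each chapter; inner for-each visits that chapter's topics.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='contents'>"
      + "<xsl:for-each select='chapter'>"
      + "Chapter <xsl:value-of select='@number'/>."
      + "<xsl:text> </xsl:text>"                  // the single space before the title
      + "<xsl:value-of select='@title'/>"
      + "<xsl:for-each select='topic'>"
      + "<xsl:text> / </xsl:text><xsl:value-of select='@name'/>"
      + "</xsl:for-each>"
      + "<xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies XSL to XML and returns the text output.
    static String render() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Inside the inner loop, @name refers to the current topic, not to the chapter, which is the "current element" shift described above.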

2.3.3.3. Copying

You will also find times when all the template matching in the world isn't as useful as simply passing on the content, unchanged, to the output tree. This is the case with the copyright element:

<xsl:template match="ora:copyright">
  <p align="center"><font size="-1">
   <xsl:copy-of select="*" />
  </font></p>
</xsl:template>

In addition to a little bit of HTML formatting, this template instructs the processor to copy the elements nested within the copyright element (selected by "*") to the output tree, unchanged, using the copy-of construct. Simple enough.
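
One detail worth checking for yourself is exactly what "*" selects when used with copy-of. The sketch below, again using the JDK's javax.xml.transform API, compares select="*" with select="node()" on a made-up copyright element; the input document, the class name CopySketch, and the render method are assumptions for illustration only, and namespaces are omitted for brevity.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class CopySketch {

    // Hypothetical input: some loose text plus two nested elements.
    static final String XML =
        "<copyright>Copyright <year>2001</year><holder>Example Press</holder></copyright>";

    // Copies whatever the given select expression picks out of the
    // copyright element into a <p> element and returns the serialized result.
    static String render(String select) {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='copyright'>"
          + "<p><xsl:copy-of select='" + select + "'/></p>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "*" selects child elements only, so the loose text is not copied.
        System.out.println(render("*"));
        // "node()" selects text nodes as well as elements.
        System.out.println(render("node()"));
    }
}
```

In both cases the year and holder elements arrive in the output with their markup intact, which is the difference between copy-of and value-of.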

You'll learn how to use a publishing framework like Cocoon to render the result of this transformation to HTML, a PDF, or more in Chapter 10, "Web Publishing Frameworks". Rather than keeping you waiting, though, Figure 2-2 shows the transformed output from Example 2-1 and the stylesheet in Example 2-7.

Figure 2-2. Result of XSL transformation

I realize that I've virtually flown through this material, but again, I'm just trying to get you past the basics and to the good stuff, the Java and XML. Have a reference handy, and don't sweat it too much.
