Transformations (Perl and XML)

2.12. Transformations

The last topic we want to introduce is the concept of transformations. In XML, a transformation is a process of restructuring or converting a document into another form. The W3C recommends a language for transforming XML called Extensible Stylesheet Language Transformations (XSLT). It's an incredibly useful and fun technology to work with.

Like XML Schema, an XSLT transformation script is an XML document. It's composed of template rules, each of which is an instruction for how to turn one element type into something else. The term template is often used to mean an example of how something should look, with blanks that you should fill in. That's exactly how template rules work: they are examples of how the final document should be, with the blanks filled in by the XSLT processor.
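As a minimal sketch of a single template rule, assume a source document that uses DocBook's <emphasis> element (an assumption for illustration; it does not appear in the examples in this section). The literal <i> tags are the "example" of how the output should look, and <xsl:apply-templates/> is the blank that the XSLT processor fills in.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<!-- Illustrative rule: turn every <emphasis> element into an HTML <i> element -->
<xsl:template match="emphasis">
  <!-- Literal result element: copied to the output as written -->
  <i>
    <!-- The "blank": replaced by the result of processing the element's children -->
    <xsl:apply-templates/>
  </i>
</xsl:template>

</xsl:stylesheet>
```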

Example 2-5 is a rudimentary transformation that converts a simple DocBook XML document into an HTML page.

Example 2-5. An XSLT transformation document

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:output method="html"/>

<!-- RULE FOR BOOK ELEMENT -->
<xsl:template match="book">
  <html>
    <head>
      <title><xsl:value-of select="title"/></title>
    </head>
    <body>
      <h1><xsl:value-of select="title"/></h1>
      <h3>Table of Contents</h3>
      <xsl:call-template name="toc"/>
      <xsl:apply-templates select="chapter"/>
    </body>
  </html>
</xsl:template>

<!-- RULE FOR CHAPTER -->
<xsl:template match="chapter">
  <xsl:apply-templates/>
</xsl:template>

<!-- RULE FOR CHAPTER TITLE -->
<xsl:template match="chapter/title">
  <h2>
    <xsl:text>Chapter </xsl:text>
    <xsl:number count="chapter" level="any" format="1"/>
  </h2>
  <xsl:apply-templates/>
</xsl:template>
  
<!-- RULE FOR PARA -->
<xsl:template match="para">
  <p><xsl:apply-templates/></p>
</xsl:template>

<!-- NAMED RULE: TOC -->
<xsl:template name="toc">
  <xsl:if test="count(chapter)>0">
    <xsl:for-each select="chapter">
      <xsl:text>Chapter </xsl:text>
      <xsl:value-of select="position( )"/>
      <xsl:text>: </xsl:text>
      <i><xsl:value-of select="title"/></i>
      <br/>
    </xsl:for-each>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>

First, the XSLT processor reads the stylesheet and creates a table of template rules. Next, it parses the source XML document (the one to be converted) and traverses it one node at a time. A node is an element, a piece of text, a comment, a processing instruction, an attribute, or a namespace declaration; the document itself is represented by a root node. For each node, the XSLT processor tries to find the best matching rule. It applies the rule, outputting everything the template says it should, jumping to other rules as necessary.
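When no rule in the stylesheet matches a node, XSLT 1.0 falls back on built-in template rules. Written out as ordinary rules, the two that matter most for Example 2-5 look roughly like the sketch below. They suggest why the processor reaches <book> even though the stylesheet has no rule for the document root, and why the text inside a <para> ends up in the output.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<!-- Built-in behavior for the root node and for elements:
     output nothing, but keep going with the children -->
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>

<!-- Built-in behavior for text and attribute nodes:
     copy the node's text to the output -->
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>

</xsl:stylesheet>
```

You don't need to add these rules to a stylesheet yourself; the processor behaves as if they were already present, and any rule you write for the same node takes precedence over them.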

Example 2-6 is a sample document on which you can run the transformation.

Example 2-6. A document to transform

<book>
  <title>The Blathering Brains</title>
  <chapter>
    <title>At the Bazaar</title>
    <para>What a fantastic day it was. The crates were stacked
          high with imported goods: dates, bananas, dried meats,
          fine silks, and more things than I could imagine. As I
          walked around, savoring the fragrances of cinnamon and
          cardamom, I almost didn't notice a small booth with a
          little man selling brains.</para>
    <para>Brains! Yes, human brains, still quite moist and squishy,
          swimming in big glass jars full of some greenish
          fluid.</para>
    <para>"Would you like a brain, sir?" he asked. "Very reasonable
          prices. Here is Enrico Fermi's brain for only two
          dracmas. Or, perhaps, you would prefer Aristotle?  Or the
          great emperor Akhnaten?"</para>
    <para>I recoiled in horror...</para>
  </chapter>
</book>

Let's walk through the transformation.

The first element is <book>. The best matching rule is the first one, because it explicitly matches "book." The template says to output tags like <html>, <head>, and <title>. Note that these tags are treated as data markup because they don't have the xsl: namespace prefix.
When the processor gets to the XSLT instruction <xsl:value-of select="title"/>, it has to find a <title> element that is a child of the current element, <book>. Then it must obtain the value of that element, which is simply all the text contained within it. This text is output inside a <title> element as the template directs.
The processor continues in this way until it gets to the <xsl:call-template name="toc"/> instruction. If you look at the bottom of the stylesheet, you'll find a template rule that begins with <xsl:template name="toc">. This template rule is a named template and acts like a function call. It assembles a table of contents and returns the text to the calling rule for output.
Inside the named template is an element called <xsl:if test="count(chapter)>0">. This element is a conditional statement whose test is whether at least one <chapter> is inside the current element (still <book>). The test passes, and processing continues inside the element.
The <xsl:for-each select="chapter"> instruction causes the processor to visit each <chapter> child element and temporarily make it the current element while in the body of the <xsl:for-each> element. This step is analogous to a foreach( ) loop in Perl. The <xsl:value-of select="position( )"/> statement derives the numerical position of each <chapter> and outputs it so that the result document reads "Chapter 1," "Chapter 2," and so on.
The named template "toc" returns its text to the calling rule and execution continues. Next, the processor encounters an <xsl:apply-templates select="chapter"/> directive. An <xsl:apply-templates> instruction without any attributes means that the processor should process each of the current element's children in turn, making each one the current element. However, since a select="chapter" attribute is present, only children of type <chapter> are processed. After all descendants have been processed and this instruction returns its text, that text is output and the rest of the rule is followed to the end.
Moving on to the first <chapter> element, the processor locates a suitable rule and sees only an <xsl:apply-templates/> instruction. The rest of the processing is pretty easy, as the rules for the remaining elements, <title> and <para>, are straightforward.
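Putting the walkthrough together, the result of applying Example 2-5 to Example 2-6 should look roughly like the sketch below. It was worked out by hand rather than copied from a processor run: whitespace has been tidied, the paragraph text is shortened with "...", and the markup is written in well-formed style. A real processor using the html output method will typically write <br> instead of <br/> and may add a <meta> element inside <head>.

```xml
<!-- Approximate result; whitespace tidied and long text shortened by hand -->
<html>
  <head>
    <title>The Blathering Brains</title>
  </head>
  <body>
    <h1>The Blathering Brains</h1>
    <h3>Table of Contents</h3>
    <!-- produced by the named template "toc" -->
    Chapter 1: <i>At the Bazaar</i><br/>
    <!-- produced by the chapter/title rule -->
    <h2>Chapter 1</h2>
    At the Bazaar
    <!-- produced by the para rule, once per <para> -->
    <p>What a fantastic day it was. ...</p>
    <p>Brains! Yes, human brains, ...</p>
    <p>"Would you like a brain, sir?" he asked. ...</p>
    <p>I recoiled in horror...</p>
  </body>
</html>
```

Note that the chapter title text lands after the <h2> element rather than inside it, because the chapter/title rule closes <h2> before its <xsl:apply-templates/> instruction.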

XSLT is a rich language for handling transformations, but often leaves something to be desired. It can be slow on large documents, since a processor typically has to build an internal representation of the whole document before it can do any processing. Its syntax, while a remarkable achievement for XML, is not as expressive and easy to use as Perl. We will explore numerous Perl solutions to some problems that XSLT could also solve. You'll have to decide whether you prefer XSLT's simplicity or Perl's power.

That's our whirlwind tour of XML. Next, we'll jump into the fundamentals of XML processing with Perl using parsers and basic writers. At this point, you should have a good idea of what XML is used for and how it's used, and you should be able to recognize all the parts when you see them. If you still have any doubts, stop now and grab an XML tutorial.

Copyright © 2002 O'Reilly & Associates. All rights reserved.