2.12. Transformations
The last topic we want to introduce is the concept of transformations. In XML, a transformation is a process of restructuring or converting a document into another form. The W3C recommends a language for transforming XML called XSL Transformations (XSLT), part of the Extensible Stylesheet Language (XSL) family. It's an incredibly useful and fun technology to work with.
Like XML Schema, an XSLT transformation script is an XML document. It's composed of template rules, each of which is an instruction for how to turn one element type into something else. The term template is often used to mean an example of how something should look, with blanks that you should fill in. That's exactly how template rules work: they are examples of how the final document should be, with the blanks filled in by the XSLT processor.
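To make the "blanks" idea concrete, here is a minimal Python sketch (not XSLT) built on the standard library's `xml.etree.ElementTree` and `string.Template`. It assumes a rule can be modeled as a pair of a match name and a template string whose `$title` placeholder is the blank; the rule format and the `apply_rule` helper are hypothetical, and a real XSLT processor stores and matches rules quite differently.

```python
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical rule format: (name of element to match, template with blanks).
# $title is the blank, filled in much as <xsl:value-of select="title"/> would.
BOOK_RULE = ("book", Template("<html><head><title>$title</title></head></html>"))

def apply_rule(rule, element):
    """Fill in the rule's blanks from a matching element; return None if no match."""
    match, template = rule
    if element.tag != match:
        return None
    # findtext() gives the text of the first <title> child, or "" if there is none.
    return template.substitute(title=element.findtext("title", default=""))

doc = ET.fromstring("<book><title>The Blathering Brains</title></book>")
print(apply_rule(BOOK_RULE, doc))
# prints <html><head><title>The Blathering Brains</title></head></html>
```

The template text stays fixed; only the placeholder changes from one matched element to the next.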
Example 2-5 is a rudimentary transformation that converts a simple DocBook XML document into an HTML page.
Example 2-5. An XSLT transformation document
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="html"/>
  <!-- RULE FOR BOOK ELEMENT -->
  <xsl:template match="book">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <h3>Table of Contents</h3>
        <xsl:call-template name="toc"/>
        <xsl:apply-templates select="chapter"/>
      </body>
    </html>
  </xsl:template>
  <!-- RULE FOR CHAPTER -->
  <xsl:template match="chapter">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- RULE FOR CHAPTER TITLE: number and title go in the same heading -->
  <xsl:template match="chapter/title">
    <h2>
      <xsl:text>Chapter </xsl:text>
      <xsl:number count="chapter" level="any" format="1"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>
  <!-- RULE FOR PARA -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
  <!-- NAMED RULE: TOC -->
  <xsl:template name="toc">
    <xsl:if test="count(chapter)>0">
      <xsl:for-each select="chapter">
        <xsl:text>Chapter </xsl:text>
        <xsl:value-of select="position()"/>
        <xsl:text>: </xsl:text>
        <i><xsl:value-of select="title"/></i>
        <br/>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
First, the XSLT processor reads the stylesheet and creates a table of template rules. Next, it parses the source XML document (the one to be converted) and traverses it one node at a time. A node is an element, a piece of text, a processing instruction, a comment, an attribute, or a namespace declaration. For each node, the XSLT processor tries to find the best matching rule. It applies the rule, outputting everything the template says it should, jumping to other rules as necessary.
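The rule table and best-match lookup can be imitated in a few lines of Python. This is a simplified sketch under stated assumptions: `MiniProcessor` is a hypothetical name, patterns are limited to plain `name/name` paths, and "best" is approximated by preferring the pattern with more steps (real XSLT conflict resolution uses import precedence and priorities). Text is copied through, and unmatched elements just have their children processed, loosely mirroring XSLT's built-in rules.

```python
import xml.etree.ElementTree as ET

class MiniProcessor:
    """Hypothetical toy processor: a table of rules keyed by simple path patterns."""

    def __init__(self, rules):
        # The "table of template rules": pattern string -> rule function.
        self.rules = rules

    def best_rule(self, path):
        """Return the matching pattern with the most steps, or None if none match."""
        candidates = []
        for pattern in self.rules:
            steps = pattern.split("/")
            if path[-len(steps):] == steps:  # pattern matches the end of the path
                candidates.append(pattern)
        return max(candidates, key=lambda p: p.count("/"), default=None)

    def apply_templates(self, elem, path):
        """Process the element's text and children in document order."""
        out = [elem.text or ""]  # text is copied through, like the built-in text rule
        for child in elem:
            out.append(self.process(child, path + [child.tag]))
            out.append(child.tail or "")
        return "".join(out)

    def process(self, elem, path):
        """Find the best rule for this node and run it, or fall back to a built-in."""
        pattern = self.best_rule(path)
        if pattern is None:
            # No rule matched: just process the children.
            return self.apply_templates(elem, path)
        return self.rules[pattern](self, elem, path)

rules = {
    "title": lambda proc, e, path: "[" + proc.apply_templates(e, path) + "]",
    "chapter/title": lambda proc, e, path: "<h2>" + proc.apply_templates(e, path) + "</h2>",
    "para": lambda proc, e, path: "<p>" + proc.apply_templates(e, path) + "</p>",
}

doc = ET.fromstring(
    "<book><title>T</title><chapter><title>C</title><para>Hi</para></chapter></book>")
print(MiniProcessor(rules).process(doc, [doc.tag]))
# prints [T]<h2>C</h2><p>Hi</p>
```

Both `title` and `chapter/title` match the chapter's title; the longer pattern wins, so only the book's own title gets the bracketed `title` rule.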
Example 2-6 is a sample document on which you can
run the transformation.
Example 2-6. A document to transform
<book>
  <title>The Blathering Brains</title>
  <chapter>
    <title>At the Bazaar</title>
    <para>What a fantastic day it was. The crates were stacked
      high with imported goods: dates, bananas, dried meats,
      fine silks, and more things than I could imagine. As I
      walked around, savoring the fragrances of cinnamon and
      cardamom, I almost didn't notice a small booth with a
      little man selling brains.</para>
    <para>Brains! Yes, human brains, still quite moist and squishy,
      swimming in big glass jars full of some greenish
      fluid.</para>
    <para>"Would you like a brain, sir?" he asked. "Very reasonable
      prices. Here is Enrico Fermi's brain for only two
      drachmas. Or, perhaps, you would prefer Aristotle? Or the
      great emperor Akhnaten?"</para>
    <para>I recoiled in horror...</para>
  </chapter>
</book>
Let's walk through the transformation.
- The first element is <book>. The best matching rule is the first one, because it explicitly matches "book." The template says to output tags like <html>, <head>, and <title>. Note that these tags are copied to the output as data markup, not executed as instructions, because they aren't in the XSLT namespace (they lack the xsl: prefix).
- When the processor gets to the XSLT instruction <xsl:value-of select="title"/>, it has to find a <title> element that is a child of the current element, <book>. Then it must obtain the value of that element, which is simply all the text contained within it. This text is output inside a <title> element as the template directs.
- The processor continues in this way until it gets to the <xsl:call-template name="toc"/> instruction. If you look at the bottom of the stylesheet, you'll find a template rule that begins with <xsl:template name="toc">. This is a named template, and it acts like a function: it assembles a table of contents and returns the text to the calling rule for output.
- Inside the named template is an element called <xsl:if test="count(chapter)>0">. This element is a conditional statement whose test is whether at least one <chapter> is a child of the current element (still <book>, since calling a named template doesn't change the current node). The test passes, and processing continues inside the element.
- The <xsl:for-each select="chapter"> instruction causes the processor to visit each <chapter> child element and temporarily make it the current element while in the body of the <xsl:for-each> element. This step is analogous to a foreach loop in Perl. The <xsl:value-of select="position()"/> instruction outputs the position of each <chapter> within the set being looped over, so that the result document reads "Chapter 1," "Chapter 2," and so on.
- The named template "toc" returns its text to the calling rule, and execution continues. Next, the processor receives an <xsl:apply-templates select="chapter"/> directive. An <xsl:apply-templates> element without any attributes means that the processor should process each of the current element's children in turn, making each one the current element while its matching rule runs. However, since a select="chapter" attribute is present, only <chapter> children are processed. After they (and, through their own rules, their descendants) have been processed, the resulting text is output and the rest of the book rule is followed until the end.
- Moving on to the first <chapter> element, the processor locates a suitable rule and finds only an <xsl:apply-templates/> instruction. The rest of the processing is pretty easy, as the rules for the remaining elements, <title> and <para>, are straightforward.
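The whole walk-through can be imitated in plain Python with `xml.etree.ElementTree`, one function per template rule. This is a hedged sketch rather than an XSLT implementation: the function names are hypothetical, whitespace-only text between elements is ignored, chapter numbers come from the chapter's position under <book> (which matches `xsl:number level="any"` only because chapters are not nested elsewhere), and the chapter heading holds both the number and the title.

```python
import xml.etree.ElementTree as ET
from html import escape

def string_value(elem):
    """All text inside an element (its string value), escaped for HTML output."""
    return escape("".join(elem.itertext()), quote=False)

def value_of(context, child_name):
    """Like <xsl:value-of select="child_name"/>: first such child's text, or ''."""
    found = context.find(child_name)
    return string_value(found) if found is not None else ""

def toc(book):
    """Named template "toc": called like a function; the context is still <book>."""
    out = []
    chapters = book.findall("chapter")
    if len(chapters) > 0:                                  # count(chapter)>0
        for position, chapter in enumerate(chapters, 1):   # for-each + position()
            out.append(f"Chapter {position}: <i>{value_of(chapter, 'title')}</i><br/>")
    return "".join(out)

def chapter_rule(chapter, number):
    """Rules for chapter, chapter/title, and para (whitespace-only text skipped)."""
    out = []
    for child in chapter:
        if child.tag == "title":
            out.append(f"<h2>Chapter {number}: {string_value(child)}</h2>")
        elif child.tag == "para":
            out.append(f"<p>{string_value(child)}</p>")
    return "".join(out)

def book_rule(book):
    """Rule for <book>: page skeleton, TOC, then apply-templates select="chapter"."""
    title = value_of(book, "title")
    body = "".join(chapter_rule(ch, n)
                   for n, ch in enumerate(book.findall("chapter"), 1))
    return (f"<html><head><title>{title}</title></head><body>"
            f"<h1>{title}</h1><h3>Table of Contents</h3>{toc(book)}{body}"
            "</body></html>")

source = """<book>
  <title>The Blathering Brains</title>
  <chapter>
    <title>At the Bazaar</title>
    <para>What a fantastic day it was.</para>
    <para>I recoiled in horror...</para>
  </chapter>
</book>"""
print(book_rule(ET.fromstring(source)))
```

Keeping `toc` as a separate function mirrors the stylesheet's design: the named template can be reused from any rule whose current node is a <book>.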
XSLT is a rich language for handling transformations, but it often leaves something to be desired. It can be slow on large documents, since it has to build an internal representation of the whole document before it can do any processing. Its syntax, while a remarkable achievement for XML, is not as expressive or as easy to use as Perl. We will explore numerous Perl solutions to some problems that XSLT could also solve. You'll have to decide whether you prefer XSLT's simplicity or Perl's power.
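For contrast with building the whole tree first, an event-driven parser can act on each piece of a document as soon as it has been read. The sketch below uses Python's `xml.etree.ElementTree.iterparse` purely to illustrate that idea; it is not one of the Perl solutions this book goes on to present, and `chapter_titles` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

def chapter_titles(stream):
    """Yield each chapter title as soon as its </title> end tag has been parsed."""
    in_chapter = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if elem.tag == "chapter":
            in_chapter += 1 if event == "start" else -1
            if event == "end":
                elem.clear()  # drop the finished chapter's contents to save memory
        elif event == "end" and elem.tag == "title" and in_chapter:
            yield "".join(elem.itertext())

doc_bytes = (b"<book><title>B</title>"
             b"<chapter><title>One</title></chapter>"
             b"<chapter><title>Two</title></chapter></book>")
print(list(chapter_titles(io.BytesIO(doc_bytes))))
```

The first title is available before the parser has even seen the second chapter, which is the property a whole-document transformation gives up.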
That's our whirlwind tour of XML. Next, we'll jump into the fundamentals of XML processing with Perl using parsers and basic writers. At this point, you should have a good idea of what XML is used for and how it's used, and you should be able to recognize all the parts when you see them. If you still have any doubts, stop now and grab an XML tutorial.
Copyright © 2002 O'Reilly & Associates. All rights reserved.