Perl & XML

8.3. XSLT

If you think of XPath as a regular expression syntax, then XSLT is its pattern substitution mechanism. XSLT is an XML-based programming language for describing how to transform one document type into another. You can do some amazing things with XSLT, such as describe how to turn any XML document into HTML or tabulate the sum of figures in an XML-formatted table. In fact, you might not need to write a line of code in Perl or any language. All you really need is an XSLT script and one of the dozens of transformation engines available for processing XSLT.

The Origin of XSLT

XSLT stands for Extensible Stylesheet Language Transformations. The name means that it's a component of the Extensible Stylesheet Language (XSL), assigned to handle the task of converting input XML into a special format called XSL-FO (the FO stands for "Formatting Objects"). XSL-FO contains both content and instructions for how to make it pretty when displayed.

Although it's stuck with the XSL name, XSLT is more than just a step in formatting; it's an important XML processing tool that makes it easy to convert from one kind of XML to another, or from XML to text. For this reason, the W3C (yup, they created XSLT too) released the recommendation for it years before the rest of XSL was ready.

To read the specification and find links to XSLT tutorials, look at its home page at http://www.w3.org/TR/xslt.

An XSLT transformation script is itself an XML document. It consists mostly of rules called templates, each of which tells how to treat a specific type of node. A template usually does two things: it describes what to output and defines how processing should continue.

Consider the script in Example 8-9.

Example 8-9. An XSLT stylesheet


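  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  version="1.0">

  <xsl:template match="html">
    <xsl:text>Title: </xsl:text>
    <xsl:value-of select="head/title"/>
    <xsl:apply-templates select="body"/>
  </xsl:template>

  <xsl:template match="body">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="h1 | h2 | h3 | h4">
    <xsl:text>Head: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="p | blockquote | li">
    <xsl:text>Content: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>

  </xsl:stylesheet>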

This transformation script converts an HTML document into ASCII with some extra text labels. Each <xsl:template> element is a rule that matches a part of an XML document. Its content consists of instructions to the XSLT processor describing what to output. Directives like <xsl:apply-templates> direct processing to other elements (usually descendants). We won't go into detail about XSLT syntax, as whole books on the subject are available. Our intent here is to show how you can combine XSLT with Perl to do powerful XML munching.

You might wonder, "Why do I need to use another language to transform XML when I can do that with the Perl I already know?" True, XSLT doesn't do anything you couldn't do in Perlish coding. Its value lies in how easy the language is to learn. You can learn XSLT in a few hours, but learning to do the same things in Perl would take much longer. In our experience writing software for XML, we found it convenient to use XSLT as a configuration file that nonprogrammers could maintain themselves. Thus, instead of viewing XSLT as competition for Perl, think of it as a complementary technology that you can access through Perl when you need to.

How do Perl hackers employ the power of XSLT in their programs? Example 8-10 shows how to perform an XSLT transformation on a document using XML::LibXSLT, Matt Sergeant's interface to the super-fast GNOME library called LibXSLT, one of several XSLT solutions available from your CPAN toolbox.[29]

[29]Others that are currently available include the pure-Perl XML::XSLT module, and XML::Sablotron, based on the Expat and Sablotron C libraries (the latter of which is an XSLT library by the Ginger Alliance: http://www.gingerall.com).

Example 8-10. A program to run an XSLT transformation

use XML::LibXSLT;
use XML::LibXML;

# the arguments for this command are stylesheet and source files
my( $style_file, @source_files ) = @ARGV;

# initialize the parser and XSLT processor
my $parser = XML::LibXML->new( );
my $xslt = XML::LibXSLT->new( );
my $stylesheet = $xslt->parse_stylesheet_file( $style_file );

# for each source file: parse, transform, print out result
foreach my $source_file ( @source_files ) {
  my $source_doc = $parser->parse_file( $source_file );
  my $result = $stylesheet->transform( $source_doc );
  print $stylesheet->output_string( $result );
}

The nice thing about this program is that it parses the stylesheet only once, keeping it in memory for reuse with other source documents. Afterwards, you have the document tree to do further work, if necessary:

  • Postprocess or preprocess the text of the document with search-replace routines.

  • Pluck a piece of the document out to transform just that bit.

  • Run an iterator over the tree to handle some nodes that would be too difficult to process in XSLT.
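As a rough sketch of the first two ideas in the list, the program below plucks a single element out of a parsed document, transforms only that piece, and then tidies the resulting text with an ordinary Perl substitution. It assumes XML::LibXML and XML::LibXSLT are installed. The input document, the one-template stylesheet (including its `<xsl:output method="text"/>` declaration), and all variable names are hypothetical, invented for illustration rather than taken from the examples above.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical input document, inlined so the sketch is self-contained.
my $source_xml = <<'XML';
<html>
  <head><title>Demo</title></head>
  <body>
    <h1>First heading</h1>
    <p>Keep the   spacing tidy.</p>
  </body>
</html>
XML

# Hypothetical stylesheet with one rule; method="text" asks for plain text.
my $style_xml = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="p">
    <xsl:text>Content: </xsl:text>
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $parser     = XML::LibXML->new;
my $xslt       = XML::LibXSLT->new;
my $source_doc = $parser->parse_string( $source_xml );
my $stylesheet = $xslt->parse_stylesheet( $parser->parse_string( $style_xml ) );

# Pluck one piece out of the tree: the first <p> element.
my ($para) = $source_doc->findnodes( '//p' );

# Copy it into a document of its own so it can be transformed alone.
my $fragment_doc = XML::LibXML::Document->new( '1.0', 'UTF-8' );
$fragment_doc->setDocumentElement( $fragment_doc->importNode( $para ) );

# Transform just that fragment and serialize the result to a string.
my $result = $stylesheet->transform( $fragment_doc );
my $text   = $stylesheet->output_string( $result );

# Postprocess with search-replace: collapse whitespace runs, trim the ends.
$text =~ s/\s+/ /g;
$text =~ s/^ | $//g;

print "$text\n";
```

Copying the element into a fresh document keeps the original tree untouched, so the same source document can still be transformed whole or mined for other fragments afterwards.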

The possibilities are endless and, as always in Perl, whatever you want to do, there's more than one way to do it.

Copyright © 2002 O'Reilly & Associates. All rights reserved.