2.4. XPath Basics
XPath is another recommendation from the W3C and is designed for use by XSLT and another technology called XPointer. The primary goal of XPath is to define a mechanism for addressing portions of an XML document, which means it is used for locating element nodes, attribute nodes, text nodes, and anything else that can occur in an XML document. XPath treats these nodes as part of a tree structure rather than dealing with XML as a text string. XSLT also relies on the tree structure that XPath defines. In addition to addressing, XPath contains a set of functions to format text, convert to and from numbers, and deal with booleans.
Unlike XSLT, XPath itself is not expressed using XML syntax. A compact, non-XML syntax makes sense when you consider that XPath is most commonly used inside attribute values within other XML documents. XPath includes both a verbose syntax and a set of abbreviations, which end up looking a lot like path names on a file system or web site.
2.4.1. How XSLT Uses XPath
Whenever XSLT uses XPath, something in the XML data is considered to be the current context node. XPath defines seven different types of nodes, each representing a different part of the XML data. These are the document root, elements, text, attributes, processing instructions, comments, and nodes representing namespaces. An axis represents a relationship to the current context node, which may be any one of the preceding seven items.
A few examples should clear things up. One axis is child, representing all immediate children of the context node. From our earlier schedule.xml example, the child axis of <name> includes the <first> and <last> elements. Another axis is parent, which represents the immediate parent of the context node. In many cases an axis is empty. For example, the parent axis of the document root node is empty because the root has no parent. Figure 2-4 illustrates some of the other axes.
Figure 2-4. XPath axes
As you can see, the second <department> element is the context node. The diagram illustrates how some of the more common axes relate to this node. Although the names are singular, in most cases the axes represent node sets rather than individual nodes. The code:
<xsl:apply-templates select="child::team"/>
selects all <team> children, not just the first one. Table 2-1 lists the available axes in alphabetical order, along with a brief description of each.
Table 2-1. Axes summary
2.4.3. Location Steps
As you may have guessed, an axis alone is only a piece of the puzzle. A location step is a more complex construct used by XPath and XSLT to select a node set from the XML data. Location steps have the following syntax:
axis::node-test[predicate]
The axis and node-test are separated by double colons and are followed by zero or more predicates. As mentioned, the job of the axis is to specify the relationship between the context node and the node-test. The node-test allows you to specify the type of node that will be selected, and the predicates filter the resulting node set.
Once again, discussion of XSLT and XPath tends to sound overly technical until you see a few basic examples. Let's start with a basic fragment of XML:
<message>
  <header>
    <!-- the context node -->
    <subject>Hello, World</subject>
    <date mm="03" dd="01" yy="2002"/>
    <sender>firstname.lastname@example.org</sender>
    <recipient>email@example.com</recipient>
    <recipient>firstname.lastname@example.org</recipient>
    <recipient>email@example.com</recipient>
  </header>
  <body>
    ...
  </body>
</message>
If the <header> is the context node, then child::subject will select the <subject> node, child::recipient will select the set of all <recipient> nodes, and child::* will select all children of <header>. The asterisk (*) character is a wildcard that represents all nodes of the principal node type. Each axis has a principal node type, which is always element unless the axis is attribute or namespace. If <date> is the context node, then attribute::yy will select the yy attribute, and attribute::* will select all attributes of the <date> element.
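To see these location steps return actual nodes, the following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. It is an illustration, not part of the original example. ElementTree implements only a subset of the abbreviated XPath syntax, so child::subject is written simply as subject, and attributes are read through the element's attrib dictionary rather than through the attribute axis. The variable names are assumptions made for this sketch, and the <body> element is left empty for brevity.

```python
import xml.etree.ElementTree as ET

# The message fragment from the text, with <body> left empty for brevity.
MESSAGE_XML = """
<message>
  <header>
    <subject>Hello, World</subject>
    <date mm="03" dd="01" yy="2002"/>
    <sender>firstname.lastname@example.org</sender>
    <recipient>email@example.com</recipient>
    <recipient>firstname.lastname@example.org</recipient>
    <recipient>email@example.com</recipient>
  </header>
  <body/>
</message>
"""

message = ET.fromstring(MESSAGE_XML)
header = message.find("header")            # <header> acts as the context node

subjects = header.findall("subject")       # corresponds to child::subject
recipients = header.findall("recipient")   # corresponds to child::recipient
children = header.findall("*")             # corresponds to child::*

date = header.find("date")                 # now <date> is the context node
year = date.attrib["yy"]                   # corresponds to attribute::yy
all_attributes = dict(date.attrib)         # corresponds to attribute::*

print(len(subjects), len(recipients), len(children))   # 1 3 6
print(year, sorted(all_attributes))                     # 2002 ['dd', 'mm', 'yy']
```

Note that the wildcard returns only the six element children of <header>. The attributes of <date> are reachable only when <date> itself is the context node, which mirrors the idea that each axis has its own principal node type.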
Without any predicates, a location step can result in zero or more nodes. Adding a predicate filters that node set, generally reducing its size. Adding additional predicates applies additional filters. For example, child::recipient[position( )=1] will initially select all <recipient> elements from the previous example and then filter (reduce) the list down to the first one: email@example.com. Positions start at 1, rather than 0. As Example 2-8 will show, predicates can contain any XPath expression and can become quite sophisticated.
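The filtering effect of a positional predicate can be sketched in Python with xml.etree.ElementTree. This is an illustration under stated assumptions: ElementTree's XPath subset accepts only the numeric shorthand, so recipient[1] stands in for child::recipient[position( )=1] (in XPath the two forms are equivalent). The trimmed-down <header> fragment and the variable names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Only the <recipient> children of <header> are needed for this sketch.
HEADER_XML = """
<header>
  <recipient>email@example.com</recipient>
  <recipient>firstname.lastname@example.org</recipient>
  <recipient>email@example.com</recipient>
</header>
"""

header = ET.fromstring(HEADER_XML)         # <header> is the context node

all_recipients = header.findall("recipient")     # no predicate: three nodes
first = header.findall("recipient[1]")           # positions start at 1, not 0
second = header.findall("recipient[2]")
last = header.findall("recipient[last()]")
missing = header.findall("recipient[4]")         # a step may select nothing

print(len(all_recipients), len(first), len(missing))   # 3 1 0
print(second[0].text)                                   # firstname.lastname@example.org
```

Each predicate narrows the same initial node set, and a predicate that matches nothing yields an empty result rather than an error.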
2.4.4. Location Paths
Location paths consist of one or more location steps, separated by slash (/) characters. An absolute location path begins with the slash (/) character and is relative to the document root. All other types of location paths are relative to the context node. Paths are evaluated from left to right, just like a path in a file system or a web site. The XML shown in Example 2-7 is a portion of a larger file containing basic information about U.S. presidents. This is used to demonstrate a few more XSLT and XPath examples.
Example 2-7. presidents.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="xpathExamples.xslt"?>
<presidents>
  <president>
    <term from="1789" to="1797"/>
    <name>
      <first>George</first>
      <last>Washington</last>
    </name>
    <party>Federalist</party>
    <vicePresident>
      <name>
        <first>John</first>
        <last>Adams</last>
      </name>
    </vicePresident>
  </president>
  <president>
    <term from="1797" to="1801"/>
    <name>
      <first>John</first>
      <last>Adams</last>
    </name>
    <party>Federalist</party>
    <vicePresident>
      <name>
        <first>Thomas</first>
        <last>Jefferson</last>
      </name>
    </vicePresident>
  </president>
  <!-- remaining presidents omitted -->
</presidents>
The complete file is too long to list here but is included with the downloadable files for this book. The <vicePresident> element can occur many times or not at all because some presidents did not have vice presidents. Names can also contain optional <middle> elements. Using this XML data, the XSLT stylesheet in Example 2-8 shows several location paths.
Example 2-8. Location paths
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <h1>XPath Examples</h1>
        The third president was:
        <ul>
          <xsl:apply-templates
              select="presidents/president[position( ) = 3]/name"/>
        </ul>
        Presidents without vice presidents were:
        <ul>
          <xsl:apply-templates
              select="presidents/president[count(vicePresident) = 0]/name"/>
        </ul>
        Presidents elected before 1800 were:
        <ul>
          <xsl:apply-templates
              select="presidents/president[term/@from &lt; 1800]/name"/>
        </ul>
        Presidents with more than one vice president were:
        <ul>
          <xsl:apply-templates
              select="descendant::president[count(vicePresident) &gt; 1]/name"/>
        </ul>
        Presidents named John were:
        <ul>
          <xsl:apply-templates
              select="presidents/president/name[child::first='John']"/>
        </ul>
        Presidents elected between 1800 and 1850 were:
        <ul>
          <xsl:apply-templates
              select="presidents/president[(term/@from &gt; 1800) and (term/@from &lt; 1850)]/name"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="name">
    <li>
      <xsl:value-of select="first"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="middle"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="last"/>
    </li>
  </xsl:template>
</xsl:stylesheet>
In the first <xsl:apply-templates> element, the location path is as follows:
presidents/president[position( ) = 3]/name
This path consists of three location steps separated by slash (/) characters, but the final step is what we want to select. This path is read from left to right, so it first selects the <presidents> children of the current context. The next step is relative to the <presidents> context and selects all <president> children. It then filters the list according to the predicate. The third <president> element is now the context, and its <name> children are selected. Since each president has only one <name>, the template that matches "name" is instantiated only once.
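The left-to-right evaluation described above can be traced one step at a time. The following Python sketch uses xml.etree.ElementTree and is an illustration only, with several assumptions: it contains just the two <president> elements listed in Example 2-7, so it selects position 2 instead of position 3; ElementTree starts at the <presidents> element rather than at the document root, so the leading presidents step is implicit; and the numeric shorthand [2] stands in for [position( ) = 2]. The helper full_name and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The two <president> elements shown in Example 2-7.
PRESIDENTS_XML = """
<presidents>
  <president>
    <term from="1789" to="1797"/>
    <name><first>George</first><last>Washington</last></name>
    <party>Federalist</party>
    <vicePresident>
      <name><first>John</first><last>Adams</last></name>
    </vicePresident>
  </president>
  <president>
    <term from="1797" to="1801"/>
    <name><first>John</first><last>Adams</last></name>
    <party>Federalist</party>
    <vicePresident>
      <name><first>Thomas</first><last>Jefferson</last></name>
    </vicePresident>
  </president>
</presidents>
"""

root = ET.fromstring(PRESIDENTS_XML)       # the <presidents> element


def full_name(name):
    """Join the <first> and <last> children of a <name> element."""
    return f"{name.findtext('first')} {name.findtext('last')}"


# The whole location path in one call.
names = root.findall("president[2]/name")

# The same selection, one location step at a time.
step1 = root.findall("president")          # all <president> children
step2 = [step1[1]]                         # predicate keeps position 2 (1-based)
step3 = [n for p in step2 for n in p.findall("name")]   # their <name> children

print([full_name(n) for n in names])       # ['John Adams']
print(names == step3)                      # True
```

Only the <name> that is an immediate child of the selected <president> is returned. The <name> nested inside <vicePresident> is not a child of <president>, so the final step does not reach it.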
This location path shows how to perform basic numeric comparisons:
presidents/president[term/@from &lt; 1800]/name
Since the less-than (<) character cannot appear literally in an XML attribute value, the &lt; entity must be substituted. The XML parser converts the entity back to < before the XSLT processor evaluates the expression. In this particular example, we use the @ abbreviated syntax to represent the attribute axis.
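The escaping rule is enforced by the XML parser itself, not by XPath, and can be checked with any parser. The following Python sketch uses xml.etree.ElementTree as an illustration. The <step> element is a hypothetical stand-in for <xsl:apply-templates>, used here only to avoid declaring the xsl namespace.

```python
import xml.etree.ElementTree as ET

# <step> is a hypothetical stand-in for <xsl:apply-templates>.
escaped = '<step select="presidents/president[term/@from &lt; 1800]/name"/>'
raw = '<step select="presidents/president[term/@from < 1800]/name"/>'

# The parser turns &lt; back into < before any XPath engine sees the value.
select = ET.fromstring(escaped).get("select")
print(select)   # presidents/president[term/@from < 1800]/name

# A literal < inside an attribute value is not well-formed XML.
try:
    ET.fromstring(raw)
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False
print(raw_is_well_formed)   # False
```

The greater-than (>) character is permitted in attribute values, although many stylesheets escape it as well for symmetry.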
2.4.5. Abbreviated Syntax
Using descendant::, child::, parent::, and other axes is very verbose, requiring a lot of typing. Fortunately, XPath supports an abbreviated syntax for many of these axes that requires a lot less effort. The abbreviated syntax has the added advantage that it looks like file system navigation, so it tends to be somewhat more intuitive. Table 2-2 compares the abbreviated syntax to the verbose syntax. The abbreviated syntax is almost always used and will be used throughout the remainder of this book.
Table 2-2. Abbreviated syntax
In the last row, the abbreviation for the child axis is blank, indicating that child:: is an implicit part of a location step. This means that vicePresident/name is equivalent to child::vicePresident/child::name. Additional explanations follow:
Copyright © 2002 O'Reilly & Associates. All rights reserved.