3.2. Location Paths
One of the most common uses of XPath is to create location paths. A location path describes the location of something in an XML document. In our examples in the previous chapter, we used location paths on the match and select attributes of various XSLT elements. Those location paths described the parts of the XML document we wanted to work with. Most of the XPath expressions you'll use are location paths, and most of them are pretty simple. Before we dive into the wonders of XPath, we need to discuss the context.
3.2.1. The Context
One of the most important concepts in XPath is the context. Everything we do in XPath is interpreted with respect to the context. You can think of an XML document as a hierarchy of directories in a filesystem. In our sonnet example, we could imagine that sonnet is a directory at the root level of the filesystem. The sonnet directory would, in turn, contain directories named auth:author, title, and lines. In this example, the context would be the current directory. If I go to a command line and execute a particular command (such as dir *.js), the results I get vary depending on the current directory. Similarly, the results of evaluating an XPath expression will probably vary based on the context.
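Most of the time, we can think of the context as the node in the tree from which any expression is evaluated. To be completely accurate, the context consists of five things: the context node itself; a pair of integers, the context position and the context size; a set of variable bindings; a function library; and the set of namespace declarations in scope for the expression.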
Having said all that, most of the time you can ignore everything but the context node. To use our command line analogy one more time, if you're at a command line, you have a current directory; you also have (depending on your operating system) a number of environment variables defined. For most commands, you can focus on the current directory and ignore the environment variables.
3.2.2. Simple Location Paths
Now that we've talked about what a context is and why it matters, we'll look at some location paths. We'll start with a variety of simple location paths; as we go along, we'll look at more complex location paths that use all the various features of XPath. We already looked at one of the simplest XPath expressions:
<xsl:template match="/"> ... </xsl:template>
This template matches the root node of the document. We saw another simple XPath expression in the <xsl:value-of> element:
<xsl:value-of select="."/>
This element selects the context node, represented by a period. To complete our tour of very simple location paths, we can use the double period (..) to select the parent of the context node:
<xsl:value-of select=".."/>
All these XPath expressions have one thing in common: they don't use element names. As you might have noticed in our Hello World example, you can use element names to select elements that have a particular name:
<xsl:apply-templates select="greeting"/>
In this example, we select all of the <greeting> elements in the current context and apply the appropriate template to each of them. Turning to our XML sonnet, we can create location paths that specify more than one level in the document hierarchy:
<xsl:apply-templates select="lines/line"/>
This example selects all <line> elements that are contained in any <lines> elements in the current context. If the current context doesn't have any <lines> elements, then this expression returns an empty node-set. If the current context has plenty of <lines> elements, but none of them contain any <line> elements, this expression also returns an empty node-set.
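As a minimal sketch of how such a relative path is evaluated, the stylesheet below assumes a document shaped like our sonnet, with a <sonnet> root element whose <lines> child holds the <line> elements. The text output method and the newline after each line are illustrative choices of ours, not something the sonnet example requires.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Inside this template, the context node is the <sonnet> element. -->
  <xsl:template match="/sonnet">
    <!-- Relative path: <line> children of <lines> children of the context node. -->
    <xsl:for-each select="lines/line">
      <!-- Inside the loop, each <line> becomes the context node in turn. -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the <sonnet> element has no <lines> children, or they contain no <line> elements, the loop body simply never runs.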
3.2.3. Relative and Absolute Expressions
The XPath specification talks about two kinds of XPath expressions, relative and absolute. Our previous example is a relative XPath expression because the nodes it specifies depend on the current context. An absolute XPath expression begins with a slash (/), which tells the XSLT processor to start at the root of the document, regardless of the current context. In other words, you can evaluate an absolute XPath expression from any context node you want, and the results will be the same. Here's an absolute XPath expression:
<xsl:apply-templates select="/sonnet/lines/line"/>
The good thing about an absolute expression is that you don't have to worry about the context node. Another benefit is that it makes it easy for the XSLT processor to find all nodes that match this expression: what we've said in this expression is that there must be a <sonnet> element at the root of the document, that element must contain at least one <lines> element, and that at least one of those <lines> elements must contain at least one <line> element. If any of those conditions fails, the XSLT processor can stop looking through the tree and return an empty node-set.
A possible disadvantage of using absolute XPath expressions is that it could make your templates more difficult to reuse. Both of these templates process <line> elements, but the second one is more difficult to reuse:
<xsl:template match="line">
  ...
</xsl:template>

<xsl:template match="/sonnet/lines/line">
  ...
</xsl:template>
If the second template has wonderful code for processing <line> elements, but your document contains <line> elements that don't match the absolute XPath expression, you can't reuse that template. You should keep that in mind as you design your templates.
3.2.4. Selecting Things Besides Elements with Location Paths
Up until now, we've discussed XPath expressions that used either element names (/sonnet/lines/line) or special characters (/ or ..) to select elements from an XML document. Obviously, XML documents contain things other than elements; we'll talk about how to select those other things here.
3.2.4.1. Selecting attributes
To select an attribute, use the at-sign (@) along with the attribute name. In our sample sonnet, you can select the type attribute of the <sonnet> element with the XPath expression /sonnet/@type. If the context node is the <sonnet> element itself, then the relative XPath expression @type does the same thing.
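The two forms can be compared side by side. This is only a sketch; it assumes the <sonnet> element is the document element and carries a type attribute, as in our sample document, and it uses text output purely for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Absolute path: gives the same result from any context node. -->
    <xsl:value-of select="/sonnet/@type"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Relative path: first make <sonnet> the context node. -->
    <xsl:for-each select="sonnet">
      <xsl:value-of select="@type"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Both <xsl:value-of> elements write the value of the same attribute node.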
3.2.4.2. Selecting the text of an element
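To select the text of an element, use the XPath node test text(). The XPath expression /sonnet/auth:author/last-name/text() selects the text of the last-name element in our example document. Be aware that text() selects only the text nodes that are direct children of an element, while the string value of an element is the concatenation of all the text nodes it contains at any depth. Thus, <xsl:value-of select="/sonnet/auth:author"/> outputs the text of every child of <auth:author> run together, separated only by whatever whitespace the source document happens to contain.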
That's probably not the output you want; if you want to provide spacing, line breaks, or other formatting, you need to use the text() node test against all the child nodes individually.
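Here is a minimal sketch of handling the child elements individually. It assumes that <auth:author> contains a <first-name> element alongside <last-name> (the first-name element is our assumption for illustration), and that the auth prefix maps to http://www.authors.com/.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:auth="http://www.authors.com/">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- first-name is an assumed element name, used here for illustration. -->
    <xsl:value-of select="/sonnet/auth:author/first-name/text()"/>
    <!-- We supply the spacing ourselves, between the two text nodes. -->
    <xsl:text> </xsl:text>
    <xsl:value-of select="/sonnet/auth:author/last-name/text()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because each piece of text is selected separately, the stylesheet controls exactly what appears between the pieces.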
3.2.4.3. Selecting comments, processing instructions, and namespace nodes
By this point, we've covered most of the things you're ever likely to do with an XPath expression. You can use a couple of other XPath node tests to describe parts of an XML document. The comment() and processing-instruction() node tests allow you to select comments and processing instructions from the XML document. Going back to our sample sonnet, the XPath expression /processing-instruction() returns the two processing instructions (named xml-stylesheet and cocoon-process). The expression /sonnet/comment() returns the comment node that begins, "Is there an official title for this sonnet?"
Processing comment nodes in this way can actually be useful. If you've entered comments into an XML document, you can use the comment() node test to display your comments only when you want. Here's an XSLT template you could use:
<xsl:template match="comment()">
  <span class="comment">
    <p><xsl:value-of select="."/></p>
  </span>
</xsl:template>
Elsewhere in your stylesheet, you could define CSS properties for the comment class to print comments in a large, bold, purple font. To remove all comments from your output document, simply go to your stylesheet and comment out any <xsl:apply-templates select="comment()"/> statements.
XPath has one other kind of node, the rarely used namespace node. To retrieve namespace nodes, you have to use something called the namespace axis; we'll discuss axes soon. One note about namespaces, if you ever have to work with them: when matching names that belong to a namespace, the namespace prefix isn't important; only the URL it maps to matters. As an example, our sample sonnet used the auth namespace prefix, which maps to the value http://www.authors.com/. If a stylesheet uses the namespace prefix writers to refer to the same URL, then the XPath expression /sonnet/writers:* would return the <auth:author> element. Even though the namespace prefixes are different, the URLs they refer to are the same.
Having said all that, the chances that you'll ever need to use namespace nodes are pretty slim.
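If you do need to match names across differing prefixes, a sketch along these lines may help. It assumes the source document binds the auth prefix to http://www.authors.com/, as our sonnet does; the writers prefix is declared only in the stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:writers="http://www.authors.com/">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- writers:* matches any element child of <sonnet> whose namespace
         URL is http://www.authors.com/, whatever prefix the source uses. -->
    <xsl:for-each select="/sonnet/writers:*">
      <!-- local-name() gives the element name without any prefix. -->
      <xsl:value-of select="local-name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note the single colon in writers:*; a double colon would make the processor look for an axis named writers, which does not exist.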
3.2.5. Using Wildcards
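XPath features three wildcards: the asterisk (*), which selects all element children of the context node; the at-sign and asterisk (@*), which selects all attributes of the context node; and the node() node test, which selects all children of the context node regardless of their type (elements, text, comments, and processing instructions).
In addition to these wildcards, XPath includes the double slash (//), which indicates that zero or more elements may occur between the slashes. For example, the XPath expression //line selects all <line> elements, regardless of where they appear in the document. This is an absolute XPath expression because it begins with a slash. You can also use the double slash at any point in an XPath expression; the expression /sonnet//line selects all <line> elements that are descendants of the <sonnet> element at the root of the XML document. The double slash is an abbreviation for /descendant-or-self::node()/, so /sonnet//line is equivalent to /sonnet/descendant-or-self::node()/line, which selects the same nodes as /sonnet/descendant::line.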
WARNING: The double slash (//) is a very powerful operator, but be aware that it can make your stylesheets incredibly inefficient. If we use the XPath expression //line, the XSLT processor has to check every node in the document to see if there are any <line> elements. The more specific you can be in your XPath expressions, the less work the XSLT processor has to do, and the faster your stylesheets will execute. Thinking back to our filesystem metaphor, if I go to a Windows command prompt and type dir /s c:\*.xml, the operating system has to look in every subdirectory for any *.xml files that might be there. However, if I type dir /s c:\doug\projects\xml-docs\*.xml, the operating system has far fewer places to look, and the command will execute much faster.
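To see the two styles side by side, here is a small sketch that counts <line> elements both ways. It assumes a sonnet-shaped document in which every <line> lives at /sonnet/lines/line; in such a document both counts come out the same, but the second expression tells the processor exactly where to look.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Searches the entire document tree for <line> elements. -->
    <xsl:text>Anywhere in the document: </xsl:text>
    <xsl:value-of select="count(//line)"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Visits only <sonnet>, its <lines> children, and their <line> children. -->
    <xsl:text>At a specific path: </xsl:text>
    <xsl:value-of select="count(/sonnet/lines/line)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

How much faster the specific path runs depends on the processor and the size of the document; some processors optimize // better than others.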
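3.2.6. Axes
To this point, we've been able to select child elements, attributes, text, comments, and processing instructions with some fairly simple XPath expressions. Obviously, we might want to select many other things, such as all the ancestors of the context node, all of its descendants, or all of its preceding or following siblings.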
To select these things, XPath provides a number of axes that let you specify various collections of nodes. There are thirteen axes in all; we'll discuss all of them here, even though most of them won't be particularly useful to you. To use an axis in an XPath expression, type the name of the axis, a double colon (::), and the name of the element you want to select, if any.
Before we define all of the axes, though, we need to talk about XPath's unabbreviated syntax.
3.2.6.1. Unabbreviated syntax
To this point, all the XPath expressions we've looked at used the XPath abbreviated syntax. Most of the time, that's what you'll use; however, most of the lesser-used axes can only be specified with the unabbreviated syntax. For example, when we wrote an XPath expression to select all of the <line> elements in the current context, we used the abbreviated syntax:
<xsl:apply-templates select="line"/>
If you really enjoy typing, you can use the unabbreviated syntax to specify that you want all of the <line> children of the current context:
<xsl:apply-templates select="child::line"/>
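The sketch below pairs a few abbreviated expressions with their unabbreviated equivalents so you can confirm that they select the same nodes. It assumes a sonnet-shaped document with a type attribute on <sonnet> and <line> elements inside <lines>; the output layout is ours.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Make each <lines> element the context node in turn. -->
    <xsl:for-each select="/sonnet/lines">
      <!-- line is short for child::line -->
      <xsl:value-of select="count(line)"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="count(child::line)"/>
      <xsl:text>&#10;</xsl:text>

      <!-- ../@type is short for parent::node()/attribute::type -->
      <xsl:value-of select="../@type"/>
      <xsl:text> = </xsl:text>
      <xsl:value-of select="parent::node()/attribute::type"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each output line should show the same value on both sides of the equals sign.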
We'll go through all of the axes now, pointing out which ones have an abbreviated syntax.
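3.2.6.2. Axis roll call
The following list contains all of the axes defined by the XPath standard, with a brief description of each one.
child axis: Contains the children of the context node. This is the default axis, so lines/line and child::lines/child::line are equivalent.
parent axis: Contains the parent of the context node, if there is one. The abbreviation .. is equivalent to parent::node().
self axis: Contains the context node itself. The abbreviation . is equivalent to self::node().
attribute axis: Contains the attributes of the context node; it is empty unless the context node is an element. The abbreviation @type is equivalent to attribute::type.
ancestor axis: Contains the parent of the context node, the parent's parent, and so on up to the root node. There is no abbreviated syntax.
ancestor-or-self axis: Contains the context node plus everything on the ancestor axis. There is no abbreviated syntax.
descendant axis: Contains the children of the context node, their children, and so on. Attribute and namespace nodes are never included. There is no abbreviated syntax.
descendant-or-self axis: Contains the context node plus everything on the descendant axis. The double slash (//) is an abbreviation for /descendant-or-self::node()/.
preceding-sibling axis: Contains the siblings of the context node that come before it in document order. There is no abbreviated syntax.
following-sibling axis: Contains the siblings of the context node that come after it in document order. There is no abbreviated syntax.
preceding axis: Contains all nodes that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes. There is no abbreviated syntax.
following axis: Contains all nodes that come after the context node in document order, excluding its descendants and any attribute or namespace nodes. There is no abbreviated syntax.
namespace axis: Contains the namespace nodes of the context node; it is empty unless the context node is an element. There is no abbreviated syntax.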
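Here is a brief sketch of a few of the less common axes in action, evaluated with a <lines> element as the context node. It assumes a sonnet-shaped document in which <lines> is a child of <sonnet> and has sibling elements such as the author and title; the labels in the output are ours.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/sonnet/lines">
      <!-- Every <line> element at any depth below the context node. -->
      <xsl:text>descendant lines: </xsl:text>
      <xsl:value-of select="count(descendant::line)"/>
      <xsl:text>&#10;</xsl:text>

      <!-- Sibling elements before and after the context node. -->
      <xsl:text>elements before: </xsl:text>
      <xsl:value-of select="count(preceding-sibling::*)"/>
      <xsl:text>&#10;elements after: </xsl:text>
      <xsl:value-of select="count(following-sibling::*)"/>
      <xsl:text>&#10;</xsl:text>

      <!-- Names of all ancestor elements, in document order. -->
      <xsl:text>ancestors:</xsl:text>
      <xsl:for-each select="ancestor::*">
        <xsl:text> </xsl:text>
        <xsl:value-of select="name()"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The root node is on the ancestor axis too, but because it isn't an element, ancestor::* doesn't select it.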
3.2.7. Predicates
There's one more aspect of XPath expressions that we haven't discussed: predicates. Predicates are filters that restrict the nodes selected by an XPath expression. Each predicate is evaluated and converted to a Boolean value (either true or false). If the predicate is true for a given node, that node will be selected; otherwise, the node is not selected. Predicates always appear inside square brackets ([]). Here's an example:
<xsl:apply-templates select="line[3]"/>
This expression selects the third <line> element in the current context. If there are two or fewer <line> elements in the current context, this XPath expression returns an empty node-set. Several things can be part of a predicate; we'll go through them here.
3.2.7.1. Numbers in predicates
A number inside square brackets selects nodes that have a particular position. For example, the XPath expression line[7] selects the seventh <line> element in the context node. XPath also provides the Boolean and and or operators to combine conditions within a predicate, as well as the union operator (|) to combine node-sets. The expression line[position()=3 and @style] matches all <line> elements that occur third and have a style attribute, while line[position()=3 or @style] matches all <line> elements that either occur third or have a style attribute. The union operator works on node-sets, not on numbers, so to match the third and seventh <line> elements in the current context, write line[3] | line[7] or, equivalently, line[position()=3 or position()=7].
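The following sketch exercises these predicates against a sonnet-shaped document. It assumes that the <lines> element holds at least seven <line> children and that some of them may carry a style attribute; the output labels are ours.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="/sonnet/lines">
      <!-- A bare number is shorthand for a position test:
           line[3] means line[position()=3]. -->
      <xsl:text>Third line: </xsl:text>
      <xsl:value-of select="line[3]"/>
      <xsl:text>&#10;</xsl:text>

      <!-- The union operator combines two node-sets; the result is
           processed in document order. -->
      <xsl:for-each select="line[3] | line[7]">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>

      <!-- Boolean operators combine conditions inside one predicate. -->
      <xsl:text>Third or styled: </xsl:text>
      <xsl:value-of select="count(line[position()=3 or @style])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If the document has fewer than three <line> elements, line[3] is an empty node-set and <xsl:value-of> writes nothing for it.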
3.2.7.2. Functions in predicates
Copyright © 2002 O'Reilly & Associates. All rights reserved.