10.9. XPath
XPath is a recommendation of the World Wide Web Consortium (W3C) for
locating nodes in an XML document tree. XPath is not designed to be
used alone but in conjunction with other tools, such as XSLT or
XPointer. These tools use XPath extensively and extend it for their
own needs with new functions and new basic types.
XPath provides a syntax for locating nodes in an XML document. It
takes its inspiration from the syntax used to denote paths in
filesystems such as Unix. Just as a relative file path is resolved
against a current directory, a relative XPath expression is evaluated
against a starting node, called the context node, which depends on
where the expression appears. For example, the context node of an
expression found in an <xsl:template match="para"> template will be
the selected <para> element (recall that XSLT templates use XPath
expressions).
Given our earlier XML examples, it is possible to write the following
expressions:
- chapter
-
Selects the <chapter> element children of
the context node
- chapter/para
-
Selects the <para> element children of
the <chapter> element children of the
context node
- ../chapter
-
Selects the <chapter> element children of
the parent of the context node
- ./chapter
-
Selects the <chapter> element children of
the context node
- *
-
Selects all element children of the context node
- */para
-
Selects the <para> grandchildren of the
context node
- .//para
-
Selects the <para> element descendants
(children, children of children, etc.) of the context node
- /para
-
Selects the <para> element child of the
document root, that is, the document element if it is named para
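The abbreviated paths above can be tried out with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. The following is a minimal sketch, not a full XPath engine: the sample document and the ids helper are made up for illustration, and the <book> element stands in for the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow the examples in the text.
BOOK = """
<book>
  <intro><para id="p0">Welcome</para></intro>
  <chapter title="XPath">
    <para id="p1">First</para>
    <para id="p2">Second</para>
    <section><para id="p3">Nested</para></section>
  </chapter>
  <chapter title="XSLT">
    <para id="p4">Third</para>
  </chapter>
</book>
"""

book = ET.fromstring(BOOK)  # <book> plays the role of the context node


def ids(elements):
    """Return the id attributes of the matched elements, in document order."""
    return [e.get("id") for e in elements]


chapter_titles = [c.get("title") for c in book.findall("chapter")]  # chapter
chapter_paras = ids(book.findall("chapter/para"))   # chapter/para
child_tags = [e.tag for e in book.findall("*")]     # *
grandchild_paras = ids(book.findall("*/para"))      # */para
all_paras = ids(book.findall(".//para"))            # .//para

print(chapter_titles)
print(chapter_paras)
print(child_tags)
print(grandchild_paras)
print(all_paras)
```

Note that chapter/para skips the <para> nested inside <section>, while .//para finds it.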
In addition, XPath recognizes the at symbol (@) for selecting an
attribute instead of an element. Thus the following expressions can
be used to select an attribute:
- para/@id
-
Selects the id attribute of the
<para> element children of the context
node
- @*
-
Selects all the attributes in the context node
Paths can be combined using the | operator. For
example, intro | chapter selects the
<intro> and
<chapter> elements of the children of the
context node.
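ElementTree does not evaluate attribute steps or the | operator itself, so the sketch below emulates what para/@id, @*, and intro | chapter select using ordinary Python. The sample document is hypothetical, and the lists only stand in for XPath node-sets.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
BOOK = """
<book>
  <intro><para id="p0">Welcome</para></intro>
  <chapter title="XPath">
    <para id="p1">First</para>
    <para>No identifier here</para>
  </chapter>
  <appendix/>
</book>
"""

book = ET.fromstring(BOOK)
chapter = book.find("chapter")  # context node for the attribute examples

# para/@id: the id attributes of the para children that carry one.
para_ids = [p.get("id") for p in chapter.findall("para[@id]")]

# @*: every attribute of the context node, as a name-to-value mapping.
chapter_attributes = dict(chapter.attrib)

# intro | chapter: children named intro or chapter, kept in document order.
intro_or_chapter = [child.tag for child in book
                    if child.tag in ("intro", "chapter")]

print(para_ids, chapter_attributes, intro_or_chapter)
```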
Certain functions and function-like node tests can also be included
in the path. Each must select a node or set of nodes. Those available
are:
Function | Selection
node( ) | Any node (of any type)
text( ) | Text node
comment( ) | Comment node
processing-instruction( ) | Processing-instruction node
id(id) | Node whose unique identifier is id
The id( ) function is especially helpful for
locating a node by its unique identifier (recall that identifiers are
attributes declared with type ID in the DTD). For example, we can
write the expression id("xml-ref")/title to select the
<title> element whose parent has the
xml-ref identifier.
The preceding examples show that the analogy with file paths is
rather limited. However, this syntax for writing an XPath expression
is a simplification of the more complete XPath syntax where an axis
precedes each step in the path.
10.9.1. Axes
Axes indicate the direction taken by the path. In the previous
examples, the syntactic qualifiers such as / for
root, .. for parent, and // for
descendant, are abbreviations that indicate the axis of the node
search. These are some of the simple axes on which to search for a
node.
XPath defines other search axes that are indicated by a prefix
separated from the rest of each step of the expression (called a
location step) by a double colon. For
example, to select the para elements that come
before the context node in the document, we could write
the expression preceding::para. XPath defines 13
axes:
Axis | Selection
self | The context node itself (abbreviated as .)
child | The children of the context node (the default axis)
descendant | The descendants of the context node; a descendant is a child, or a child of a child, and so on
descendant-or-self | Same as descendant, but also contains the context node (// abbreviates /descendant-or-self::node( )/)
parent | The parent of the context node (abbreviated as ..)
ancestor | The ancestors of the context node
ancestor-or-self | The same nodes as ancestor, plus the context node
following-sibling | Siblings (having the same parent as the context node) that are after the context node in document order
preceding-sibling | Siblings that are before the context node in document order
following | All nodes in the same document that are after the context node, excluding its descendants
preceding | All nodes in the same document that are before the context node, excluding its ancestors
attribute | The attributes of the context node (abbreviated as @)
namespace | The namespace nodes of the context node
It is possible to write the following expressions:
- ancestor::chapter
-
Selects the <chapter> elements that are
ancestors of the context node
- following-sibling::para/@title
-
Selects the title attributes of the
<para> elements that are siblings of the context
node and follow it in document order
- id("xpath")/following::chapter/node( )
-
Selects all the child nodes of the <chapter>
elements that follow the element with the xpath
identifier in document order
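ElementTree does not implement the named axes, but what two of them compute can be sketched by hand. In the example below, the ancestors and following_siblings helpers and the sample document are hypothetical, and they see only element nodes (text and comment nodes are ignored), so this is an approximation of the ancestor and following-sibling axes rather than a conforming implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
BOOK = """
<book>
  <chapter title="XPath">
    <para id="p1" title="a"/>
    <section><para id="p3"/></section>
    <para id="p2" title="b"/>
  </chapter>
</book>
"""

book = ET.fromstring(BOOK)

# ElementTree elements do not record their parent, so build a lookup table.
parent_of = {child: parent for parent in book.iter() for child in parent}


def ancestors(node):
    """Return the ancestors of node, nearest first (the ancestor axis)."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def following_siblings(node):
    """Return the element siblings after node (the following-sibling axis)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


nested = book.find(".//para[@id='p3']")
first = book.find(".//para[@id='p1']")

# ancestor::chapter, evaluated with the nested para as the context node
ancestor_chapters = [a.get("title") for a in ancestors(nested)
                     if a.tag == "chapter"]

# following-sibling::para/@title, evaluated with the first para as context
sibling_titles = [s.get("title") for s in following_siblings(first)
                  if s.tag == "para" and s.get("title") is not None]

print(ancestor_chapters, sibling_titles)
```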
The result of a location path is a node-set. It may be helpful to
filter a node-set with predicates.
10.9.2. Predicates
A predicate is an expression in square brackets that filters a
node-set. For example, we could write the following expressions:
- //chapter[1]
-
Selects every <chapter> element in the
document that is the first <chapter> child of its parent; to select
only the first <chapter> in the document, write (//chapter)[1]
- //chapter[@title="XPath"]
-
Selects the <chapter> elements in the
document where the value of the title attribute is
the string XPath
- //chapter[section]
-
Selects the <chapter> elements in the
document with a <section> child
- para[last( )]
-
Selects the last <para> element child of the
context node
Note that a path in a predicate does not change the path preceding
the predicate, but only filters it. Thus, the following expression:
/book/chapter[conclusion]
selects a <chapter> element that is a child
of the <book> element at the root of the
document and that has a <conclusion> child, but
not the <conclusion> element itself.
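These predicate forms are among the few that Python's xml.etree.ElementTree understands, so they can be tried directly. The sketch below uses a hypothetical document with chapters under two <part> elements, which makes the per-parent meaning of [1] visible. Paths are written relative to the <book> element (.//chapter rather than //chapter) because ElementTree evaluates paths against an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
BOOK = """
<book>
  <part>
    <chapter title="XPath"><section/><para id="p1"/><para id="p2"/></chapter>
    <chapter title="XSLT"><conclusion/></chapter>
  </part>
  <part>
    <chapter title="XLink"><section/></chapter>
  </part>
</book>
"""

book = ET.fromstring(BOOK)


def titles(chapters):
    """Return the title attribute of each chapter element."""
    return [c.get("title") for c in chapters]


# One match per parent: the first chapter child of each <part>.
first_in_parent = titles(book.findall(".//chapter[1]"))
named_xpath = titles(book.findall(".//chapter[@title='XPath']"))
with_section = titles(book.findall(".//chapter[section]"))

# The predicate filters chapters; the result is still chapter elements.
with_conclusion = book.findall(".//chapter[conclusion]")

# para[last( )] with the first chapter as the context node
last_para = [p.get("id")
             for p in book.find(".//chapter").findall("para[last()]")]

print(first_in_parent, named_xpath, with_section, last_para)
```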
There may be more than one predicate in an expression. The following
expression:
/book/chapter[1]/section[2]
selects the second section of the first chapter. In addition, the
order of the predicates matters. Thus, the following expressions are
not the same:
- chapter[example][2]
-
Selects the second of the <chapter> elements that include
<example> elements
- chapter[2][example]
-
Selects the second <chapter> element if it
includes at least one <example> element
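The difference comes from renumbering: each predicate filters the node-set produced by the one before it, and positions are counted afresh in the filtered set. A minimal sketch of that rule follows, with plain Python lists standing in for node-sets so that the renumbering is explicit. The sample document and the has_example helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the second and third chapters contain <example>.
BOOK = """
<book>
  <chapter title="one"/>
  <chapter title="two"><example/></chapter>
  <chapter title="three"><example/></chapter>
</book>
"""

book = ET.fromstring(BOOK)
chapters = book.findall("chapter")


def has_example(chapter):
    """True if the chapter has at least one <example> child."""
    return chapter.find("example") is not None


# chapter[example][2]: filter first, then take position 2 of what is left.
filtered = [c for c in chapters if has_example(c)]
second_with_example = [c.get("title") for c in filtered[1:2]]

# chapter[2][example]: take position 2 first, then keep it only if it matches.
second_if_example = [c.get("title") for c in chapters[1:2] if has_example(c)]

print(second_with_example)
print(second_if_example)
```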
An expression can include logical or comparison operators. The
following operators are available:
Operator | Meaning
or | Logical or
and | Logical and
not( ) | Negation (a function rather than an operator)
= != | Equal to and different from
< <= | Less than and less than or equal to
> >= | Greater than and greater than or equal to
When an expression appears in an XML document, such as an XSLT
stylesheet, the character < must be entered as &lt;. Parentheses may
be used for grouping. For example:
- chapter[@title = "XPath"]
-
Selects <chapter> elements where the
title attribute has the value
XPath
- chapter[position( ) < 3]
-
Selects the first two <chapter> elements
- chapter[position( ) != last( )]
-
Selects <chapter> elements that are not in
the last position
- chapter[section/@title="examples" or subsection/@title="examples"]
-
Selects <chapter> elements that include
<section> or
<subsection> elements with the
title attribute set to examples
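ElementTree cannot evaluate position( ) comparisons or the or operator, so the sketch below spells out what the last three expressions test, using ordinary Python. The sample document and the has_examples helper are hypothetical. Positions are numbered from 1, as in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
BOOK = """
<book>
  <chapter title="XPath"><section title="examples"/></chapter>
  <chapter title="XSLT"><subsection title="syntax"/></chapter>
  <chapter title="XLink"><subsection title="examples"/></chapter>
</book>
"""

book = ET.fromstring(BOOK)
chapters = book.findall("chapter")
last = len(chapters)  # what last( ) returns for this node-set

# position( ) is 1-based, so number the chapters starting from 1.
numbered = list(enumerate(chapters, start=1))


def has_examples(chapter):
    """True if any section or subsection child has title="examples"."""
    candidates = chapter.findall("section") + chapter.findall("subsection")
    return any(e.get("title") == "examples" for e in candidates)


# chapter[position( ) < 3]
first_two = [c.get("title") for pos, c in numbered if pos < 3]
# chapter[position( ) != last( )]
not_last = [c.get("title") for pos, c in numbered if pos != last]
# chapter[section/@title="examples" or subsection/@title="examples"]
with_examples = [c.get("title") for c in chapters if has_examples(c)]

print(first_two, not_last, with_examples)
```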
XPath also defines operators that act on numbers. The numeric
operators are +, -,
*, div (division of real
numbers), and mod (modulo).
10.9.3. Functions
In the previous examples we saw such XPath functions as
position( ) and not( ). XPath
defines four basic datatypes: booleans (true or false), numbers
(floating-point), strings (strings of characters), and node-sets. The
functions are grouped based on the datatypes they act upon.
The following functions deal with node-sets (optional arguments are
shown in square brackets):
- last( )
-
Returns the number of nodes in the node-set currently being
evaluated (the context size)
- position( )
-
Returns a number that is the position of the context node (in
document order or after sorting)
- count(node-set)
-
Returns the number of nodes contained in the specified
node-set
- id(name)
-
Returns the node with the identifier name
- local-name([node-set])
-
Returns a string that is the name (without the namespace) of the
first node in document order of the
node-set, or the context-node, if the
argument is omitted
- namespace-uri([node-set])
-
Returns a string that is the URI for the namespace of the first node
in document order of the node-set, or the
context node, if the argument is omitted
- name([node-set])
-
Returns a string that is the qualified name (with namespace prefix)
of the first node in document order of the node-set, or
the context node, if the argument is omitted
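For comparison, here is a rough sketch of how count( ), local-name( ), and namespace-uri( ) map onto Python's xml.etree.ElementTree, which stores an element's name as {uri}local. The helper functions and the http://example.org/ns namespace are hypothetical. The parser discards namespace prefixes, so name( ) has no direct equivalent here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one namespaced and one plain element.
DOC = '<doc xmlns:x="http://example.org/ns"><x:item/><plain/></doc>'
root = ET.fromstring(DOC)


def local_name(elem):
    """Name without the namespace part, like local-name( )."""
    return elem.tag.rsplit("}", 1)[-1]


def namespace_uri(elem):
    """Namespace URI of the element, or "" if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


item, plain = list(root)

child_count = len(root.findall("*"))  # count(*)
item_name = (local_name(item), namespace_uri(item))
plain_name = (local_name(plain), namespace_uri(plain))

print(child_count, item_name, plain_name)
```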
The following functions deal with strings:
- string(object)
-
Converts its argument object, which can be
of any type, to a string.
- concat(str1, str2, ...)
-
Returns the concatenation of its arguments.
- starts-with(str1, str2)
-
Returns true if the first argument string
(str1) starts with the second argument
string (str2).
- contains(str1, str2)
-
Returns true if the first argument string
(str1) contains the second argument string
(str2).
- substring-before(str1, str2)
-
Returns the substring of the first argument string
(str1) that precedes the first occurrence
of the second argument string (str2).
- substring-after(str1, str2)
-
Returns the substring of the first argument string
(str1) that follows the first occurrence
of the second argument string (str2).
- substring(str, num[, length])
-
Returns the substring of the first argument
(str) starting at the position specified
by the second argument (num), where the first character is at
position 1, with the
length specified in the third. If the
third argument is not specified, the substring continues to the end
of the string.
- string-length(str)
-
Returns the number of characters in the string.
- normalize-space(str)
-
Returns the argument string with whitespace normalized by stripping
any leading and trailing whitespace and replacing sequences of
whitespace characters by a single space.
- translate(str1, str2, str3)
-
Returns the first argument string (str1)
with occurrences of characters in the second argument string
(str2) replaced by the character at the
corresponding position in the third argument string
(str3). Characters in str2 with no counterpart in str3 are removed.
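Several of these functions behave slightly differently from their nearest Python counterparts (1-based positions, an empty result when nothing is found, character-by-character translation). The following hypothetical helpers are a sketch of those semantics for ordinary finite arguments and a non-empty separator. They are not a library API, and NaN or infinite positions are not handled.

```python
import math
import re


def substring_before(text, sep):
    """substring-before: text before the first sep, or "" if sep is absent."""
    head, found, _tail = text.partition(sep)  # assumes sep is not empty
    return head if found else ""


def substring_after(text, sep):
    """substring-after: text after the first sep, or "" if sep is absent."""
    _head, found, tail = text.partition(sep)  # assumes sep is not empty
    return tail if found else ""


def substring(text, start, length=None):
    """substring: 1-based start; start and length are rounded first."""
    first = math.floor(start + 0.5)
    if length is None:
        end = len(text) + 1                      # exclusive end position
    else:
        end = first + math.floor(length + 0.5)
    begin = max(first, 1)
    return text[begin - 1:max(end - 1, begin - 1)]


def normalize_space(text):
    """normalize-space: trim, then collapse runs of XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def translate(text, source, target):
    """translate: map source[i] to target[i]; drop unmatched characters."""
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:                 # first occurrence wins
            table[ord(ch)] = target[i] if i < len(target) else None
    return text.translate(table)


print(substring_before("1999/04/01", "/"))
print(substring_after("1999/04/01", "/"))
print(substring("12345", 2, 3))
print(normalize_space("  a \n b  "))
print(translate("--aaa--", "abc-", "ABC"))
```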
The following functions deal with boolean operations:
- boolean(object)
-
Converts its argument (object), which can
be of any type, to a boolean
- not(boolean)
-
Returns true if its argument evaluates as false
- true( )
-
Returns true
- false( )
-
Returns false
- lang(str)
-
Returns true if the language of the context node (as given by the
xml:lang attribute on the node or on its closest ancestor that has
one) is the language passed in the argument (str) or a sublanguage
of it
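The conversion performed by boolean( ) depends on the type of its argument: a number is true unless it is zero or NaN, a string is true unless it is empty, and a node-set is true unless it is empty. A minimal sketch of those rules follows, with a hypothetical xpath_boolean helper and Python lists standing in for node-sets.

```python
import math


def xpath_boolean(value):
    """Convert a bool, number, string, or list (node-set) the way boolean( ) does."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    # Strings and lists: true exactly when they are not empty.
    return len(value) > 0


print(xpath_boolean(0), xpath_boolean(math.nan), xpath_boolean(-1.5))
print(xpath_boolean(""), xpath_boolean("false"))
print(xpath_boolean([]), xpath_boolean(["node"]))
```

Note that the string "false" converts to true, because only the empty string is false.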
The following functions deal with numbers:
- number([obj])
-
Converts its argument (obj), which can be
of any type, to a number (using the context node if the argument is
omitted).
- sum(node-set)
-
Returns the sum of the result of converting every node in the
node-set to a number. If any node is not a
number, the function returns NaN (not a number).
- floor(num)
-
Returns the largest integer that is not greater than the argument
(num).
- ceiling(num)
-
Returns the smallest integer that is not less than the argument
(num).
- round(num)
-
Returns the integer that is closest to the argument
(num).
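Two details are easy to miss here: sum( ) yields NaN as soon as one node is not numeric, and round( ) resolves ties toward positive infinity, unlike Python's built-in round. The helpers below are hypothetical and only sketch this behavior for ordinary finite values. The regular expression approximates the XPath number syntax (no exponents and no leading plus sign).

```python
import math
import re
import xml.etree.ElementTree as ET

# Optional whitespace, optional minus sign, digits with optional fraction.
_NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")


def xpath_number(text):
    """Convert a string the way number( ) does, giving NaN on failure."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan


def xpath_sum(elements):
    """sum( ): convert each element's text content to a number and add."""
    return sum(xpath_number("".join(e.itertext())) for e in elements)


def xpath_round(value):
    """round( ): nearest integer, with ties going toward positive infinity."""
    return math.floor(value + 0.5)


# Hypothetical sample data.
good = ET.fromstring("<order><price>1.5</price><price>2</price></order>")
bad = ET.fromstring("<order><price>1.5</price><price>n/a</price></order>")

total = xpath_sum(good.findall("price"))
broken_total = xpath_sum(bad.findall("price"))

print(total, broken_total)
print(xpath_round(2.5), xpath_round(-2.5), round(2.5))
print(math.floor(-2.5), math.ceil(-2.5))
```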
These functions can be used not only in XPath expressions, but in
XSLT elements as well. For example, to count the number of sections
in a text, we could add the following to a style sheet:
<xsl:text>The number of sections is </xsl:text>
<xsl:value-of select="count(//section)"/>
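Outside of XSLT, the same count can be obtained from most XML libraries. As a rough equivalent, the sketch below uses Python's xml.etree.ElementTree on a hypothetical sample text. Element.iter is used because, like //section, it also counts the document element if it is itself a <section>.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text with three sections, one of them nested.
TEXT = """
<text>
  <section><para>One</para></section>
  <section>
    <para>Two</para>
    <section><para>Nested</para></section>
  </section>
</text>
"""

root = ET.fromstring(TEXT)

# Equivalent in spirit to count(//section).
section_count = sum(1 for _ in root.iter("section"))
message = f"The number of sections is {section_count}"

print(message)
```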
10.9.4. Additional XSLT Functions and Types
XSLT defines additional functionality for its own needs. One feature
is a new datatype (in addition to the four datatypes defined by
XPath): the result tree fragment. This
datatype is treated like a node-set that contains a single root node,
but it supports fewer operations: an operation is permitted on a
result tree fragment only if it would be permitted on a string. In
particular, you cannot use the /,
//, or [ ] operators on result
tree fragments.
XSLT also defines additional functions:
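- document(obj[, node-set])
-
Returns a node-set that comprises the document whose URI was passed
as the first argument obj. A relative URI is resolved against the
base URI of the node given in the optional second argument. If the
second argument is omitted and obj is a string, the base URI of the
stylesheet node that contains the expression is used.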
- key(str, obj)
-
Returns the node-set of the nodes keyed by
obj in the key named
str.
- format-number(num, str1[, str2])
-
Returns a string containing the formatted value of
num, according to the format-pattern
string in str1 and the decimal-format
named in str2 (or the default decimal-format
if there is no third argument).
- current( )
-
Returns the current node.
- unparsed-entity-uri(str)
-
Returns the URI of the unparsed entity given by
str.
- generate-id([node-set])
-
Generates a unique ID for the first node in the given
node-set, or for the context node if the argument is omitted.
- system-property(str)
-
Returns the value of the system property passed as a string
str. The system properties are:
xsl:version (the version of XSLT implemented by
the processor), xsl:vendor (a string identifying
the vendor of the XSL processor), and
xsl:vendor-url (the vendor's
URL).
Copyright © 2003 O'Reilly & Associates. All rights reserved.