XPath Functions (XML in a Nutshell, 2nd Edition)

9.7. XPath Functions

XPath provides a number of functions that you may find useful in predicates or raw expressions. All of these are discussed in Chapter 22. For example, the position( ) function returns the position of the current node in the context node list as a number. This XSLT template rule uses the position( ) function to calculate the number of the person being processed, relative to other nodes in the context node list:

<xsl:template match="person">
  Person <xsl:value-of select="position( )"/>,
  <xsl:value-of select="name"/>
</xsl:template>

Each XPath function returns one of these four types:

Boolean
Number
Node-set
String

There are no void functions in XPath. Therefore, XPath is not nearly as strongly typed as languages like Java or even C. You can often use any of these types as a function argument regardless of which type the function expects, and the processor will convert it as best it can. For example, if you insert a Boolean where a string is expected, then the processor will substitute one of the two strings "true" and "false" for the Boolean. The one exception is functions that expect to receive node-sets as arguments. XPath cannot convert strings, Booleans, or numbers to node-sets.

Functions are identified by the parentheses at the end of the function names. Sometimes these functions take arguments between the parentheses. For instance, the round( ) function takes a single number as an argument. It returns the number rounded to the nearest integer. For example, <xsl:value-of select="round(3.14)"/> inserts 3 into the output tree.

Other functions take more than one argument. For instance, the starts-with( ) function takes two arguments, both strings. It returns true if the first string starts with the second string. For example, this XSLT apply-templates element selects all name elements whose last name begins with T:

<xsl:apply-templates select="name[starts-with(last_name, 'T')]"/>

In this example the first argument to the starts-with( ) function is actually a node-set, not a string. The XPath processor converts that node-set to its string value (the text content of the first element in that node-set) before checking to see whether it starts with T.

Some XSLT functions have variable-length argument lists. For instance, the concat( ) function takes as arguments any number of strings and returns one string formed by concatenating all those strings together in order. For example, concat("a", "b", "c", "d") returns "abcd".

In addition to the functions defined in XPath and discussed in this chapter, most uses of XPath, such as XSLT and XPointer, define many more functions that are useful in their particular context. You use these extra functions just like the built-in functions when you're using those applications. XSLT even lets you write extension functions in Java and other languages that can do almost anything, for example, making SQL queries against a remote database server and returning the result of the query as a node-set.

9.7.1. Node-Set Functions

The node-set functions either operate on or return information about node-sets, that is, collections of XPath nodes. You've already encountered the position( ) function. Two related functions are last( ) and count( ). The last( ) function returns the number of nodes in the context node list, which also happens to be the same as the position of the last node in the list. The count( ) function is similar except that it returns the number of nodes in its node-set argument rather than in the context node list. For example, count(//name) returns the number of name elements in the document. Example 9-5 uses the position( ) and count( ) functions to list the people in the document in the form "Person 1 of 10, Person 2 of 10, Person 3 of 10 . . . " In the second template the position( ) function determines which person element is currently being processed, and the count( ) function determines how many total person elements there are in the document.

Example 9-5. An XSLT stylesheet that uses the position( ) and count( ) functions

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="people">
    <xsl:apply-templates select="person"/>
  </xsl:template>

  <xsl:template match="person">
    Person <xsl:value-of select="position( )"/>
    of <xsl:value-of select="count(//person)"/>:
    <xsl:value-of select="name"/>
  </xsl:template>

</xsl:stylesheet>

The id( ) function takes as an argument a string containing one or more IDs separated by whitespace and returns a node-set containing all the nodes in the document that have those IDs. These are attributes declared to have type ID in the DTD not necessarily attributes named ID or id. (A DTD must be both present and processed by the parser for the id() function to work.) Thus, in Example 9-1, id('p342') indicates Alan Turing's person element; id('p342 p4567') indicates both Alan Turing and Richard Feynman's person elements.

The id( ) function is most commonly used in the abbreviated XPath syntax. It allows you to form absolute location paths that don't start from the root. For example, id('p342')/name refers to Alan Turing's name element, regardless of where Alan Turing's person element is in the document, as long as it hasn't changed ID. This function is especially useful for XPointers where it takes the place of HTML's named anchors.

Finally, there are three node-set functions related to namespaces. The local-name( ) function takes as an argument a node-set and returns the local part of the first node in that set. The namespace-uri( ) function takes a node-set as an argument and returns the namespace URI of the first node in the set. Finally, the name( ) function takes a node-set as an argument and returns the prefixed name of the first node in that set. In all three functions the argument may be omitted, in which case the context node's namespace is evaluated. For instance, when applied to Example 9-1 the XPath expression, local-name(//homepage/@xlink:href) is href; namespace-uri(//homepage/@xlink:href) is http://www.w3.org/1999/xlink; and name(//homepage/@xlink:href) is xlink:href.

9.7.2. String Functions

XPath includes functions for basic string operations such as finding the length of a string or changing letters from upper- to lowercase. It doesn't have the full power of the string libraries in Python or Perl--for instance, there's no regular expression support--but it's sufficient for many simple manipulations you need for XSLT or XPointer.

The string( ) function converts an argument of any type to a string in a reasonable fashion. Booleans are converted to the string "true" or the string "false." Node-sets are converted to the string value of the first node in the set. This is the same value calculated by the xsl:value-of element. That is, the string value of the element is the complete text of the element after all entity references are resolved and tags, comments, and processing instructions have been stripped out. Numbers are converted to strings in the format used by most programming languages, such as "1987," "299792500," or "2.71828."

TIP: In XSLT the xsl:decimal-format element and format-number( ) function provide more precise control over formatting so you can insert separators between groups, change the decimal separator, use non-European digits, and make similar adjustments.

The normal use of most of the rest of the string functions is to manipulate or address the text content of XML elements or attributes. For instance, if date attributes were given in the format MM/DD/YYYY, then the string functions would allow you to target the month, day, and year separately.

The starts-with( ) function takes two string arguments. It returns true if the first argument starts with the second argument. For example, starts-with('Richard', 'Ric') is true but starts-with('Richard', 'Rick') is false. There is no corresponding ends-with( ) function.

The contains( ) function also takes two string arguments. However, it returns true if the first argument contains the second argument--that is, if the second argument is a substring of the first argument--regardless of position. For example, contains('Richard', 'ar') is true but contains('Richard', 'art') is false.

The substring-before( ) function takes two string arguments and returns the substring of the first argument that precedes the initial appearance of the second argument. If the second string doesn't appear in the first string, then substring-before( ) returns the empty string. For example, substring-before('MM/DD/YYYY', '/') is MM. The substring-after( ) function also takes two string arguments but returns the substring of the first argument that follows the initial appearance of the second argument. If the second string doesn't appear in the first string, then substring-after( ) returns the empty string. For example, substring-after ('MM/DD/YYYY', '/') is 'DD/YYYY'. substring-before(substring-after('MM/DD/YYYY', '/')', '/') is DD. substring-after(substring-after('MM/DD/YYYY', '/')', '/') is YYYY.

If you know the position of the substring you want, then you can use the substring( ) method instead. This takes three arguments: the string from which the substring will be copied, the position in the string from which to start extracting, and the number of characters to copy to the substring. The third argument may be omitted, in which case the substring contains all characters from the specified start position to the end of the string. For example, substring('MM/DD/YYYY', 1, 2) is MM; substring('MM/DD/YYYY', 4, 2) is DD; and substring('MM/DD/YYYY', 7) is YYYY.

The string-length( ) function returns a number giving the length of its argument's string value or the context node if no argument is included. In Example 9-1, string-length(//name[position( )=1]) is 29. If that seems long to you, remember that all whitespace characters are included in the count. If it seems short to you, remember that markup characters are not included in the count.

Theoretically, you could use these functions to trim and normalize whitespace in element content. However, since this would be relatively complex and is such a common need, XPath provides the normalize-space( ) function to do this. For instance, in Example 9-1 the value of string(//name[position( )=1]) is:

Alan
Turing

This contains a lot of extra whitespace that was inserted purely to make the XML document neater. However, normalize-space(string(//name[position( )=1])) is the much more reasonable:

Alan Turing

Although a more powerful string-manipulation library would be useful, XSLT is really designed for transforming the element structure of an XML document. It's not meant to have the more general power of a language like Perl, which can handle arbitrarily complicated and varying string formats.

9.7.3. Boolean Functions

The Boolean functions are few in number and quite straightforward. They all return a Boolean that has the value true or false. The true( ) function always returns true. The false( ) function always returns false. These substitute for Boolean literals in XPath.

The not( ) function reverses the sense of its Boolean argument. For example, not(@id>400) is almost always equivalent to (@id<=400). (NaN is a special case.)

The boolean( ) function converts its single argument to a Boolean and returns the result. If the argument is omitted, then it converts the context node. Numbers are converted to false if they're zero or NaN. All other numbers are true. Node-sets are false if they're empty, true if they have at least one element. Strings are false if they have zero length, otherwise they're true. Note that according to this rule, the string "false" is in fact true.

9.7.4. Number Functions

XPath includes a few simple numeric functions for summing groups of numbers and finding the nearest integer to a number. It doesn't have the full power of the math libraries in Java or Fortran--for instance, there's no square root or exponentiation function--but it's got enough to do most of the basic math you need for XSLT or the even simpler requirements of XPointer.

The number( ) function can take any type as an argument and convert it to a number. If the argument is omitted, then it converts the context node. Booleans are converted to 1 if true and if false. Strings are converted in a plausible fashion. For instance the string "7.5" will be converted to the number 7.5. The string "Fred" will be converted to NaN. Node-sets are converted to numbers by first converting them to their string values and then converting the resulting string to a number. The detailed rules are a little more complex, but as long as the object you're converting can reasonably be interpreted as a single number, chances are the number( ) function will do what you expect. If the object you're converting can't be reasonably interpreted as a single number, then the number( ) function will return NaN.

The round( ), floor( ), and ceiling( ) functions all take a single number as an argument. The floor( ) function returns the greatest integer less than or equal to its argument. The ceiling( ) function returns the smallest integer greater than or equal to its argument. The round( ) function returns its argument rounded to the nearest integer. When rounding numbers like 1.5 and -3.5 that are equally close to two integers, round() returns the greater of the two possibilities. (This means that -1.5 rounds to -1, but 1.5 rounds to 2.)

The sum( ) function takes a node-set as an argument. It converts each node in the set to its string value, then converts each of those strings to a number. It then adds up the numbers and returns the result.