
Chapter 6. Sorting and Grouping Elements

By now, I hope you're convinced that you can use XSLT to convert big piles of XML data into other useful things. Our examples to this point have pretty much gone through the XML source in what's referred to as document order. We'd like to go through our XML documents in a couple of other common ways, though:

  • We could sort some or all of the XML elements, then generate output based on the sorted elements.

  • We could group the data, selecting all elements that have some property in common, then sorting the groups of elements.

We'll give several examples of these operations in this chapter.

6.1. Sorting Data with <xsl:sort>

The simplest way to rearrange our XML elements is to use the <xsl:sort> element. This element temporarily rearranges a collection of elements based on criteria we define in our stylesheet.

6.1.1. Our First Example

For our first example, we'll have a set of U.S. postal addresses that we want to sort. (No chauvinism is intended here; obviously every country has different conventions for mailing addresses. We just needed a short sample document that can be sorted in many useful ways.) Here's our original document:

<?xml version="1.0"?>
<addressbook>
  <address>
    <name>
      <title>Mr.</title>
      <first-name>Chester Hasbrouck</first-name>
      <last-name>Frisby</last-name>
    </name>
    <street>1234 Main Street</street>
    <city>Sheboygan</city>
    <state>WI</state>
    <zip>48392</zip>
  </address>
  <address>
    <name>
      <first-name>Mary</first-name>
      <last-name>Backstayge</last-name>
    </name>
    <street>283 First Avenue</street>
    <city>Skunk Haven</city>
    <state>MA</state>
    <zip>02718</zip>
  </address>
  <address>
    <name>
      <title>Ms.</title>
      <first-name>Natalie</first-name>
      <last-name>Attired</last-name>
    </name>
    <street>707 Breitling Way</street>
    <city>Winter Harbor</city>
    <state>ME</state>
    <zip>00218</zip>
  </address>
  <address>
    <name>
      <first-name>Harry</first-name>
      <last-name>Backstayge</last-name>
    </name>
    <street>283 First Avenue</street>
    <city>Skunk Haven</city>
    <state>MA</state>
    <zip>02718</zip>
  </address>
  <address>
    <name>
      <first-name>Mary</first-name>
      <last-name>McGoon</last-name>
    </name>
    <street>103 Bryant Street</street>
    <city>Boylston</city>
    <state>VA</state>
    <zip>27318</zip>
  </address>
  <address>
    <name>
      <title>Ms.</title>
      <first-name>Amanda</first-name>
      <last-name>Reckonwith</last-name>
    </name>
    <street>930-A Chestnut Street</street>
    <city>Lynn</city>
    <state>MA</state>
    <zip>02930</zip>
  </address>
</addressbook>

We'd like to generate a list of these addresses, sorted by <last-name>. We'll use the magical <xsl:sort> element to do the work. Our stylesheet looks like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="newline">
<xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:value-of select="name/title"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/first-name"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="street"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="city"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="state"/>
      <xsl:text>  </xsl:text>
      <xsl:value-of select="zip"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

The heart of our stylesheet is the pair of <xsl:for-each> and <xsl:sort> elements. The <xsl:for-each> element selects the items with which we'll work, and the <xsl:sort> element rearranges them before we write them out.

Notice that we're generating a text file (<xsl:output method="text"/>). (You could generate an HTML file or something more complicated if you want.) To invoke the stylesheet engine, we run this command:

java org.apache.xalan.xslt.Process -in names.xml -xsl namesorter1.xsl -out names.text

Here are the results we get from our first attempt at sorting:

Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME  00218

 Mary Backstayge
283 First Avenue
Skunk Haven, MA  02718

 Harry Backstayge
283 First Avenue
Skunk Haven, MA  02718

Mr. Chester Hasbrouck Frisby
1234 Main Street
Sheboygan, WI  48392

 Mary McGoon
103 Bryant Street
Boylston, VA  27318

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA  02930

As you can see from the output, the addresses in our original document were sorted by last name. All we had to do was add <xsl:sort> to our stylesheet, and all the elements were magically reordered. If you aren't convinced that XSLT can increase your productivity as a programmer, try writing the Java code and DOM method calls to do the same thing.

We can do a couple of things to improve our original stylesheet, however. For one thing, there's an annoying blank space at the start of every name that doesn't have a <title> element. A more significant improvement is that we'd like to sort addresses by <first-name> within <last-name>. In our last example, Mary Backstayge should appear after Harry Backstayge. Here's how we can modify our stylesheet to use more than one sort key:

<xsl:template match="/">
  <xsl:for-each select="addressbook/address">
    <xsl:sort select="name/last-name"/>
    <xsl:sort select="name/first-name"/>
    ...

We've simply added a second <xsl:sort> element to our stylesheet. This element does what we want; it sorts the <address> elements by <first-name> within <last-name>. To be thoroughly obsessive about our output, we can use an <xsl:if> element to get rid of that annoying blank space in front of names with no <title> element:

<xsl:if test="name/title">
  <xsl:value-of select="name/title"/>
  <xsl:text> </xsl:text>
</xsl:if>

Now our output is perfect:

Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME  00218

Harry Backstayge
283 First Avenue
Skunk Haven, MA  02718

Mary Backstayge
283 First Avenue
Skunk Haven, MA  02718

Mr. Chester Hasbrouck Frisby
1234 Main Street
Sheboygan, WI  48392

Mary McGoon
103 Bryant Street
Boylston, VA  27318

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA  02930

6.1.2. The Details on the <xsl:sort> Element

Now that we've seen a couple of examples of how <xsl:sort> works, we'll go over its syntax, its attributes, and where you can use it.

6.1.2.1. What's the deal with that syntax?

I'm so glad you asked that question. One thing the XSLT working group could have done is something like this:

<xsl:for-each select="addressbook/address" sort-key-1="name/last-name" 
  sort-key-2="name/first-name"/>

The problem with this approach is that no matter how many sort-key-x attributes you define, out of sheer perverseness, someone will cry out that they really need the sort-key-8293 attribute. To avoid this messy problem, the XSLT designers decided to let you specify the sort keys by using a number of <xsl:sort> elements. The first is the primary sort key, the second is the secondary sort key, the 8293rd one is the eight-thousand-two-hundred-and-ninety-third sort key, etc.

Well, that's why the syntax looks the way it does, but how does it actually work? When I first saw this syntax:

<xsl:for-each select="addressbook/address">
  <xsl:sort select="name/last-name"/>
  <xsl:sort select="name/first-name"/>
  <xsl:apply-templates select="."/>
</xsl:for-each>

I thought it meant that all the nodes were sorted during each iteration through the <xsl:for-each> element. That seemed incredibly inefficient; if you've sorted all the nodes, why re-sort them each time through the <xsl:for-each> element? Actually, the XSLT processor handles all <xsl:sort> elements before it does anything else, then it processes the <xsl:for-each> element as if the <xsl:sort> elements weren't there.

It's less efficient, but if it makes you feel better about the syntax, you could write the stylesheet like this:

<xsl:template match="/">
  <xsl:for-each select="addressbook/address">
    <xsl:sort select="name/last-name"/>
    <xsl:sort select="name/first-name"/>
    <xsl:for-each select=".">  <!-- This is slower, but it works -->
      <xsl:apply-templates select="."/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

(Don't actually do this. I'm only trying to make a point.) This stylesheet generates the same results as our earlier stylesheet.
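One way to convince yourself that the sort happens once, up front, is to print position() inside the loop. In XSLT 1.0, position() inside a sorted <xsl:for-each> reports a node's place in the sorted list, not its place in the source document. The following is a minimal sketch against the address book shown earlier; the numbered, one-line output format is our own choice for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <!-- Both sort keys are applied once, before the first iteration -->
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
      <!-- position() is this node's place in the sorted list -->
      <xsl:value-of select="position()"/>
      <xsl:text>. </xsl:text>
      <xsl:value-of select="name/first-name"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the address book, this should number Natalie Attired as 1 and Amanda Reckonwith as 6, even though Chester Hasbrouck Frisby is the first <address> in the source document.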

6.1.2.2. Attributes

The <xsl:sort> element has several attributes, all of which are discussed here.

select
The select attribute defines the characteristic we'll use for sorting. Its contents are an XPath expression, so you can select elements, text, attributes, comments, ancestors, etc. As always, the XPath expression defined in select is evaluated in terms of the current context.
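Because select takes an XPath expression, the sort key doesn't have to be the first thing you print, or anything you print at all. As a small sketch against the address book from earlier, this stylesheet sorts by <state> and then by <city>; each select is evaluated relative to the <address> element being sorted. The one-line output format is just for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <!-- Primary key: the <state> child of each <address> -->
      <xsl:sort select="state"/>
      <!-- Secondary key: the <city> child, used when states are equal -->
      <xsl:sort select="city"/>
      <xsl:value-of select="state"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="city"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With our sample data, the three Massachusetts addresses should come first, with Lynn ahead of Skunk Haven, followed by Maine, Virginia, and Wisconsin.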

data-type
The data-type attribute can have three values:

  • data-type="text"

  • data-type="number"

  • A data-type="QName" that identifies a particular datatype. The stated goal of the XSLT working group is that the datatypes defined in the XML Schema specification will eventually be supported here.

The XSLT specification defines the behavior for data-type="text" and data-type="number". Consider this XML document:

<?xml version="1.0"?>
<numberlist>
  <number>127</number>
  <number>23</number>
  <number>10</number>
</numberlist>

We'll sort these values using the default value (data-type="text"):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="newline">
<xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each select="numberlist/number">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

When we sort these elements using data-type="text", here's what we get:

10
127
23

We get this result because a text-based sort puts anything that starts with a "1" before anything that starts with a "2." If we change the <xsl:sort> element to be <xsl:sort select="." data-type="number"/>, we get these results:

10
23
127

If you use something else here (data-type="floating-point", for example), what the XSLT processor does is anybody's guess. The XSLT specification allows for other values here, but it's up to the XSLT processor to decide how (or if) it wants to process those values. Check your processor's documentation to see if it does anything relevant or useful for values other than data-type="text" or data-type="number".

A final note: if you're using data-type="number", and any of the values aren't numbers, those non-numeric values will sort before the numeric values. That means if you're using order="ascending", the non-numeric values appear first; if you use order="descending", the non-numeric values appear last.

<?xml version="1.0"?>
<numberlist>
  <number>127</number>
  <number>23</number>
  <number>zzz</number>
  <number>10</number>
  <number>yyy</number>
</numberlist>

Given this less-than-perfect data, here are the correctly sorted results:

zzz
yyy
10
23
127

Notice that the non-numeric values were not sorted; they simply appear in the output document in the order in which they were encountered.
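If you'd rather leave the non-numeric values out of the output entirely, one option is to filter them in the select expression before sorting. The sketch below is our own variation, not something <xsl:sort> does for you. It relies on a standard XPath 1.0 property: number() returns NaN for text that isn't a number, and NaN is never equal to itself, so the predicate is true only for real numbers.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- number(.) = number(.) is false for NaN, so zzz and yyy are skipped -->
    <xsl:for-each select="numberlist/number[number(.) = number(.)]">
      <xsl:sort select="." data-type="number"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the less-than-perfect document above, this should write only 10, 23, and 127, in that order.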

order
You can order the sort as order="ascending" or order="descending". The default is order="ascending".
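As a quick sketch, here is the number-sorting stylesheet from the data-type discussion with order="descending" added. Nothing else changes; the attribute simply reverses the direction of that one sort key.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:for-each select="numberlist/number">
      <!-- Numeric comparison, largest value first -->
      <xsl:sort select="." data-type="number" order="descending"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the three-number document (127, 23, 10), the output should be 127, then 23, then 10. Each <xsl:sort> element has its own order attribute, so you can mix ascending and descending keys in the same <xsl:for-each>.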

case-order
This attribute can have two values. case-order="upper-first" means that uppercase letters sort before lowercase letters, and case-order="lower-first" means that lowercase letters sort first. The case-order attribute is used only when the data-type attribute is text. The default value depends on the value of the soon-to-be-discussed lang attribute.

lang
This attribute defines the language of the sort keys. The valid values for this attribute are the same as those for the xml:lang attribute defined in Section 2.12 of the XML 1.0 specification. The language codes are those commonly used in Java programming, UNIX locales, and other places where ISO language and country codes are used. For example, lang="en" means "English," lang="en-US" means "U.S. English," and lang="en-GB" means "U.K. English." Without the lang attribute (it's rarely used in practice), the XSLT processor determines the default language from the system environment.
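To show where these last two attributes go, here is a minimal sketch that sets both case-order and lang explicitly. The <wordlist> and <word> element names are hypothetical; they stand in for any document containing text values that differ only in case. How closely a given processor honors these attributes varies, so treat the exact ordering as something to check against your processor's documentation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- wordlist/word is a hypothetical input structure -->
    <xsl:for-each select="wordlist/word">
      <!-- Ask for English collation, with uppercase ahead of lowercase
           when two keys differ only in case -->
      <xsl:sort select="." data-type="text" lang="en" case-order="upper-first"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```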

6.1.2.3. Where can you use <xsl:sort>?

The <xsl:sort> element can appear inside two elements:

  • <xsl:apply-templates>

  • <xsl:for-each>

If you use an <xsl:sort> element inside <xsl:for-each>, the <xsl:sort> element(s) must appear first. If you tried something like this, you'd get an exception from the XSLT processor:

<xsl:for-each select="addressbook/address">
  <xsl:sort select="name/last-name"/>
  <xsl:value-of select="name/title"/> 
  <xsl:sort select="name/first-name"/> <!-- NOT LEGAL! -->
  ...
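All of our examples so far put <xsl:sort> inside <xsl:for-each>. The same sort keys work inside <xsl:apply-templates>. As a sketch, here is the two-key address sort rewritten that way: the sort keys go inside <xsl:apply-templates>, and a separate template for <address> does the formatting.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- The selected <address> nodes are sorted, then each one is
         handed to the matching template in sorted order -->
    <xsl:apply-templates select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="address">
    <!-- Write the title only if this address has one -->
    <xsl:if test="name/title">
      <xsl:value-of select="name/title"/>
      <xsl:text> </xsl:text>
    </xsl:if>
    <xsl:value-of select="name/first-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last-name"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="street"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="city"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="state"/>
    <xsl:text>  </xsl:text>
    <xsl:value-of select="zip"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

This should produce the same sorted address list as the <xsl:for-each> version. Which form you choose is mostly a matter of style; the template-based form tends to be easier to reuse when the same formatting is needed from more than one place.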

6.1.3. Another Example

We've pretty much covered the <xsl:sort> element at this point. To add another wrinkle to our example, we'll change the stylesheet so that it first selects a subset of the addresses, then sorts that subset. We'll sort only the addresses from states that start with the letter M. As you'd expect, we'll do this magic with an XPath expression that limits the elements to be sorted:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="no"/>
  <xsl:strip-space elements="*"/>
  <xsl:variable name="newline">
<xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address[starts-with(state, 'M')]">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
      <xsl:if test="name/title">
        <xsl:value-of select="name/title"/>
        <xsl:text> </xsl:text>
      </xsl:if>
      <xsl:value-of select="name/first-name"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="street"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="city"/>
      <xsl:text>, </xsl:text>
      <xsl:value-of select="state"/>
      <xsl:text>  </xsl:text>
      <xsl:value-of select="zip"/>
      <xsl:value-of select="$newline"/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Here are the results: only those addresses from states beginning with the letter M, sorted by first name within last name.

Ms. Natalie Attired
707 Breitling Way
Winter Harbor, ME  00218

Harry Backstayge
283 First Avenue
Skunk Haven, MA  02718

Mary Backstayge
283 First Avenue
Skunk Haven, MA  02718

Ms. Amanda Reckonwith
930-A Chestnut Street
Lynn, MA  02930

Notice that in the <xsl:for-each> element, we used a predicate in our XPath expression so that only addresses containing <state> elements whose contents begin with M are selected. This example starts us on the path to grouping nodes. We could do lots of other things here:

  • We could generate output that prints all the unique Zip Codes, along with the number of addresses that have those Zip Codes.

  • For each unique Zip Code (or state, or last name, etc.) we could sort on a field and list all addresses with that Zip Code.

We'll discuss these topics in the next section.




Copyright © 2002 O'Reilly & Associates. All rights reserved.