Chapter 6. Sorting and Grouping Elements

By now, I hope you're convinced that you can use XSLT to convert big piles of XML data into other useful things. Our examples to this point have pretty much gone through the XML source in what's referred to as document order. We'd like to go through our XML documents in a couple of other common ways, though, namely sorting the elements and grouping them.
We'll give several examples of these operations in this chapter.

6.1. Sorting Data with <xsl:sort>

The simplest way to rearrange our XML elements is to use the <xsl:sort> element. This element temporarily rearranges a collection of elements based on criteria we define in our stylesheet.

6.1.1. Our First Example

For our first example, we'll have a set of U.S. postal addresses that we want to sort. (No chauvinism is intended here; obviously every country has different conventions for mailing addresses. We just needed a short sample document that can be sorted in many useful ways.) Here's our original document:

    <?xml version="1.0"?>
    <addressbook>
      <address>
        <name>
          <title>Mr.</title>
          <first-name>Chester Hasbrouck</first-name>
          <last-name>Frisby</last-name>
        </name>
        <street>1234 Main Street</street>
        <city>Sheboygan</city>
        <state>WI</state>
        <zip>48392</zip>
      </address>
      <address>
        <name>
          <first-name>Mary</first-name>
          <last-name>Backstayge</last-name>
        </name>
        <street>283 First Avenue</street>
        <city>Skunk Haven</city>
        <state>MA</state>
        <zip>02718</zip>
      </address>
      <address>
        <name>
          <title>Ms.</title>
          <first-name>Natalie</first-name>
          <last-name>Attired</last-name>
        </name>
        <street>707 Breitling Way</street>
        <city>Winter Harbor</city>
        <state>ME</state>
        <zip>00218</zip>
      </address>
      <address>
        <name>
          <first-name>Harry</first-name>
          <last-name>Backstayge</last-name>
        </name>
        <street>283 First Avenue</street>
        <city>Skunk Haven</city>
        <state>MA</state>
        <zip>02718</zip>
      </address>
      <address>
        <name>
          <first-name>Mary</first-name>
          <last-name>McGoon</last-name>
        </name>
        <street>103 Bryant Street</street>
        <city>Boylston</city>
        <state>VA</state>
        <zip>27318</zip>
      </address>
      <address>
        <name>
          <title>Ms.</title>
          <first-name>Amanda</first-name>
          <last-name>Reckonwith</last-name>
        </name>
        <street>930-A Chestnut Street</street>
        <city>Lynn</city>
        <state>MA</state>
        <zip>02930</zip>
      </address>
    </addressbook>

We'd like to generate a list of these addresses, sorted by <last-name>.
We'll use the magical <xsl:sort> element to do the work. Our stylesheet looks like this:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="text" indent="no"/>

      <xsl:strip-space elements="*"/>

      <xsl:variable name="newline">
    <xsl:text>
    </xsl:text>
      </xsl:variable>

      <xsl:template match="/">
        <xsl:for-each select="addressbook/address">
          <xsl:sort select="name/last-name"/>
          <xsl:value-of select="name/title"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="name/first-name"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="name/last-name"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="street"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="city"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="state"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="zip"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="$newline"/>
        </xsl:for-each>
      </xsl:template>

    </xsl:stylesheet>

The heart of our stylesheet is the pair of <xsl:for-each> and <xsl:sort> elements. The <xsl:for-each> element selects the items with which we'll work, and the <xsl:sort> element rearranges them before we write them out. Notice that we're generating a text file (<xsl:output method="text"/>). (You could generate an HTML file or something more complicated if you want.) To invoke the stylesheet engine, we run this command:

    java org.apache.xalan.xslt.Process -in names.xml -xsl namesorter1.xsl -out names.text

Here are the results we get from our first attempt at sorting:

    Ms. Natalie Attired
    707 Breitling Way
    Winter Harbor, ME 00218

     Mary Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

     Harry Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

    Mr. Chester Hasbrouck Frisby
    1234 Main Street
    Sheboygan, WI 48392

     Mary McGoon
    103 Bryant Street
    Boylston, VA 27318

    Ms. Amanda Reckonwith
    930-A Chestnut Street
    Lynn, MA 02930

As you can see from the output, the addresses in our original document were sorted by last name.
All we had to do was add <xsl:sort> to our stylesheet, and all the elements were magically reordered. If you aren't convinced that XSLT can increase your programmer productivity, try writing the Java code and DOM method calls to do the same thing.

We can do a couple of things to improve our original stylesheet, however. For one thing, there's an annoying blank space at the start of every name that doesn't have a <title> element. A more significant improvement is that we'd like to sort addresses by <first-name> within <last-name>. In our last example, Mary Backstayge should appear after Harry Backstayge. Here's how we can modify our stylesheet to use more than one sort key:

    <xsl:template match="/">
      <xsl:for-each select="addressbook/address">
        <xsl:sort select="name/last-name"/>
        <xsl:sort select="name/first-name"/>
        ...

We've simply added a second <xsl:sort> element to our stylesheet. This element does what we want; it sorts the <address> elements by <first-name> within <last-name>. To be thoroughly obsessive about our output, we can use an <xsl:if> element to get rid of that annoying blank space in front of names with no <title> element:

    <xsl:if test="name/title">
      <xsl:value-of select="name/title"/>
      <xsl:text> </xsl:text>
    </xsl:if>

Now our output is perfect:

    Ms. Natalie Attired
    707 Breitling Way
    Winter Harbor, ME 00218

    Harry Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

    Mary Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

    Mr. Chester Hasbrouck Frisby
    1234 Main Street
    Sheboygan, WI 48392

    Mary McGoon
    103 Bryant Street
    Boylston, VA 27318

    Ms. Amanda Reckonwith
    930-A Chestnut Street
    Lynn, MA 02930

6.1.2. The Details on the <xsl:sort> Element

Now that we've seen a couple of examples of how <xsl:sort> works, we'll go over its syntax, its attributes, and where you can use it.

6.1.2.1. What's the deal with that syntax?

I'm so glad you asked that question.
One thing the XSLT working group could have done is something like this:

    <xsl:for-each select="addressbook/address" sort-key-1="name/last-name"
      sort-key-2="name/first-name"/>

The problem with this approach is that no matter how many sort-key-x attributes you define, out of sheer perverseness, someone will cry out that they really need the sort-key-8293 attribute. To avoid this messy problem, the XSLT designers decided to let you specify the sort keys by using a number of <xsl:sort> elements. The first is the primary sort key, the second is the secondary sort key, the 8293rd one is the eight-thousand-two-hundred-and-ninety-third sort key, etc.

Well, that's why the syntax looks the way it does, but how does it actually work? When I first saw this syntax:

    <xsl:for-each select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
      <xsl:apply-templates select="."/>
    </xsl:for-each>

I thought it meant that all the nodes were sorted during each iteration through the <xsl:for-each> element. That seemed incredibly inefficient; if you've sorted all the nodes, why re-sort them each time through the <xsl:for-each> element? Actually, the XSLT processor handles all <xsl:sort> elements before it does anything else, then it processes the <xsl:for-each> element as if the <xsl:sort> elements weren't there. It's less efficient, but if it makes you feel better about the syntax, you could write the stylesheet like this:

    <xsl:template match="/">
      <xsl:for-each select="addressbook/address">
        <xsl:sort select="name/last-name"/>
        <xsl:sort select="name/first-name"/>
        <xsl:for-each select="."> <!-- This is slower, but it works -->
          <xsl:apply-templates select="."/>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:template>

(Don't actually do this. I'm only trying to make a point.) This stylesheet generates the same results as our earlier stylesheet.

6.1.2.2. Attributes

The <xsl:sort> element has several attributes, all of which are discussed here.
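As a rough illustration of how those attributes combine: XSLT 1.0 defines select, order, data-type, case-order, and lang on <xsl:sort>. The sketch below assumes the address book document from earlier in this chapter. Sorting on <zip> is our own choice for the example, not something the chapter's stylesheets do; it simply gives us a key where comparing as numbers and reversing the order are both visible.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="addressbook/address">
      <!-- Primary key (illustrative): compare zip values as numbers,
           largest first -->
      <xsl:sort select="zip" data-type="number" order="descending"/>
      <!-- Secondary key: last name compared as text, ascending
           (these are the default values, spelled out for clarity) -->
      <xsl:sort select="name/last-name" data-type="text" order="ascending"/>
      <xsl:value-of select="zip"/>
      <xsl:text> </xsl:text>
      <xsl:value-of select="name/last-name"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample document, this should list Frisby (48392) first and Attired (00218) last. Note that data-type="number" affects only the comparison; <xsl:value-of> still writes the zip exactly as it appears in the source, leading zeros included.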
6.1.2.3. Where can you use <xsl:sort>?

The <xsl:sort> element can appear inside two elements: <xsl:apply-templates> and <xsl:for-each>.
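The examples so far place <xsl:sort> inside <xsl:for-each>. Here is a minimal sketch of the other placement, inside <xsl:apply-templates>, again assuming the address book document from earlier in the chapter. The one-line-per-name output format is our simplification for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The sort keys are children of xsl:apply-templates; they control
         the order in which the selected nodes are handed to templates -->
    <xsl:apply-templates select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:sort select="name/first-name"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Called once per address, in sorted order -->
  <xsl:template match="address">
    <xsl:value-of select="name/first-name"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last-name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This form keeps the formatting of each address in its own template, which can be easier to maintain than a long <xsl:for-each> body.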
If you use an <xsl:sort> element inside <xsl:for-each>, the <xsl:sort> element(s) must appear first. If you tried something like this, you'd get an exception from the XSLT processor:

    <xsl:for-each select="addressbook/address">
      <xsl:sort select="name/last-name"/>
      <xsl:value-of select="name/title"/>
      <xsl:sort select="name/first-name"/> <!-- NOT LEGAL! -->
      ...

6.1.3. Another Example

We've pretty much covered the <xsl:sort> element at this point. To add another wrinkle to our example, we'll change the stylesheet so that it selects a subset of the addresses, then sorts that subset. We'll sort only the addresses from states that start with the letter M. As you'd expect, we'll do this magic with an XPath expression that limits the elements to be sorted:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="text" indent="no"/>

      <xsl:strip-space elements="*"/>

      <xsl:variable name="newline">
    <xsl:text>
    </xsl:text>
      </xsl:variable>

      <xsl:template match="/">
        <xsl:for-each select="addressbook/address[starts-with(state, 'M')]">
          <xsl:sort select="name/last-name"/>
          <xsl:sort select="name/first-name"/>
          <xsl:if test="name/title">
            <xsl:value-of select="name/title"/>
            <xsl:text> </xsl:text>
          </xsl:if>
          <xsl:value-of select="name/first-name"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="name/last-name"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="street"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="city"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="state"/>
          <xsl:text> </xsl:text>
          <xsl:value-of select="zip"/>
          <xsl:value-of select="$newline"/>
          <xsl:value-of select="$newline"/>
        </xsl:for-each>
      </xsl:template>

    </xsl:stylesheet>

Here are the results, only those addresses from states beginning with the letter M, sorted by first name within last name:

    Ms. Natalie Attired
    707 Breitling Way
    Winter Harbor, ME 00218

    Harry Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

    Mary Backstayge
    283 First Avenue
    Skunk Haven, MA 02718

    Ms. Amanda Reckonwith
    930-A Chestnut Street
    Lynn, MA 02930

Notice that in the <xsl:for-each> element, we used a predicate in our XPath expression so that only addresses containing <state> elements whose contents begin with M are selected. This example starts us on the path to grouping nodes. We could do lots of other things here:
Copyright © 2002 O'Reilly & Associates. All rights reserved.