<?xml version="1.0"?>
<wordlist>
<word>campo</word>
<word>luna</word>
<word>ciudad</word>
<word>llaves</word>
<word>chihuahua</word>
<word>arroz</word>
<word>limonada</word>
</wordlist>
This document contains Spanish words that sort differently than they would in English. (In traditional Spanish sorting, "ch" and "ll" are treated as separate letters that sort after "c" and "l," respectively.) We'll write a stylesheet that uses three <xsl:template>s to illustrate how our sorting extension works. Here's the stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" indent="no"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="newline">
<xsl:text>
</xsl:text>
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$newline"/>
<xsl:apply-templates select="wordlist" mode="unsorted"/>
<xsl:apply-templates select="wordlist" mode="default"/>
<xsl:apply-templates select="wordlist" mode="Spanish"/>
</xsl:template>
<xsl:template match="wordlist" mode="unsorted">
<xsl:text>Word list - unsorted:</xsl:text>
<xsl:value-of select="$newline"/>
<xsl:for-each select="word">
<xsl:value-of select="."/>
<xsl:value-of select="$newline"/>
</xsl:for-each>
<xsl:value-of select="$newline"/>
</xsl:template>
<xsl:template match="wordlist" mode="default">
<xsl:text>Word list - sorted with default rules:</xsl:text>
<xsl:value-of select="$newline"/>
<xsl:for-each select="word">
<xsl:sort select="."/>
<xsl:value-of select="."/>
<xsl:value-of select="$newline"/>
</xsl:for-each>
<xsl:value-of select="$newline"/>
</xsl:template>
<xsl:template match="wordlist" mode="Spanish">
<xsl:text>Word list - sorted with Spanish rules:</xsl:text>
<xsl:value-of select="$newline"/>
<xsl:for-each select="word">
<xsl:sort select="." lang="es"/>
<xsl:value-of select="."/>
<xsl:value-of select="$newline"/>
</xsl:for-each>
<xsl:value-of select="$newline"/>
</xsl:template>
</xsl:stylesheet>
When we run the stylesheet against our document, the root template applies templates to the <wordlist> element in three different modes. The first template simply lists the <word> elements as they appear in the original document, the second sorts the <word> elements using the default sorting sequence, and the third sorts them using the traditional rules of Spanish sorting. Refreshingly enough, the code that implements the Spanish sorting is simple. Here's the entire listing:
package com.icl.saxon.sort;
import java.text.ParseException;
import java.text.RuleBasedCollator;
// Saxon loads this class for <xsl:sort lang="es">.
public class Compare_es extends TextComparer
{
    // Lowercase and uppercase n-tilde (U+00F1 and U+00D1).
    private static final String smallnTilde = "\u00F1";
    private static final String capitalNTilde = "\u00D1";
    // Traditional Spanish alphabet: ch, ll, and n-tilde are separate letters.
    private static final String traditionalSpanishRules =
        "< a,A < b,B < c,C " +
        "< ch, cH, Ch, CH " +
        "< d,D < e,E < f,F " +
        "< g,G < h,H < i,I < j,J < k,K < l,L " +
        "< ll, lL, Ll, LL " +
        "< m,M < n,N " +
        "< " + smallnTilde + "," + capitalNTilde + " " +
        "< o,O < p,P < q,Q < r,R " +
        "< s,S < t,T < u,U < v,V < w,W < x,X " +
        "< y,Y < z,Z";
    // Built once, when the class is loaded; stays null if the rules fail to parse.
    private static RuleBasedCollator rbc = null;
    static
    {
        try
        {
            rbc = new RuleBasedCollator(traditionalSpanishRules);
        }
        catch (ParseException pe)
        {
            // Report the parse error itself (rbc is still null at this point).
            System.err.println("Error creating RuleBasedCollator: " + pe.getMessage());
        }
    }
    // Saxon calls this method to compare two sort keys.
    public int compare(Object a, Object b)
    {
        if (rbc != null)
            return rbc.compare((String)a, (String)b);
        else
            return 0; // no collator: treat all keys as equal
    }
}
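(See the documentation for the java.text.RuleBasedCollator class for a full explanation of the traditionalSpanishRules string. In brief, "<" separates entries that differ as letters, "," separates entries that differ only in case, and multicharacter entries such as "ch" and "ll" are contractions that sort as single letters.)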
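The listing spells out the whole alphabet. The RuleBasedCollator documentation also shows a shorter alternative: append reset rules to the rules of an existing collator. The standalone sketch below uses that alternative, not the listing's approach, to get the traditional ordering without Saxon. The class name TailoredSpanishSort and the choice of the JDK's US collator as the base are assumptions made for illustration.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TailoredSpanishSort {

    // "&" resets the insertion point to an existing entry (here the last
    // case variant of c, then of l). "<" inserts the contraction as a new
    // letter right after it, and "," adds its case variants.
    static final String TRADITIONAL_ADDITIONS =
            "& C < ch, cH, Ch, CH & L < ll, lL, Ll, LL";

    static Collator traditionalSpanish() throws ParseException {
        // Assumption: the JDK hands back a RuleBasedCollator here, as in its
        // own documentation example; fail loudly if it does not.
        Collator base = Collator.getInstance(Locale.US);
        if (!(base instanceof RuleBasedCollator)) {
            throw new IllegalStateException("expected a RuleBasedCollator");
        }
        String rules = ((RuleBasedCollator) base).getRules() + TRADITIONAL_ADDITIONS;
        return new RuleBasedCollator(rules);
    }

    public static void main(String[] args) throws ParseException {
        List<String> words = new ArrayList<>(List.of(
                "campo", "luna", "ciudad", "llaves", "chihuahua", "arroz", "limonada"));
        // Collator implements Comparator<Object>, so it can sort a List<String>.
        words.sort(traditionalSpanish());
        System.out.println(words);
    }
}
```

Run on the seven words from the example document, this prints [arroz, campo, ciudad, chihuahua, limonada, luna, llaves], which matches the Spanish-rules ordering Saxon produces with Compare_es. Because the base rules already cover accented letters, tailoring avoids the need to list them by hand.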
When Saxon sees an <xsl:sort> element with a lang attribute of es, it attempts to load a Java class named com.icl.saxon.sort.Compare_es. If that class can be loaded, Saxon calls that class's compare method as it sorts the <word> elements. When we run the stylesheet against our earlier example document, here are the results:
Word list - unsorted:
campo
luna
ciudad
llaves
chihuahua
arroz
limonada

Word list - sorted with default rules:
arroz
campo
chihuahua
ciudad
limonada
llaves
luna

Word list - sorted with Spanish rules:
arroz
campo
ciudad
chihuahua
limonada
luna
llaves
In the output, our Spanish sorting routine puts chihuahua after ciudad and llaves after luna. With one short Java class, we've added a new sorting function to our stylesheet; most of the work is done for us by the Saxon processor and the java.text.RuleBasedCollator class.
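Saxon's part of the work for lang="es" is to build a class name from the lang code and load that class. The sketch below imitates that idea using plain JDK reflection. It is not Saxon's code. The names LangComparerLookup, its nested Compare_es, and comparerFor, plus the fallback to natural String order, are all hypothetical. The nested Compare_es gets the traditional ordering by tailoring the JDK's US collator rules rather than spelling out the whole alphabet.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class LangComparerLookup {

    // Hypothetical language-specific comparer, found by the
    // "Compare_" + lang naming convention.
    public static class Compare_es implements Comparator<String> {
        private final Collator collator;

        public Compare_es() throws ParseException {
            Collator base = Collator.getInstance(Locale.US);
            if (!(base instanceof RuleBasedCollator)) {
                throw new IllegalStateException("expected a RuleBasedCollator");
            }
            // Make ch and ll single letters sorting after c and l.
            collator = new RuleBasedCollator(((RuleBasedCollator) base).getRules()
                    + "& C < ch, cH, Ch, CH & L < ll, lL, Ll, LL");
        }

        @Override
        public int compare(String a, String b) {
            return collator.compare(a, b);
        }
    }

    // Build a class name from the lang code and try to instantiate it. If no
    // such class exists or it cannot be constructed, fall back to natural
    // String order (comparison by UTF-16 code unit).
    static Comparator<String> comparerFor(String lang) {
        String className = LangComparerLookup.class.getName() + "$Compare_" + lang;
        try {
            Object candidate = Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            if (candidate instanceof Comparator) {
                @SuppressWarnings("unchecked")
                Comparator<String> comparer = (Comparator<String>) candidate;
                return comparer;
            }
        } catch (ReflectiveOperationException e) {
            // No usable class for this lang code; use the fallback below.
        }
        return Comparator.naturalOrder();
    }

    public static void main(String[] args) {
        List<String> words = List.of(
                "campo", "luna", "ciudad", "llaves", "chihuahua", "arroz", "limonada");
        for (String lang : new String[] {"es", "fr"}) {
            List<String> sorted = new ArrayList<>(words);
            sorted.sort(comparerFor(lang));
            System.out.println(lang + ": " + sorted);
        }
    }
}
```

With these definitions, comparerFor("es") loads the nested Compare_es and yields the traditional Spanish ordering. No Compare_fr class exists, so comparerFor("fr") falls back to plain code-unit order. For these seven words, that fallback order happens to match the default-rules output.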