Extending the Saxon Processor (XSLT)

8.2. Extending the Saxon Processor

Michael Kay's excellent Saxon processor also provides an extension mechanism. One nice feature of that mechanism is that you can implement your own sort functions. When we discussed the <xsl:sort> element a couple of chapters ago, we mentioned that it has a lang attribute that defines the language of the things being sorted. While Xalan doesn't currently support this attribute (although by the time you're reading this, it might), Saxon lets you write your own extension class to handle the sorting. Your class must extend the com.icl.saxon.sort.TextComparer class. Here's a sample XML document we'll use to illustrate this function:

<?xml version="1.0"?>
<wordlist>
  <word>campo</word>
  <word>luna</word>
  <word>ciudad</word>
  <word>llaves</word>
  <word>chihuahua</word>
  <word>arroz</word>
  <word>limonada</word>
</wordlist>

This document contains Spanish words that are sorted differently than they would be in English. (In Spanish, "ch" and "ll" are separate letters that sort after "c" and "l," respectively.) We'll write a stylesheet that uses three <xsl:template>s to illustrate how our extension function works. Here's the stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" indent="no"/>
  <xsl:strip-space elements="*"/>

  <xsl:variable name="newline">
<xsl:text>
</xsl:text>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:value-of select="$newline"/>
    <xsl:apply-templates select="wordlist" mode="unsorted"/>
    <xsl:apply-templates select="wordlist" mode="default"/>
    <xsl:apply-templates select="wordlist" mode="Spanish"/>
  </xsl:template>

  <xsl:template match="wordlist" mode="unsorted">
    <xsl:text>Word list - unsorted:</xsl:text>
    <xsl:value-of select="$newline"/>
    <xsl:for-each select="word">
      <xsl:value-of select="."/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
    <xsl:value-of select="$newline"/>
  </xsl:template>


  <xsl:template match="wordlist" mode="default">
    <xsl:text>Word list - sorted with default rules:</xsl:text>
    <xsl:value-of select="$newline"/>
    <xsl:for-each select="word">
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
    <xsl:value-of select="$newline"/>
  </xsl:template>

  <xsl:template match="wordlist" mode="Spanish">
    <xsl:text>Word list - sorted with Spanish rules:</xsl:text>
    <xsl:value-of select="$newline"/>
    <xsl:for-each select="word">
      <xsl:sort select="." lang="es"/>
      <xsl:value-of select="."/>
      <xsl:value-of select="$newline"/>
    </xsl:for-each>
    <xsl:value-of select="$newline"/>
  </xsl:template>
</xsl:stylesheet>

When we run the stylesheet against our document, it invokes the three templates with three different modes. One template simply lists the <word> elements as they appear in the original document, the second sorts the <word> elements using the default sorting sequence, and the third sorts the <word> elements using the traditional rules of Spanish sorting. Refreshingly enough, the code that implements the sorting function is simple. Here's the entire listing:

package com.icl.saxon.sort;

import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class Compare_es extends TextComparer
{
  private static String smallnTilde  = "\u00F1";
  private static String capitalNTilde = "\u00D1";
  
  private static String traditionalSpanishRules = 
    ("< a,A < b,B < c,C " +
     "< ch, cH, Ch, CH "  +
     "< d,D < e,E < f,F " +
     "< g,G < h,H < i,I < j,J < k,K < l,L " +
     "< ll, lL, Ll, LL "  +
     "< m,M < n,N " +
     "< " + smallnTilde + "," + capitalNTilde + " " +
     "< o,O < p,P < q,Q < r,R " +
     "< s,S < t,T < u,U < v,V < w,W < x,X " +
     "< y,Y < z,Z");

  private static RuleBasedCollator rbc = null;

  static 
  {
    try
    {
      rbc = new RuleBasedCollator(traditionalSpanishRules);
    }
    catch (ParseException pe)
    {
      // Report the parse failure itself; rbc is still null at this point.
      System.err.println("Error creating RuleBasedCollator: " + pe);
    }
  }

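  // Called by Saxon for each pair of sort keys.
  public int compare(Object a, Object b)
  {
    if (rbc != null)
      return rbc.compare((String)a, (String)b);
    else
      // No collator was built, so treat every pair as equal.
      return 0;
  }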
}

(See the documentation for the java.text.RuleBasedCollator class for an explanation of the traditionalSpanishRules string.)
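As a quick orientation to that rule syntax, here is a minimal sketch that uses a cut-down rule string in the same style. The class name RuleSyntaxDemo and the shortened alphabet are assumptions made for illustration; they are not part of Saxon or of the listing above.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class RuleSyntaxDemo {
    // A cut-down rule string in the same style as traditionalSpanishRules.
    // '<' starts the next position in the alphabet (a primary difference);
    // ',' lists a variant that differs only at the tertiary level, here case.
    // Listing "ch" as its own entry makes the collator treat it as one unit.
    static final String RULES =
        "< a,A < b,B < c,C < ch,cH,Ch,CH < d,D < i,I < o,O";

    static RuleBasedCollator newCollator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException pe) {
            throw new IllegalStateException("Invalid collation rules", pe);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator rbc = newCollator();
        // "c" comes before "ch", so "ci" sorts before "cho".
        System.out.println(rbc.compare("ci", "cho") < 0);
        // "ch" comes before "d", so "cho" sorts before "da".
        System.out.println(rbc.compare("cho", "da") < 0);
        // Lowercase is listed first, so "a" sorts before "A".
        System.out.println(rbc.compare("a", "A") < 0);
    }
}
```

Each of the three comparisons should print true. The same reading applies to the full rule string: every "<" introduces the next letter of the alphabet, and "ch" and "ll" each get a position of their own.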

When Saxon sees an <xsl:sort> element with a lang attribute of es, it attempts to load a Java class named com.icl.saxon.sort.Compare_es. If that class can be loaded, Saxon calls that class's compare method as it sorts the <word> elements. When we run the stylesheet against our earlier example document, here are the results:

Word list - unsorted:
campo
luna
ciudad
llaves
chihuahua
arroz
limonada

Word list - sorted with default rules:
arroz
campo
chihuahua
ciudad
limonada
llaves
luna

Word list - sorted with Spanish rules:
arroz
campo
ciudad
chihuahua
limonada
luna
llaves

In the output, our Spanish sorting routine puts chihuahua after ciudad, and llaves after luna. With fewer than 50 lines of Java code, we've been able to add a new sorting function to our stylesheet. Most of the work is done for us by the Saxon processor and the methods of the java.text.RuleBasedCollator class.
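If you want to experiment with the collation rules without running Saxon at all, a standalone sketch along the following lines may help. It sorts the same seven words twice: once with plain String comparison (used here as a stand-in for the default sort, which is an assumption) and once with a RuleBasedCollator built from the traditional Spanish rules. The class name SpanishSortDemo and its two helper methods are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SpanishSortDemo {
    // Same rules as the Compare_es listing, with the n-tilde escapes inlined.
    static final String TRADITIONAL_SPANISH_RULES =
        "< a,A < b,B < c,C < ch,cH,Ch,CH < d,D < e,E < f,F " +
        "< g,G < h,H < i,I < j,J < k,K < l,L < ll,lL,Ll,LL " +
        "< m,M < n,N < \u00F1,\u00D1 < o,O < p,P < q,Q < r,R " +
        "< s,S < t,T < u,U < v,V < w,W < x,X < y,Y < z,Z";

    // Sorts a copy of the list by plain String comparison.
    static List<String> sortDefault(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Sorts a copy of the list with the traditional Spanish rules.
    static List<String> sortSpanish(List<String> words) {
        try {
            RuleBasedCollator rbc =
                new RuleBasedCollator(TRADITIONAL_SPANISH_RULES);
            List<String> copy = new ArrayList<>(words);
            copy.sort(rbc); // a Collator is also a Comparator
            return copy;
        } catch (ParseException pe) {
            throw new IllegalStateException("Invalid collation rules", pe);
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("campo", "luna", "ciudad",
            "llaves", "chihuahua", "arroz", "limonada");
        System.out.println("Default: " + sortDefault(words));
        System.out.println("Spanish: " + sortSpanish(words));
    }
}
```

The two lists should match the second and third word lists shown above, with chihuahua and llaves changing position between them.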

The Saxon documentation has more information on extending Saxon with your own code. As you'll see in the examples in this chapter, most of the Java extensions you'll need to write are small pieces of code that make Java library methods and classes available to the XSLT processor.
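The naming convention described earlier, in which lang="es" leads Saxon to look for com.icl.saxon.sort.Compare_es, can be pictured with ordinary Java reflection. The following is only a sketch of the idea and is not Saxon's actual source; the class ComparerLookupSketch and its methods are hypothetical names chosen for illustration.

```java
import java.util.Optional;

public class ComparerLookupSketch {
    // Builds the class name from the lang value, using the package and
    // "Compare_" prefix mentioned in the text.
    static String classNameFor(String lang) {
        return "com.icl.saxon.sort.Compare_" + lang;
    }

    // Tries to load the named class and create an instance with its
    // no-argument constructor. Returns an empty Optional on any failure,
    // so the caller can fall back to a default comparison.
    static Optional<Object> tryLoad(String className) {
        try {
            Class<?> c = Class.forName(className);
            return Optional.of(c.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String name = classNameFor("es");
        // Reports whether a comparer class with that name is on the classpath.
        System.out.println(name + " loaded: " + tryLoad(name).isPresent());
    }
}
```

One consequence of a lookup like this is that the compiled Compare_es class has to be on the classpath under exactly that package and name, or the sort quietly falls back to the default rules.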