Generating Links with the key() Function (XSLT)

5.2.3. Stylesheets That Use the key() Function

We've mentioned some useful things we can do with the key() function, so now we'll build some stylesheets that use it. Our first stylesheet will list all definitions written in a particular language. We'll go through the various parts of the stylesheet, explaining all the things we had to add to make everything work. The first thing we'll do, of course, is define the key() function:

<xsl:key name="language-index" match="defn" use="@language"/>

Notice that the match attribute we used was the simple element name defn. This tells the XSLT processor to match all <defn> elements at all levels of the document. Because of the structure of our document, we could have written match="/glossary/glentry/defn", as well. Although this XPath expression is more restrictive, it matches the same elements because all <defn> elements must appear inside <glentry> elements, which in turn appear inside the <glossary> element.

Next, we set up our stylesheet to determine what value of the language attribute we're searching for. We'll do this with a global <xsl:param> element:

<xsl:param name="targetLanguage"/>

Recall from our earlier discussion of the <xsl:param> element that any top-level <xsl:param> is a global parameter to the stylesheet and may be set or initialized from outside the stylesheet. The way to do this varies from one XSLT processor to another. Here's how it's done with Xalan. (The command should be on one line.)

java org.apache.xalan.xslt.Process -in moreterms.xml -xsl crossref2.xsl 
-param targetLanguage it

If you use Michael Kay's Saxon processor, the syntax looks like this:

java com.icl.saxon.StyleSheet moreterms.xml crossref2.xsl targetLanguage=it

Now that we've defined our key() function and defined a parameter to specify which language we're looking for, we need to generate our output. Here's the modified template that generates the HTML <title> and <h1> tags:

<xsl:template match="glossary">
  <html>
    <head>
      <title>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[1]/preceding-sibling::term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[last()]/preceding-sibling::term"/>
      </title>
    </head>
    <body>
      <h1>
        <xsl:text>Glossary Listing: </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[1]/ancestor::glentry/term"/>
        <xsl:text> - </xsl:text>
        <xsl:value-of select="key('language-index', 
          $targetLanguage)[last()]/ancestor::glentry/term"/>
      </h1>
      <xsl:for-each select="key('language-index', $targetLanguage)">
        <xsl:apply-templates select="ancestor::glentry"/>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

There are a couple of significant changes here. When we were using the id() function, it was easy to find the first and last terms in the document. Because we're now trying to list only the definitions that are written in a particular language, that won't work. Reading the XPath expressions in the <xsl:value-of> elements from left to right, we find the first and last <defn> elements returned by the key() function, then use the preceding-sibling axis to reference the <term> element that preceded it. We could also have written our XPath expressions using the ancestor axis:

<h1>
  <xsl:text>Glossary Listing: </xsl:text>
  <xsl:value-of select="key('language-index', 
    $targetLanguage)[1]/ancestor::glentry/term"/>
  <xsl:text> - </xsl:text>
  <xsl:value-of select="key('language-index', 
    $targetLanguage)[last()]/ancestor::glentry/term"/>
</h1>

Now that we've successfully generated the HTML <title> and <h1> elements, we need to process the actual definitions for the chosen language. To do this, we'll use the targetLanguage parameter. Here's how the rest of the template looks:

<xsl:for-each select="key('language-index', $targetLanguage)">
  <xsl:apply-templates select="ancestor::glentry"/>
</xsl:for-each>

In this code, we've selected all the values from the language-index key that match the targetLanguage parameter. For each one, we use the ancestor axis to select the <glentry> element. We've already written the templates that process these elements correctly, so we can just reuse them.

The final change we make is to select only those <defn> elements whose language attributes match the targetLanguage parameter. We do this with a simple XPath expression:

<xsl:apply-templates select="defn[@language=$targetLanguage]"/>

Here's the complete stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
<xsl:strip-space elements="*"/>

  <xsl:key name="language-index" match="defn" use="@language"/>

  <xsl:param name="targetLanguage"/>

  <xsl:template match="/">
    <xsl:apply-templates select="glossary"/>
  </xsl:template>

  <xsl:template match="glossary">
    <html>
      <head>
        <title>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[1]/preceding-sibling::term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[last()]/preceding-sibling::term"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[1]/ancestor::glentry/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="key('language-index', 
            $targetLanguage)[last()]/ancestor::glentry/term"/>
        </h1>
        <xsl:for-each select="key('language-index', $targetLanguage)">
          <xsl:apply-templates select="ancestor::glentry"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{term/@id}"/>
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </b>
      <xsl:apply-templates select="defn[@language=$targetLanguage]"/>
    </p>
  </xsl:template>

  <xsl:template match="defn">
    <xsl:apply-templates 
     select="*|comment()|processing-instruction()|text()"/>
  </xsl:template>

  <xsl:template match="xref">
    <a href="#{@refid}">
      <xsl:choose>
        <xsl:when test="id(@refid)/@xreftext">
          <xsl:value-of select="id(@refid)/@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="id(@refid)"/>
        </xsl:otherwise>

      </xsl:choose>
    </a>
  </xsl:template>

  <xsl:template match="seealso">
    <b>
      <xsl:text>See also: </xsl:text>
    </b>
    <xsl:for-each select="id(@refids)">
      <a href="#{@id}">
        <xsl:choose>
          <xsl:when test="@xreftext">
            <xsl:value-of select="@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>.  </xsl:text>
  </xsl:template>

</xsl:stylesheet>

Given our sample document and a targetLanguage of en, we get these results:

<html>
  <head>
    <title>Glossary Listing: applet - wildcard character</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <a href="#servlet">servlet</a>.
      ...

Changing the targetLanguage to it, the results are now different:

<html>
  <head>
    <title>Glossary Listing: applet - servlet</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - servlet</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      [Pretend this is an Italian definition of applet.]
    </p>
    <p>
      <b><a name="DMZlong"></a>demilitarized 
      zone (DMZ): </b>
      [Pretend this is an Italian definition of DMZ.]
    </p>
    <p>
      <b><a name="servlet"></a>servlet: </b>
      [Pretend this is an Italian definition of servlet.]
    </p>
  </body>
</html>

With this stylesheet, we have a way to create a useful subset of our glossary. Notice that we're still using our original technique of ID, IDREF, and IDREFS to process the <xref> and <seealso> elements. If you want, you could redefine the processing to use the key() function instead. Here's how you'd define a key() function to mimic our earlier use of ID and IDREF:

<xsl:template match="xref">
  <a href="#{@refid}">
    <xsl:choose>
      <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
        <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="key('term-ids', @refid)[1]"/>
      </xsl:otherwise>
    </xsl:choose>
  </a>
</xsl:template>

As an exercise for the reader, you can modify this stylesheet so that it lists only definitions that apply to a particular topic, or only terms that are acronyms.

5.2.3.1. The key() function and the IDREFS datatype

For all its flexibility, the key() function doesn't support anything like the IDREFS datatype. We can try to use the key() function the same way we used id():

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:for-each select="key('term-ids', @refids)">
    <a>
  ...

But the <xsl:for-each> doesn't have anything to work with. That's because the key value we're looking for is "wildcard-char DMZlong pattern-matching". When we were dealing with the id() function, this string was broken into three tokens because anything with a datatype of ID can't contain a space. With the key() function, we can search on anything, including the contents of an element. (See Section 5.3, "Generating Links in Unstructured Documents" for an example of this.) For this reason, our call to the key() function asking for all the <term> elements with an id attribute equal to "wildcard-char DMZlong pattern-matching" returns nothing. Any attribute with a datatype of ID can't contain spaces, so we get no results.

There are several ways to deal with this problem; we'll go through our choices next.

5.2.3.2. Solution #1: Replace the IDREFS datatype

If you consider this a problem and refuse to use the id() function, there are several approaches you can take. The most drastic (but probably the simplest to implement) is to not use the IDREFS datatype at all. You could change the <seealso> element so that it contains a list of references to other elements:

<seealso>
  <item refid="wildcard-character"/>
  <item refid="DMZlong"/>
  <item refid="pattern-matching"/>
</seealso>

This approach has the advantage that we can use the value of all the refid attributes of all <item> elements with the key() function. That means we can search on anything, not just values of attributes. The disadvantage, of course, is that we had to change the structure of our XML document to make this approach work. If you have control of the structure of your XML document, that's possible; it's entirely likely, of course, that you can't change the XML document at all. A variation on this approach would be to use a stylesheet to transform the IDREFS datatype into the previous structure.

5.2.3.3. Solution #2: Use the XPath contains() function

A second approach is to leave the structure of the XML document unchanged, then use the XPath contains() function to find all <term> elements whose id attributes are contained in the value of the refids attribute of the <seealso> element. Here's how that would work:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:variable name="id_list" select="@refids"/>
  <xsl:for-each select="//term">
    <xsl:if test="contains($id_list, @id)">
      <a href="#{@id}">
        <xsl:choose>
          <xsl:when test="@xreftext">
            <xsl:value-of select="@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:if>
  </xsl:for-each>
  <xsl:text>.  </xsl:text>
</xsl:template>

We've done a couple of things here: First, we've saved the value of the refids attribute of the <seealso> element in the variable id_list. That's because we can't access it within the <for-each> element. We can find a given <seealso> element from within a given <term> element, but it's too difficult to find that element generically from every <term> element. The simplest way to find the element is to save the value in a variable.

Second, we look at all of the <term> elements in the document. For each one, if our variable (containing the refids attribute of the <seealso> element) contains the value of the current <term> element's id attribute, then we process that <term> element.

Here are the results our stylesheet generates:

<html>
  <head>
    <title>Glossary Listing: applet - wildcard character</title>
  </head>
  <body>
    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
      <b><a name="applet"></a>applet: </b>
      An application program,
      written in the Java programming language, that can be
      retrieved from a web server and executed by a web browser.
      A reference to an applet appears in the markup for a web
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same
      way that it retrieves a graphics file.
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the
      client upon which it is executing, and the applet's
      communication across the network is limited to the server
      from which it was downloaded.
      Contrast with <a href="#servlet">servlet</a>.
      <b>See also: </b><a 
      href="#DMZlong">demilitarized zone</a>, <a href="#DMZ">
      DMZ</a>, <a href="#pattern-matching">pattern-matching 
      character</a>, <a href="#wildcard-char">wildcard 
      character</a>. 
    </p>
      ...

There are a couple of problems here. The most mundane is that in our stylesheet, we don't know how many <term> elements have id attributes contained in our variable. That means it's difficult to insert commas correctly between the matching <term>s. In the output here, we were lucky that the last match was in fact the last term, so the results here are correct. For any <seealso> element whose refid attribute doesn't contain the id attribute of the last <term> element in the document, this stylesheet won't work.

The more serious problem is that one of the matches is, in fact, wrong. If you look closely at the output, we get a match for the term DMZ, even though there isn't an exact match for its id in our variable. That's because the XPath contains() function says (correctly) that the value DMZlong contains the ids DMZlong and DMZ.

So our second attempt at solving this problem doesn't require us to change the structure of the XML document, but in this case, we have to change some of our IDs so that the problem we just mentioned doesn't occur. That's probably going to be a maintenance nightmare and a serious drawback to this approach.

5.2.3.4. Solution #3: Use recursion to process the IDREFS datatype

Here we use a recursive template to tokenize the refids attribute into individual IDs, then process each one individually. This style of programming takes a while to get used to, but it can be fairly simple. Here's the crux of our stylesheet:

<xsl:template match="seealso">
  <b>
    <xsl:text>See also: </xsl:text>
  </b>
  <xsl:call-template name="resolveIDREFS">
    <xsl:with-param name="stringToTokenize" select="@refids"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="resolveIDREFS">
  <xsl:param name="stringToTokenize"/>
  <xsl:variable name="normalizedString">
    <xsl:value-of 
      select="concat(normalize-space($stringToTokenize), ' ')"/>
  </xsl:variable>
  <xsl:choose>
    <xsl:when test="$normalizedString!=' '">
      <xsl:variable name="firstOfString" 
        select="substring-before($normalizedString, ' ')"/>
      <xsl:variable name="restOfString" 
        select="substring-after($normalizedString, ' ')"/>
      <a href="#{$firstOfString}">
        <xsl:choose>
          <xsl:when 
            test="key('term-ids', $firstOfString)[1]/@xreftext">
            <xsl:value-of 
              select="key('term-ids', $firstOfString)[1]/@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of 
              select="key('term-ids', $firstOfString)[1]"/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="$restOfString!=''">
        <xsl:text>, </xsl:text>
      </xsl:if>
      <xsl:call-template name="resolveIDREFS">
        <xsl:with-param name="stringToTokenize" 
          select="$restOfString"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>.</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

The first thing we did was invoke the named template resolveIDREFS in the template for the <seealso> element. While invoking the template, we pass in the value of the refids attribute and let recursion work its magic.

The resolveIDREFS template works like this:

Break the string into two parts: the first ID and the rest of the string. If there is no first ID (i.e., the string contains only whitespace), we're done.
Resolve the cross-reference for the first ID.
Invoke the template with the rest of the string.

One technique in particular is worth mentioning here: the way we handled whitespace in the attribute value. We pass the string we want to tokenize as a parameter to the template, but we need to normalize the whitespace. We use two XPath functions to do this: normalize-space() and concat(). The call looks like this:

<xsl:template name="resolveIDREFS">
  <xsl:param name="stringToTokenize"/>
  <xsl:variable name="normalizedString">
    <xsl:value-of 
      select="concat(normalize-space($stringToTokenize), ' ')"/>
  </xsl:variable>

The normalize-space() function removes all leading and trailing whitespace from a string and replaces internal whitespace characters with a single space. Remember that whitespace inside an attribute isn't significant; our <seealso> element could be written like this:

  <seealso refids="  wildcard-char 





DMZlong 
pattern-matching       "/>

When we pass this attribute to normalizeSpace(), the returned value is wildcard-char DMZlong pattern-matching. All whitespace at the start and end of the value has been removed and all the whitespace between characters has been replaced with a single space.

Because we're using the substring-before() and substring-after() functions to find the first token and the rest of the string, it's important that there be at least one space in the string. (It's possible, of course, that an IDREFS attribute contains only one ID.) We use the concat() function to add a space to the end of the string. When the string contains only that space, we know we're done.

Although this approach is more tedious, it does everything we need it to do. We don't have to change our XML document, and we correctly resolve all the IDs in the IDREFS datatype.

5.2.3.5. Solution #4: Use an extension function

The final approach is to write an extension function that tokenizes the refids attribute and returns a node-set containing all id values we need to search for. Xalan ships with an extension that does just that. We invoke the extension function on the value of the refids attribute, then use a <xsl:for-each> element to process all items in the node-set. We'll cover extension functions in Chapter 8, "Extending XSLT", but for now, here's what the stylesheet looks like:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:java="http://xml.apache.org/xslt/java"
  exclude-result-prefixes="java">

<xsl:output method="html" indent="yes"/>
<xsl:strip-space elements="*"/>

  <xsl:key name="term-ids" match="term" use="@id"/>

  <xsl:template match="/">
    <xsl:apply-templates select="glossary"/>
  </xsl:template>

  <xsl:template match="glossary">
    <html>
      <head>
        <title>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glentry[last()]/term"/>
        </title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glentry[last()]/term"/>
        </h1>
        <xsl:apply-templates select="glentry"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{term/@id}"/>
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

  <xsl:template match="defn">
    <xsl:apply-templates 
     select="*|comment()|processing-instruction()|text()"/>
  </xsl:template>

  <xsl:template match="xref">
    <a href="#{@refid}">
      <xsl:choose>
        <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
          <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="key('term-ids', @refid)[1]"/>
        </xsl:otherwise>
      </xsl:choose>
    </a>
  </xsl:template>

  <xsl:template match="seealso">
    <b>
      <xsl:text>See also: </xsl:text>
    </b>
    <xsl:for-each 
      select="java:org.apache.xalan.lib.Extensions.tokenize(@refids)">
      <a href="{key('term-ids', .)/@id}">
        <xsl:choose>
          <xsl:when test="key('term-ids', .)/@xreftext">
            <xsl:value-of select="key('term-ids', .)/@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="key('term-ids', .)"/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
      <xsl:if test="not(position()=last())">
        <xsl:text>, </xsl:text>
      </xsl:if>
    </xsl:for-each>
    <xsl:text>.</xsl:text>
  </xsl:template>

</xsl:stylesheet>

In this case, the tokenize function (defined in the Java class org.apache.xalan.lib.Extensions) takes a string as input, then converts the string into a node-set in which each token in the original string becomes a node.

Be aware that using extension functions limits the portability of your stylesheets. The extension function here does what we want, but we couldn't use this extension function with Saxon, XT, or the XSLT tools from Oracle or Microsoft. They may or may not supply similar functions, and if they do, you'll have to modify your stylesheet slightly to use them. If it's important to you that you be able to switch XSLT processors at some point in the future, using extensions will limit your ability to do that.

Hopefully at this point you're convinced of at least one of the following two things:

If you have an attribute with a datatype of IDREFS, you should use the id() function to resolve cross-references.

The IDREFS datatype is pretty limited, so you should avoid using it.

5.2. Generating Links with the key() Function

5.2.1. Defining a key()