5.2. Generating Links with the key() Function

Now that we've covered the id() function in great detail, we'll move on to XSLT's key() function. Each key you define effectively creates an index of the document. You can then use that index to find all elements that have a particular property. Conceptually, key() works like a database index. If you have a database of (U.S. postal) addresses, you might want to index that database by people's last names, by the states in which they live, by their Zip Codes, etc. Each index takes a certain amount of time to build, but it saves processing time later. If you want to find all the people who live in the state of Idaho, you can use the index to find those people directly; you don't have to search the entire database. We'll discuss the details of how the key() function works, then we'll compare it to the id() function.

5.2.1. Defining a key()

You define a key with the <xsl:key> element:

    <xsl:key name="language-index" match="defn" use="@language"/>

The key is defined by three attributes: name, match, and use.
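As a rough, commented sketch of what each attribute of <xsl:key> does and how the resulting index is queried, consider the small stylesheet below. It assumes the glossary structure used in this chapter (a <term> followed by <defn> siblings inside each <glentry>); the text output and the hard-coded lookup value 'it' are illustrative choices, not part of the chapter's stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- name:  the identifier passed as the first argument of key()
       match: a pattern selecting the nodes to index (every <defn>)
       use:   an expression evaluated for each matched node; its value
              is the key under which that node is filed -->
  <xsl:key name="language-index" match="defn" use="@language"/>

  <xsl:template match="/">
    <!-- Retrieve every <defn> filed under "it" and print the term it defines. -->
    <xsl:for-each select="key('language-index', 'it')">
      <xsl:value-of select="preceding-sibling::term"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The second argument of key() is compared against the values produced by the use expression, so the same key can be queried with a literal, a parameter, or an attribute value.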
5.2.2. A Slightly More Complicated XML Document in Need of Links

To illustrate the full power of the key() function, we'll modify our original glossary slightly. Here's an excerpt:

    <glentry>
      <term id="DMZlong" xreftext="demilitarized zone">demilitarized zone (DMZ)</term>
      <defn topic="security" language="en">
        In network security, a network that is isolated from, and serves as a
        neutral zone between, a trusted network (for example, a private intranet)
        and an untrusted network (for example, the Internet). One or more secure
        gateways usually control access to the DMZ from the trusted or the
        untrusted network.
      </defn>
      <defn topic="security" language="it">
        [Pretend this is an Italian definition of DMZ.]
      </defn>
      <defn topic="security" language="es">
        [Pretend this is a Spanish definition of DMZ.]
      </defn>
      <defn topic="security" language="jp">
        [Pretend this is a Japanese definition of DMZ.]
      </defn>
      <defn topic="security" language="de">
        [Pretend this is a German definition of DMZ.]
      </defn>
    </glentry>
    <glentry>
      <term id="DMZ" acronym="yes">DMZ</term>
      <defn topic="security" language="en">
        See <xref refid="DMZlong"/>.
      </defn>
    </glentry>

In our modified document, we've added two new attributes to <defn>: topic and language. We also added the acronym attribute to the <term> element.
We've modified our DTD to add these attributes and enumerate their valid values:

    <!--The word being defined-->
    <!ELEMENT term (#PCDATA) >
    <!--The id is used for cross-referencing, the xreftext is the text used
        by cross-references, and the acronym is yes or no (default is no).-->
    <!ATTLIST term
              id       ID       #REQUIRED
              xreftext CDATA    #IMPLIED
              acronym  (yes|no) "no">

    <!--The definition of the term-->
    <!ELEMENT defn (#PCDATA | xref | seealso)* >
    <!--The topic defines the subject of the definition, and the language
        code defines the language of this definition.-->
    <!ATTLIST defn
              topic    (Java|general|security) "general"
              language (en|de|es|it|jp)        "en">

The topic attribute defines the computing topic to which this definition applies, and the language attribute defines the language in which this definition is written. The acronym attribute defines whether or not this term is an acronym. Now that we've created a more flexible XML document, we can use the key() function to do several useful things, such as finding all definitions written in a particular language, all definitions that apply to a particular topic, and all terms that are acronyms.
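For instance, keys over the new topic and acronym attributes might be sketched as follows. The key names topic-index and acronym-index and the text output are hypothetical, chosen only for illustration. One caveat: a <term> that omits the acronym attribute is filed under "no" only if the XML parser reads the DTD and supplies the default value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical key names; one index per attribute of interest. -->
  <xsl:key name="topic-index" match="defn" use="@topic"/>
  <xsl:key name="acronym-index" match="term" use="@acronym"/>

  <xsl:template match="/">
    <!-- How many definitions apply to the security topic? -->
    <xsl:text>Security definitions: </xsl:text>
    <xsl:value-of select="count(key('topic-index', 'security'))"/>
    <xsl:text>&#10;Acronyms:&#10;</xsl:text>
    <!-- List every term marked as an acronym. -->
    <xsl:for-each select="key('acronym-index', 'yes')">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Several keys can coexist in one stylesheet, and many nodes can share the same key value, which is exactly what an attribute of type ID does not allow.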
Thinking back to our earlier discussion, these are all things we can't do with the id() function. If the language, topic, and acronym attributes were defined to be of type ID, only one definition could be written in English, only one definition could apply to the security topic, and only one term could be an acronym. Clearly, that's an unacceptable limitation on our document.

5.2.3. Stylesheets That Use the key() Function

We've mentioned some useful things we can do with the key() function, so now we'll build some stylesheets that use it. Our first stylesheet will list all definitions written in a particular language. We'll go through the various parts of the stylesheet, explaining all the things we had to add to make everything work. The first thing we'll do, of course, is define the key:

    <xsl:key name="language-index" match="defn" use="@language"/>

Notice that the match attribute we used was the simple element name defn. This tells the XSLT processor to match all <defn> elements at all levels of the document. Because of the structure of our document, we could have written match="/glossary/glentry/defn", as well. Although this XPath expression is more restrictive, it matches the same elements because all <defn> elements must appear inside <glentry> elements, which in turn appear inside the <glossary> element.

Next, we set up our stylesheet to determine what value of the language attribute we're searching for. We'll do this with a global <xsl:param> element:

    <xsl:param name="targetLanguage"/>

Recall from our earlier discussion of the <xsl:param> element that any top-level <xsl:param> is a global parameter to the stylesheet and may be set or initialized from outside the stylesheet. The way to do this varies from one XSLT processor to another. Here's how it's done with Xalan. (The command should be on one line.)
java org.apache.xalan.xslt.Process -in moreterms.xml -xsl crossref2.xsl -param targetLanguage it If you use Michael Kay's Saxon processor, the syntax looks like this: java com.icl.saxon.StyleSheet moreterms.xml crossref2.xsl targetLanguage=it Now that we've defined our key() function and defined a parameter to specify which language we're looking for, we need to generate our output. Here's the modified template that generates the HTML <title> and <h1> tags: <xsl:template match="glossary"> <html> <head> <title> <xsl:text>Glossary Listing: </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[1]/preceding-sibling::term"/> <xsl:text> - </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[last()]/preceding-sibling::term"/> </title> </head> <body> <h1> <xsl:text>Glossary Listing: </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[1]/ancestor::glentry/term"/> <xsl:text> - </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[last()]/ancestor::glentry/term"/> </h1> <xsl:for-each select="key('language-index', $targetLanguage)"> <xsl:apply-templates select="ancestor::glentry"/> </xsl:for-each> </body> </html> </xsl:template> There are a couple of significant changes here. When we were using the id() function, it was easy to find the first and last terms in the document. Because we're now trying to list only the definitions that are written in a particular language, that won't work. Reading the XPath expressions in the <xsl:value-of> elements from left to right, we find the first and last <defn> elements returned by the key() function, then use the preceding-sibling axis to reference the <term> element that preceded it. 
We could also have written our XPath expressions using the ancestor axis: <h1> <xsl:text>Glossary Listing: </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[1]/ancestor::glentry/term"/> <xsl:text> - </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[last()]/ancestor::glentry/term"/> </h1> Now that we've successfully generated the HTML <title> and <h1> elements, we need to process the actual definitions for the chosen language. To do this, we'll use the targetLanguage parameter. Here's how the rest of the template looks: <xsl:for-each select="key('language-index', $targetLanguage)"> <xsl:apply-templates select="ancestor::glentry"/> </xsl:for-each> In this code, we've selected all the values from the language-index key that match the targetLanguage parameter. For each one, we use the ancestor axis to select the <glentry> element. We've already written the templates that process these elements correctly, so we can just reuse them. The final change we make is to select only those <defn> elements whose language attributes match the targetLanguage parameter. 
We do this with a simple XPath expression: <xsl:apply-templates select="defn[@language=$targetLanguage]"/> Here's the complete stylesheet: <?xml version="1.0"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html" indent="yes"/> <xsl:strip-space elements="*"/> <xsl:key name="language-index" match="defn" use="@language"/> <xsl:param name="targetLanguage"/> <xsl:template match="/"> <xsl:apply-templates select="glossary"/> </xsl:template> <xsl:template match="glossary"> <html> <head> <title> <xsl:text>Glossary Listing: </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[1]/preceding-sibling::term"/> <xsl:text> - </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[last()]/preceding-sibling::term"/> </title> </head> <body> <h1> <xsl:text>Glossary Listing: </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[1]/ancestor::glentry/term"/> <xsl:text> - </xsl:text> <xsl:value-of select="key('language-index', $targetLanguage)[last()]/ancestor::glentry/term"/> </h1> <xsl:for-each select="key('language-index', $targetLanguage)"> <xsl:apply-templates select="ancestor::glentry"/> </xsl:for-each> </body> </html> </xsl:template> <xsl:template match="glentry"> <p> <b> <a name="{term/@id}"/> <xsl:value-of select="term"/> <xsl:text>: </xsl:text> </b> <xsl:apply-templates select="defn[@language=$targetLanguage]"/> </p> </xsl:template> <xsl:template match="defn"> <xsl:apply-templates select="*|comment()|processing-instruction()|text()"/> </xsl:template> <xsl:template match="xref"> <a href="#{@refid}"> <xsl:choose> <xsl:when test="id(@refid)/@xreftext"> <xsl:value-of select="id(@refid)/@xreftext"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="id(@refid)"/> </xsl:otherwise> </xsl:choose> </a> </xsl:template> <xsl:template match="seealso"> <b> <xsl:text>See also: </xsl:text> </b> <xsl:for-each select="id(@refids)"> <a href="#{@id}"> <xsl:choose> <xsl:when 
test="@xreftext"> <xsl:value-of select="@xreftext"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="."/> </xsl:otherwise> </xsl:choose> </a> <xsl:if test="not(position()=last())"> <xsl:text>, </xsl:text> </xsl:if> </xsl:for-each> <xsl:text>. </xsl:text> </xsl:template> </xsl:stylesheet> Given our sample document and a targetLanguage of en, we get these results: <html> <head> <title>Glossary Listing: applet - wildcard character</title> </head> <body> <h1>Glossary Listing: applet - wildcard character</h1> <p> <b><a name="applet"></a>applet: </b> An application program, written in the Java programming language, that can be retrieved from a web server and executed by a web browser. A reference to an applet appears in the markup for a web page, in the same way that a reference to a graphics file appears; a browser retrieves an applet in the same way that it retrieves a graphics file. For security reasons, an applet's access rights are limited in two ways: the applet cannot access the file system of the client upon which it is executing, and the applet's communication across the network is limited to the server from which it was downloaded. Contrast with <a href="#servlet">servlet</a>. ... Changing the targetLanguage to it, the results are now different: <html> <head> <title>Glossary Listing: applet - servlet</title> </head> <body> <h1>Glossary Listing: applet - servlet</h1> <p> <b><a name="applet"></a>applet: </b> [Pretend this is an Italian definition of applet.] </p> <p> <b><a name="DMZlong"></a>demilitarized zone (DMZ): </b> [Pretend this is an Italian definition of DMZ.] </p> <p> <b><a name="servlet"></a>servlet: </b> [Pretend this is an Italian definition of servlet.] </p> </body> </html> With this stylesheet, we have a way to create a useful subset of our glossary. Notice that we're still using our original technique of ID, IDREF, and IDREFS to process the <xref> and <seealso> elements. 
If you want, you could redefine the processing to use the key() function instead. Here's how you'd define a key to mimic our earlier use of ID and IDREF:

    <xsl:key name="term-ids" match="term" use="@id"/>

The template for the <xref> element then looks up the referenced term with key() instead of id():

    <xsl:template match="xref">
      <a href="#{@refid}">
        <xsl:choose>
          <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
            <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="key('term-ids', @refid)[1]"/>
          </xsl:otherwise>
        </xsl:choose>
      </a>
    </xsl:template>

As an exercise, you can modify this stylesheet so that it lists only definitions that apply to a particular topic, or only terms that are acronyms.

5.2.3.1. The key() function and the IDREFS datatype

For all its flexibility, the key() function doesn't support anything like the IDREFS datatype. We can try to use the key() function the same way we used id():

    <xsl:template match="seealso">
      <b>
        <xsl:text>See also: </xsl:text>
      </b>
      <xsl:for-each select="key('term-ids', @refids)">
        <a>
        ...

But the <xsl:for-each> doesn't have anything to work with. That's because the key value we're looking for is "wildcard-char DMZlong pattern-matching". When we were dealing with the id() function, this string was broken into three tokens because anything with a datatype of ID can't contain a space. With the key() function, we can search on anything, including the contents of an element, so the whole string is treated as a single key value. (See Section 5.3, "Generating Links in Unstructured Documents" for an example of this.) For this reason, our call to the key() function asks for all the <term> elements with an id attribute equal to "wildcard-char DMZlong pattern-matching". Because an attribute with a datatype of ID can't contain spaces, no <term> can have that value, and we get no results. There are several ways to deal with this problem; we'll go through our choices next.

5.2.3.2. Solution #1: Replace the IDREFS datatype

If you consider this a problem and refuse to use the id() function, there are several approaches you can take.
The most drastic (but probably the simplest to implement) is to not use the IDREFS datatype at all. You could change the <seealso> element so that it contains a list of references to other elements:

    <seealso>
      <item refid="wildcard-char"/>
      <item refid="DMZlong"/>
      <item refid="pattern-matching"/>
    </seealso>

This approach has the advantage that we can use the value of all the refid attributes of all <item> elements with the key() function. That means we can search on anything, not just values of attributes. The disadvantage, of course, is that we had to change the structure of our XML document to make this approach work. If you have control of the structure of your XML document, that's possible; it's entirely likely, of course, that you can't change the XML document at all. A variation on this approach would be to use a stylesheet to transform the IDREFS datatype into the previous structure.

5.2.3.3. Solution #2: Use the XPath contains() function

A second approach is to leave the structure of the XML document unchanged, then use the XPath contains() function to find all <term> elements whose id attributes are contained in the value of the refids attribute of the <seealso> element. Here's how that would work:

    <xsl:template match="seealso">
      <b>
        <xsl:text>See also: </xsl:text>
      </b>
      <xsl:variable name="id_list" select="@refids"/>
      <xsl:for-each select="//term">
        <xsl:if test="contains($id_list, @id)">
          <a href="#{@id}">
            <xsl:choose>
              <xsl:when test="@xreftext">
                <xsl:value-of select="@xreftext"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </a>
          <xsl:if test="not(position()=last())">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:if>
      </xsl:for-each>
      <xsl:text>. </xsl:text>
    </xsl:template>

We've done a couple of things here: First, we've saved the value of the refids attribute of the <seealso> element in the variable id_list. That's because the context node changes inside the <xsl:for-each> element, so the attribute is no longer directly accessible there.
We can find a given <seealso> element from within a given <term> element, but it's too difficult to find that element generically from every <term> element. The simplest way to find the element is to save the value in a variable. Second, we look at all of the <term> elements in the document. For each one, if our variable (containing the refids attribute of the <seealso> element) contains the value of the current <term> element's id attribute, then we process that <term> element. Here are the results our stylesheet generates: <html> <head> <title>Glossary Listing: applet - wildcard character</title> </head> <body> <h1>Glossary Listing: applet - wildcard character</h1> <p> <b><a name="applet"></a>applet: </b> An application program, written in the Java programming language, that can be retrieved from a web server and executed by a web browser. A reference to an applet appears in the markup for a web page, in the same way that a reference to a graphics file appears; a browser retrieves an applet in the same way that it retrieves a graphics file. For security reasons, an applet's access rights are limited in two ways: the applet cannot access the file system of the client upon which it is executing, and the applet's communication across the network is limited to the server from which it was downloaded. Contrast with <a href="#servlet">servlet</a>. <b>See also: </b><a href="#DMZlong">demilitarized zone</a>, <a href="#DMZ"> DMZ</a>, <a href="#pattern-matching">pattern-matching character</a>, <a href="#wildcard-char">wildcard character</a>. </p> ... There are a couple of problems here. The most mundane is that in our stylesheet, we don't know how many <term> elements have id attributes contained in our variable. That means it's difficult to insert commas correctly between the matching <term>s. In the output here, we were lucky that the last match was in fact the last term, so the results here are correct. 
For any <seealso> element whose refids attribute doesn't contain the id attribute of the last <term> element in the document, this stylesheet won't work. The more serious problem is that one of the matches is, in fact, wrong. If you look closely at the output, we get a match for the term DMZ, even though there isn't an exact match for its id in our variable. That's because the XPath contains() function says (correctly) that the string DMZlong contains both DMZlong and DMZ. So our second attempt at solving this problem doesn't require us to change the structure of the XML document, but in this case, we have to change some of our IDs so that the problem we just mentioned doesn't occur. That's probably going to be a maintenance nightmare and a serious drawback to this approach.

5.2.3.4. Solution #3: Use recursion to process the IDREFS datatype

Here we use a recursive template to tokenize the refids attribute into individual IDs, then process each one individually. This style of programming takes a while to get used to, but it can be fairly simple.
Here's the crux of our stylesheet: <xsl:template match="seealso"> <b> <xsl:text>See also: </xsl:text> </b> <xsl:call-template name="resolveIDREFS"> <xsl:with-param name="stringToTokenize" select="@refids"/> </xsl:call-template> </xsl:template> <xsl:template name="resolveIDREFS"> <xsl:param name="stringToTokenize"/> <xsl:variable name="normalizedString"> <xsl:value-of select="concat(normalize-space($stringToTokenize), ' ')"/> </xsl:variable> <xsl:choose> <xsl:when test="$normalizedString!=' '"> <xsl:variable name="firstOfString" select="substring-before($normalizedString, ' ')"/> <xsl:variable name="restOfString" select="substring-after($normalizedString, ' ')"/> <a href="#{$firstOfString}"> <xsl:choose> <xsl:when test="key('term-ids', $firstOfString)[1]/@xreftext"> <xsl:value-of select="key('term-ids', $firstOfString)[1]/@xreftext"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="key('term-ids', $firstOfString)[1]"/> </xsl:otherwise> </xsl:choose> </a> <xsl:if test="$restOfString!=''"> <xsl:text>, </xsl:text> </xsl:if> <xsl:call-template name="resolveIDREFS"> <xsl:with-param name="stringToTokenize" select="$restOfString"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:text>.</xsl:text> </xsl:otherwise> </xsl:choose> </xsl:template> The first thing we did was invoke the named template resolveIDREFS in the template for the <seealso> element. While invoking the template, we pass in the value of the refids attribute and let recursion work its magic. The resolveIDREFS template works like this:
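The recursion can be followed in a stripped-down sketch that keeps only the tokenizing logic. The template name showTokens, the driver template, and the bracketed text output are hypothetical; the string functions are the same ones used by resolveIDREFS above. Any XML document can serve as input, because the string to tokenize is hard-coded.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical driver: tokenizes a literal string instead of @refids. -->
  <xsl:template match="/">
    <xsl:call-template name="showTokens">
      <xsl:with-param name="stringToTokenize"
          select="'  wildcard-char   DMZlong pattern-matching '"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="showTokens">
    <xsl:param name="stringToTokenize"/>
    <!-- Collapse whitespace, then append one space as an end marker. -->
    <xsl:variable name="normalizedString"
        select="concat(normalize-space($stringToTokenize), ' ')"/>
    <!-- A lone space means every token has been consumed. -->
    <xsl:if test="$normalizedString != ' '">
      <!-- The first token is everything before the first space. -->
      <xsl:text>[</xsl:text>
      <xsl:value-of select="substring-before($normalizedString, ' ')"/>
      <xsl:text>]</xsl:text>
      <!-- Recurse on whatever follows the first space. -->
      <xsl:call-template name="showTokens">
        <xsl:with-param name="stringToTokenize"
            select="substring-after($normalizedString, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The first call sees the token wildcard-char, the second DMZlong, and the third pattern-matching; the fourth call receives an empty string, which normalizes to a single space and ends the recursion. The output is [wildcard-char][DMZlong][pattern-matching].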
One technique in particular is worth mentioning here: the way we handled whitespace in the attribute value. We pass the string we want to tokenize as a parameter to the template, but we need to normalize the whitespace. We use two XPath functions to do this: normalize-space() and concat(). The call looks like this:

    <xsl:template name="resolveIDREFS">
      <xsl:param name="stringToTokenize"/>
      <xsl:variable name="normalizedString">
        <xsl:value-of
          select="concat(normalize-space($stringToTokenize), ' ')"/>
      </xsl:variable>

The normalize-space() function removes all leading and trailing whitespace from a string and replaces each run of internal whitespace characters with a single space. Remember that whitespace inside an attribute isn't significant; our <seealso> element could be written like this:

    <seealso refids=" wildcard-char DMZlong pattern-matching "/>

When we pass this attribute to normalize-space(), the returned value is wildcard-char DMZlong pattern-matching. All whitespace at the start and end of the value has been removed and all the whitespace between tokens has been replaced with a single space.

Because we're using the substring-before() and substring-after() functions to find the first token and the rest of the string, it's important that there be at least one space in the string. (It's possible, of course, that an IDREFS attribute contains only one ID.) We use the concat() function to add a space to the end of the string. When the string contains only that space, we know we're done.

Although this approach is more tedious, it does everything we need it to do. We don't have to change our XML document, and we correctly resolve all the IDs in the IDREFS datatype.

5.2.3.5. Solution #4: Use an extension function

The final approach is to write an extension function that tokenizes the refids attribute and returns a node-set containing all id values we need to search for. Xalan ships with an extension that does just that.
We invoke the extension function on the value of the refids attribute, then use an <xsl:for-each> element to process all items in the node-set. We'll cover extension functions in Chapter 8, "Extending XSLT", but for now, here's what the stylesheet looks like:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:java="http://xml.apache.org/xslt/java"
      exclude-result-prefixes="java">

      <xsl:output method="html" indent="yes"/>
      <xsl:strip-space elements="*"/>

      <xsl:key name="term-ids" match="term" use="@id"/>

      <xsl:template match="/">
        <xsl:apply-templates select="glossary"/>
      </xsl:template>

      <xsl:template match="glossary">
        <html>
          <head>
            <title>
              <xsl:text>Glossary Listing: </xsl:text>
              <xsl:value-of select="glentry[1]/term"/>
              <xsl:text> - </xsl:text>
              <xsl:value-of select="glentry[last()]/term"/>
            </title>
          </head>
          <body>
            <h1>
              <xsl:text>Glossary Listing: </xsl:text>
              <xsl:value-of select="glentry[1]/term"/>
              <xsl:text> - </xsl:text>
              <xsl:value-of select="glentry[last()]/term"/>
            </h1>
            <xsl:apply-templates select="glentry"/>
          </body>
        </html>
      </xsl:template>

      <xsl:template match="glentry">
        <p>
          <b>
            <a name="{term/@id}"/>
            <xsl:value-of select="term"/>
            <xsl:text>: </xsl:text>
          </b>
          <xsl:apply-templates select="defn"/>
        </p>
      </xsl:template>

      <xsl:template match="defn">
        <xsl:apply-templates
          select="*|comment()|processing-instruction()|text()"/>
      </xsl:template>

      <xsl:template match="xref">
        <a href="#{@refid}">
          <xsl:choose>
            <xsl:when test="key('term-ids', @refid)[1]/@xreftext">
              <xsl:value-of select="key('term-ids', @refid)[1]/@xreftext"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="key('term-ids', @refid)[1]"/>
            </xsl:otherwise>
          </xsl:choose>
        </a>
      </xsl:template>

      <xsl:template match="seealso">
        <b>
          <xsl:text>See also: </xsl:text>
        </b>
        <xsl:for-each
          select="java:org.apache.xalan.lib.Extensions.tokenize(@refids)">
          <a href="#{key('term-ids', .)/@id}">
            <xsl:choose>
              <xsl:when test="key('term-ids', .)/@xreftext">
                <xsl:value-of select="key('term-ids', .)/@xreftext"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="key('term-ids', .)"/>
              </xsl:otherwise>
            </xsl:choose>
          </a>
          <xsl:if test="not(position()=last())">
            <xsl:text>, </xsl:text>
          </xsl:if>
        </xsl:for-each>
        <xsl:text>.</xsl:text>
      </xsl:template>

    </xsl:stylesheet>

In this case, the tokenize function (defined in the Java class org.apache.xalan.lib.Extensions) takes a string as input, then converts the string into a node-set in which each token in the original string becomes a node.

Be aware that using extension functions limits the portability of your stylesheets. The extension function here does what we want, but we couldn't use this extension function with Saxon, XT, or the XSLT tools from Oracle or Microsoft. They may or may not supply similar functions, and if they do, you'll have to modify your stylesheet slightly to use them. If it's important to you that you be able to switch XSLT processors at some point in the future, using extensions will limit your ability to do that.

Hopefully at this point you're convinced of at least one of the following two things:

5.2.4. Advantages of the key() Function

Now that we've taken the key() function through its paces, you can see that it has several advantages:
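One thing the <xsl:key> element allows, and the id() function does not, is a use expression built from more than one attribute. The following is only a sketch under stated assumptions: the key name topic-language-index, the targetTopic parameter, the default parameter values, and the '|' separator are all hypothetical, and the separator works only because none of the enumerated attribute values in the DTD contains that character.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical compound key: files each <defn> under "topic|language". -->
  <xsl:key name="topic-language-index" match="defn"
      use="concat(@topic, '|', @language)"/>

  <!-- Hypothetical parameters with default values. -->
  <xsl:param name="targetTopic" select="'security'"/>
  <xsl:param name="targetLanguage" select="'en'"/>

  <xsl:template match="/">
    <!-- Build the same "topic|language" string to look up matching definitions. -->
    <xsl:for-each select="key('topic-language-index',
                              concat($targetTopic, '|', $targetLanguage))">
      <xsl:value-of select="preceding-sibling::term"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If a <defn> omits one of the attributes and the parser does not supply the DTD default, that part of the compound value is simply empty, so such definitions are filed under values like "|en".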
Copyright © 2002 O'Reilly & Associates. All rights reserved.