5.2. Generating Links with the key() FunctionNow that we've covered the id() function in great detail, we'll move on to XSLT's key() function. Each key() function effectively creates an index of the document. You can then use that index to find all elements that have a particular property. Conceptually, key() works like a database index. If you have a database of (U.S. postal) addresses, you might want to index that database by the people's last names, by the states in which they live, by their Zip Codes, etc. Each index takes a certain amount of time to build, but it saves processing time later. If you want to find all the people who live in the state of Idaho, you can use the index to find all those people directly; you don't have to search the entire database. We'll discuss the details of how the key() function works, then we'll compare it to the id() function. 5.2.1. Defining a key()You define a key() function with the <xsl:key> element: <xsl:key name="language-index" match="defn" use="@language"/> The key has three elements:
5.2.2. A Slightly More Complicated XML Document in Need of LinksTo illustrate the full power of the key() function, we'll modify our original glossary slightly. Here's an excerpt:
In our modified document, we've added two new attributes to <defn>: topic and language. We also added the acronym attribute to the <term> element. We've modified our DTD to add these attributes and enumerate their valid values:
The topic attribute defines the computing topic to which this definition applies, and the language attribute defines the language in which this definition is written. The acronym attribute defines whether or not this term is an acronym. Now that we've created a more flexible XML document, we can use the key() function to do several useful things:
Thinking back to our earlier discussion, these are all things we can't do with the id() function. If the language, topic, and acronym attributes were defined to be of type ID, only one definition could be written in English, only one definition could apply to the security topic, and only one term could be an acronym. Clearly, that's an unacceptable limitation on our document. 5.2.3. Stylesheets That Use the key() FunctionWe've mentioned some useful things we can do with the key() function, so now we'll build some stylesheets that use it. Our first stylesheet will list all definitions written in a particular language. We'll go through the various parts of the stylesheet, explaining all the things we had to add to make everything work. The first thing we'll do, of course, is define the key() function: <xsl:key name="language-index" match="defn" use="@language"/> Notice that the match attribute we used was the simple element name defn. This tells the XSLT processor to match all <defn> elements at all levels of the document. Because of the structure of our document, we could have written match="/glossary/glentry/defn", as well. Although this XPath expression is more restrictive, it matches the same elements because all <defn> elements must appear inside <glentry> elements, which in turn appear inside the <glossary> element. Next, we set up our stylesheet to determine what value of the language attribute we're searching for. We'll do this with a global <xsl:param> element: <xsl:param name="targetLanguage"/> Recall from our earlier discussion of the <xsl:param> element that any top-level <xsl:param> is a global parameter to the stylesheet and may be set or initialized from outside the stylesheet. The way to do this varies from one XSLT processor to another. Here's how it's done with Xalan. (The command should be on one line.) java org.apache.xalan.xslt.Process -in moreterms.xml -xsl crossref2.xsl -param targetLanguage it If you use Michael Kay's Saxon processor, the syntax looks like this: java com.icl.saxon.StyleSheet moreterms.xml crossref2.xsl targetLanguage=it Now that we've defined our key() function and defined a parameter to specify which language we're looking for, we need to generate our output. Here's the modified template that generates the HTML <title> and <h1> tags:
There are a couple of significant changes here. When we were using the id() function, it was easy to find the first and last terms in the document. Because we're now trying to list only the definitions that are written in a particular language, that won't work. Reading the XPath expressions in the <xsl:value-of> elements from left to right, we find the first and last <defn> elements returned by the key() function, then use the preceding-sibling axis to reference the <term> element that preceded it. We could also have written our XPath expressions using the ancestor axis:
Now that we've successfully generated the HTML <title> and <h1> elements, we need to process the actual definitions for the chosen language. To do this, we'll use the targetLanguage parameter. Here's how the rest of the template looks:
In this code, we've selected all the values from the language-index key that match the targetLanguage parameter. For each one, we use the ancestor axis to select the <glentry> element. We've already written the templates that process these elements correctly, so we can just reuse them. The final change we make is to select only those <defn> elements whose language attributes match the targetLanguage parameter. We do this with a simple XPath expression: <xsl:apply-templates select="defn[@language=$targetLanguage]"/> Here's the complete stylesheet:
Given our sample document and a targetLanguage of en, we get these results:
Changing the targetLanguage to it, the results are now different:
With this stylesheet, we have a way to create a useful subset of our glossary. Notice that we're still using our original technique of ID, IDREF, and IDREFS to process the <xref> and <seealso> elements. If you want, you could redefine the processing to use the key() function instead. Here's how you'd define a key() function to mimic our earlier use of ID and IDREF:
As an exercise for the reader, you can modify this stylesheet so that it lists only definitions that apply to a particular topic, or only terms that are acronyms. 5.2.3.1. The key() function and the IDREFS datatypeFor all its flexibility, the key() function doesn't support anything like the IDREFS datatype. We can try to use the key() function the same way we used id():
But the <xsl:for-each> doesn't have anything to work with. That's because the key value we're looking for is "wildcard-char DMZlong pattern-matching". When we were dealing with the id() function, this string was broken into three tokens because anything with a datatype of ID can't contain a space. With the key() function, we can search on anything, including the contents of an element. (See Section 5.3, "Generating Links in Unstructured Documents" for an example of this.) For this reason, our call to the key() function asking for all the <term> elements with an id attribute equal to "wildcard-char DMZlong pattern-matching" returns nothing. Any attribute with a datatype of ID can't contain spaces, so we get no results. There are several ways to deal with this problem; we'll go through our choices next. 5.2.3.2. Solution #1: Replace the IDREFS datatypeIf you consider this a problem and refuse to use the id() function, there are several approaches you can take. The most drastic (but probably the simplest to implement) is to not use the IDREFS datatype at all. You could change the <seealso> element so that it contains a list of references to other elements: <seealso> <item refid="wildcard-character"/> <item refid="DMZlong"/> <item refid="pattern-matching"/> </seealso> This approach has the advantage that we can use the value of all the refid attributes of all <item> elements with the key() function. That means we can search on anything, not just values of attributes. The disadvantage, of course, is that we had to change the structure of our XML document to make this approach work. If you have control of the structure of your XML document, that's possible; it's entirely likely, of course, that you can't change the XML document at all. A variation on this approach would be to use a stylesheet to transform the IDREFS datatype into the previous structure. 5.2.3.3. Solution #2: Use the XPath contains() functionA second approach is to leave the structure of the XML document unchanged, then use the XPath contains() function to find all <term> elements whose id attributes are contained in the value of the refids attribute of the <seealso> element. Here's how that would work:
We've done a couple of things here: First, we've saved the value of the refids attribute of the <seealso> element in the variable id_list. That's because we can't access it within the <for-each> element. We can find a given <seealso> element from within a given <term> element, but it's too difficult to find that element generically from every <term> element. The simplest way to find the element is to save the value in a variable. Second, we look at all of the <term> elements in the document. For each one, if our variable (containing the refids attribute of the <seealso> element) contains the value of the current <term> element's id attribute, then we process that <term> element. Here are the results our stylesheet generates:
There are a couple of problems here. The most mundane is that in our stylesheet, we don't know how many <term> elements have id attributes contained in our variable. That means it's difficult to insert commas correctly between the matching <term>s. In the output here, we were lucky that the last match was in fact the last term, so the results here are correct. For any <seealso> element whose refid attribute doesn't contain the id attribute of the last <term> element in the document, this stylesheet won't work. The more serious problem is that one of the matches is, in fact, wrong. If you look closely at the output, we get a match for the term DMZ, even though there isn't an exact match for its id in our variable. That's because the XPath contains() function says (correctly) that the value DMZlong contains the ids DMZlong and DMZ. So our second attempt at solving this problem doesn't require us to change the structure of the XML document, but in this case, we have to change some of our IDs so that the problem we just mentioned doesn't occur. That's probably going to be a maintenance nightmare and a serious drawback to this approach. 5.2.3.4. Solution #3: Use recursion to process the IDREFS datatypeHere we use a recursive template to tokenize the refids attribute into individual IDs, then process each one individually. This style of programming takes a while to get used to, but it can be fairly simple. Here's the crux of our stylesheet:
The first thing we did was invoke the named template resolveIDREFS in the template for the <seealso> element. While invoking the template, we pass in the value of the refids attribute and let recursion work its magic. The resolveIDREFS template works like this:
One technique in particular is worth mentioning here: the way we handled whitespace in the attribute value. We pass the string we want to tokenize as a parameter to the template, but we need to normalize the whitespace. We use two XPath functions to do this: normalize-space() and concat(). The call looks like this:
The normalize-space() function removes all leading and trailing whitespace from a string and replaces internal whitespace characters with a single space. Remember that whitespace inside an attribute isn't significant; our <seealso> element could be written like this: <seealso refids=" wildcard-char DMZlong pattern-matching "/> When we pass this attribute to normalizeSpace(), the returned value is wildcard-char DMZlong pattern-matching. All whitespace at the start and end of the value has been removed and all the whitespace between characters has been replaced with a single space. Because we're using the substring-before() and substring-after() functions to find the first token and the rest of the string, it's important that there be at least one space in the string. (It's possible, of course, that an IDREFS attribute contains only one ID.) We use the concat() function to add a space to the end of the string. When the string contains only that space, we know we're done. Although this approach is more tedious, it does everything we need it to do. We don't have to change our XML document, and we correctly resolve all the IDs in the IDREFS datatype. 5.2.3.5. Solution #4: Use an extension functionThe final approach is to write an extension function that tokenizes the refids attribute and returns a node-set containing all id values we need to search for. Xalan ships with an extension that does just that. We invoke the extension function on the value of the refids attribute, then use a <xsl:for-each> element to process all items in the node-set. We'll cover extension functions in Chapter 8, "Extending XSLT", but for now, here's what the stylesheet looks like:
In this case, the tokenize function (defined in the Java class org.apache.xalan.lib.Extensions) takes a string as input, then converts the string into a node-set in which each token in the original string becomes a node. Be aware that using extension functions limits the portability of your stylesheets. The extension function here does what we want, but we couldn't use this extension function with Saxon, XT, or the XSLT tools from Oracle or Microsoft. They may or may not supply similar functions, and if they do, you'll have to modify your stylesheet slightly to use them. If it's important to you that you be able to switch XSLT processors at some point in the future, using extensions will limit your ability to do that. Hopefully at this point you're convinced of at least one of the following two things: 5.2.4. Advantages of the key() FunctionNow that we've taken the key() function through its paces, you can see that it has several advantages:
Copyright © 2002 O'Reilly & Associates. All rights reserved. |
|
|