Before we leave the topic of linking, we'll discuss one more useful technique. So far, all of this chapter's examples have been structured nicely. When there was a relationship between two pieces of information, we had an id and refid pair to match them. What happens if the XML document you're transforming isn't written that way? Fortunately, we can use the key() function and a new function, generate-id(), to create structure where there isn't any.
5.3.1. An Unstructured XML Document in Need of Links
For our example here, we'll take out all of the id and refid attributes that have served us well so far. This may be a contrived example, but it demonstrates how we can use the key() and generate-id() functions to generate links between parts of our document.
In our new sample document, we've stripped out the references that neatly tied things together before:
<?xml version="1.0" ?>
<!DOCTYPE glossary SYSTEM "unstructuredglossary.dtd">
<glossary>
<glentry>
<term>applet</term>
<defn>
An application program,
written in the Java programming language, that can be
retrieved from a web server and executed by a web browser.
A reference to an applet appears in the markup for a web
page, in the same way that a reference to a graphics
file appears; a browser retrieves an applet in the same
way that it retrieves a graphics file.
For security reasons, an applet's access rights are limited
in two ways: the applet cannot access the file system of the
client upon which it is executing, and the applet's
communication across the network is limited to the server
from which it was downloaded.
Contrast with <refterm>servlet</refterm>.
</defn>
</glentry>
<glentry>
<term>demilitarized zone</term>
<defn>
In network security, a network that is isolated from, and
serves as a neutral zone between, a trusted network (for example,
a private intranet) and an untrusted network (for example, the
Internet). One or more secure gateways usually control access
to the DMZ from the trusted or the untrusted network.
</defn>
</glentry>
<glentry>
<term>DMZ</term>
<defn>
See <refterm>demilitarized zone</refterm>.
</defn>
</glentry>
<glentry>
<term>pattern-matching character</term>
<defn>
A special character such as an asterisk (*) or a question mark
(?) that can be used to represent zero or more characters.
Any character or set of characters can replace a pattern-matching
character.
</defn>
</glentry>
<glentry>
<term>servlet</term>
<defn>
An application program, written in the Java programming language,
that is executed on a web server. A reference to a servlet
appears in the markup for a web page, in the same way that a
reference to a graphics file appears. The web server executes
the servlet and sends the results of the execution (if there are
any) to the web browser. Contrast with <refterm>applet</refterm>.
</defn>
</glentry>
<glentry>
<term>wildcard character</term>
<defn>
See <refterm>pattern-matching character</refterm>.
</defn>
</glentry>
</glossary>
To generate cross-references between the <refterm> elements and the associated <term> elements, we'll need to do three things:
- Define a key for all terms. We'll use this key to find the term that matches the text of a <refterm> element.
- Generate an ID for each <term> and write it out as a named anchor.
- For each <refterm>, use the key() function to find the <term> element whose text matches the text of the <refterm>. Once we've found the matching <term>, we call generate-id() on it, which returns the same ID we generated for that term's anchor.
We'll go through the relevant parts of the stylesheet. First, we define the key:
<xsl:key name="terms" match="term" use="."/>
Notice that we use the value of the <term> element itself as the lookup value for the key. Given a string, we can find all <term> elements with that same text.
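To see the key at work on its own, here is a minimal sketch of a lookup by a literal string. The root template is a hypothetical one written only for this illustration; it isn't part of the stylesheet discussed in this section.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every <term> element by its own text -->
  <xsl:key name="terms" match="term" use="."/>

  <!-- Hypothetical template for illustration only: look up the <term>
       whose text is "servlet", then write out the <defn> that follows it -->
  <xsl:template match="/">
    <xsl:value-of select="key('terms', 'servlet')/following-sibling::defn"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample glossary, key('terms', 'servlet') should select the single <term> element containing the text servlet, and the path step then moves to its sibling <defn>.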
Second, we need to generate a named anchor point for each <term> element:
<xsl:template match="glentry">
  <p>
    <b>
      <!-- The generated ID of the <term> becomes the anchor name -->
      <a name="{generate-id(term)}">
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </a>
    </b>
    <xsl:apply-templates select="defn"/>
  </p>
</xsl:template>
Third, we find the appropriate reference for a given <refterm>. Given the text of a <refterm>, we can use the key() function to find the <term> that matches. Within a single transformation, generate-id() always returns the same value for the same node, so passing that <term> to generate-id() returns the same ID we generated when we created the named anchor for it:
<xsl:template match="refterm">
  <!-- Look up the matching <term>, then link to its generated ID -->
  <a href="#{generate-id(key('terms', .))}">
    <xsl:value-of select="."/>
  </a>
</xsl:template>
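Putting the three pieces together, a complete stylesheet might look like the following sketch. The key and the glentry and refterm templates are the ones shown above; the root template, the <html> wrapper, and the way the heading is built are assumptions modeled on the sample output, not taken from the original stylesheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Step 1: index every <term> by its own text -->
  <xsl:key name="terms" match="term" use="."/>

  <!-- Assumed wrapper: the heading names the first and last terms -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Glossary Listing</title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glossary/glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glossary/glentry[last()]/term"/>
        </h1>
        <xsl:apply-templates select="glossary/glentry"/>
      </body>
    </html>
  </xsl:template>

  <!-- Step 2: write a named anchor for each <term> -->
  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{generate-id(term)}">
          <xsl:value-of select="term"/>
          <xsl:text>: </xsl:text>
        </a>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

  <!-- Step 3: link each <refterm> to the anchor of the matching <term> -->
  <xsl:template match="refterm">
    <a href="#{generate-id(key('terms', .))}">
      <xsl:value-of select="."/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

No template is needed for <defn>; the built-in template rules copy its text and apply the refterm template to any <refterm> children. The exact form of the generated IDs depends on the XSLT processor.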
Our generated HTML output creates cross-references similar to those in our earlier stylesheets:
<h1>Glossary Listing: applet - wildcard character</h1>
<p>
<b><a name="N11">applet: </a></b>
An application program,
written in the Java programming language, that can be
retrieved from a web server and executed by a web browser.
A reference to an applet appears in the markup for a web
page, in the same way that a reference to a graphics
file appears; a browser retrieves an applet in the same
way that it retrieves a graphics file.
For security reasons, an applet's access rights are limited
in two ways: the applet cannot access the file system of the
client upon which it is executing, and the applet's
communication across the network is limited to the server
from which it was downloaded.
Contrast with <a href="#N53">servlet</a>.
</p>
...
<p>
<b><a name="N53">servlet: </a></b>
An application program, written in the Java programming language,
that is executed on a web server. A reference to a servlet
appears in the markup for a web page, in the same way that a
reference to a graphics file appears. The web server executes
the servlet and sends the results of the execution (if there are
any) to the web browser. Contrast with <a href="#N11">applet</a>.
</p>
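Using the key() and generate-id() functions, we've been able to create IDs and references automatically. This approach isn't perfect; the key lookup is an exact string comparison, so the text of the <refterm> element must match the text of the <term> exactly, including case, spelling, and whitespace. A <refterm> that doesn't match any <term> produces a link that points nowhere.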
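One way to make the matching a little more forgiving is sketched below. It is a variation on the templates above, not part of the original stylesheet: it normalizes whitespace on both sides of the lookup, and it writes plain text instead of a link when no term matches. The variable name target and the wording of the message are made up for this example.

```xml
<!-- Index terms by their whitespace-normalized text -->
<xsl:key name="terms" match="term" use="normalize-space(.)"/>

<xsl:template match="refterm">
  <!-- Normalize the reference text the same way before the lookup -->
  <xsl:variable name="target" select="key('terms', normalize-space(.))"/>
  <xsl:choose>
    <!-- A non-empty node-set means at least one <term> matched -->
    <xsl:when test="$target">
      <a href="#{generate-id($target)}">
        <xsl:value-of select="."/>
      </a>
    </xsl:when>
    <!-- No match: write the text without a link and report the problem -->
    <xsl:otherwise>
      <xsl:value-of select="."/>
      <xsl:message>
        <xsl:text>No term matches: </xsl:text>
        <xsl:value-of select="."/>
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

This handles stray spaces and line breaks, but differences in case or spelling still prevent a match. If several terms share the same text, generate-id() uses the first one in document order.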
This example, like all of the examples we've shown so far, uses a single input file. A more likely scenario is that we have one XML document that refers to terms and a second XML document that contains the definitions, but no IDs. We can combine the technique we've described here with the document() function to load the second XML document and generate links between the two. We'll talk about the document() function in a later chapter; for now, just remember that there are ways to use more than one XML input document in your transformations.
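As a preview, the combination might look something like the sketch below. The file name definitions.xml and the variable names are hypothetical, and the sketch assumes that the second document uses the same <glentry> and <term> markup and that its anchors are written during the same transformation run.

```xml
<xsl:key name="terms" match="term" use="."/>

<!-- Hypothetical second input document containing the definitions -->
<xsl:variable name="defs" select="document('definitions.xml')"/>

<xsl:template match="refterm">
  <!-- Save the reference text before the context node changes -->
  <xsl:variable name="wanted" select="string(.)"/>
  <!-- key() searches only the document that contains the context node,
       so make the second document the context before the lookup -->
  <xsl:for-each select="$defs">
    <a href="#{generate-id(key('terms', $wanted))}">
      <xsl:value-of select="$wanted"/>
    </a>
  </xsl:for-each>
</xsl:template>
```

Because a processor isn't required to generate the same IDs from one run to the next, the anchors and the links that point to them should come out of the same transformation.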