Generating Links in Unstructured Documents (XSLT)

5.3.1. An Unstructured XML Document in Need of Links

For our example here, we'll take out all of the id and refid attributes that have served us well so far. This may be a contrived example, but it demonstrates how we can use the key() and generate-id() functions to generate links between parts of our document.

In our new sample document, we've stripped out the references that neatly tied things together before:

<?xml version="1.0" ?>
<!DOCTYPE glossary SYSTEM "unstructuredglossary.dtd">
<glossary>
  <glentry>
    <term>applet</term>
    <defn>
      An application program,
      written in the Java programming language, that can be 
      retrieved from a web server and executed by a web browser. 
      A reference to an applet appears in the markup for a web 
      page, in the same way that a reference to a graphics
      file appears; a browser retrieves an applet in the same 
      way that it retrieves a graphics file. 
      For security reasons, an applet's access rights are limited
      in two ways: the applet cannot access the file system of the 
      client upon which it is executing, and the applet's 
      communication across the network is limited to the server 
      from which it was downloaded. 
      Contrast with <refterm>servlet</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>demilitarized zone</term>
    <defn>
      In network security, a network that is isolated from, and 
      serves as a neutral zone between, a trusted network (for example, 
      a private intranet) and an untrusted network (for example, the
      Internet). One or more secure gateways usually control access 
      to the DMZ from the trusted or the untrusted network.
    </defn>
  </glentry>

  <glentry>
    <term>DMZ</term>
    <defn>
      See <refterm>demilitarized zone</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>pattern-matching character</term>
    <defn>
      A special character such as an asterisk (*) or a question mark 
      (?) that can be used to represent zero or more characters. 
      Any character or set of characters can replace a pattern-matching 
      character.
    </defn>
  </glentry>

  <glentry>
    <term>servlet</term>
    <defn>
      An application program, written in the Java programming language, 
      that is executed on a web server. A reference to a servlet 
      appears in the markup for a web page, in the same way that a 
      reference to a graphics file appears. The web server executes
      the servlet and sends the results of the execution (if there are
      any) to the web browser. Contrast with <refterm>applet</refterm>.
    </defn>
  </glentry>

  <glentry>
    <term>wildcard character</term>
    <defn>
      See <refterm>pattern-matching character</refterm>.
    </defn>
  </glentry>
</glossary>

To generate cross-references between the <refterm> elements and the associated <term> elements, we'll need to do three things:

1. Define a key for all terms. We'll use this key to find terms that match the text of the <refterm> element.
2. Generate an ID for each <term> we find and use it as a named anchor.
3. For each <refterm>, use the key() function to find the <term> element that matches the text of <refterm>. Once we've found the matching <term>, we call generate-id() on it to retrieve that ID.

We'll go through the relevant parts of the stylesheet. First, we define the key:

<xsl:key name="terms" match="term" use="."/>

Notice that we use the value of the <term> element itself as the lookup value for the key. Given a string, we can find all <term> elements with that same text.
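To see the key at work on its own, here is a minimal sketch, assuming it is run against the sample glossary above. The literal string 'servlet' is just an illustrative lookup value; the stylesheet does nothing except count the <term> elements the key returns for it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index every <term> by its own string value -->
  <xsl:key name="terms" match="term" use="."/>

  <xsl:template match="/">
    <!-- key() returns the set of <term> nodes whose text is exactly 'servlet' -->
    <xsl:value-of select="count(key('terms', 'servlet'))"/>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document this count is 1, because only one <term> has that text. If two entries shared the same term text, key() would return both of them.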

Second, we need to generate a named anchor point for each <term> element:

<xsl:template match="glentry">
  <p>
    <b>
      <a name="{generate-id(term)}">
        <xsl:value-of select="term"/>
        <xsl:text>: </xsl:text>
      </a>
    </b>
    <xsl:apply-templates select="defn"/>
  </p>
</xsl:template>

Third, we find the appropriate reference for a given <refterm>. Given the text of a <refterm>, we can use the key() function to find the <term> that matches. Because generate-id() always returns the same value for a given node within a single transformation, passing that <term> to generate-id() returns the same ID we used when we created the named anchor:

<xsl:template match="refterm">
  <a href="#{generate-id(key('terms', .))}">
    <xsl:value-of select="."/>
  </a>
</xsl:template>
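Put together, the three pieces might sit in a stylesheet like the following sketch. The root template and the page wrapper are assumptions, written only so the fragment runs on its own; the heading is built to resemble the one in the sample output, and the key and the two templates are the ones shown above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Step 1: index every <term> by its own text -->
  <xsl:key name="terms" match="term" use="."/>

  <!-- Assumed page wrapper; not taken from the original stylesheet -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Glossary Listing</title>
      </head>
      <body>
        <h1>
          <xsl:text>Glossary Listing: </xsl:text>
          <xsl:value-of select="glossary/glentry[1]/term"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="glossary/glentry[last()]/term"/>
        </h1>
        <xsl:apply-templates select="glossary/glentry"/>
      </body>
    </html>
  </xsl:template>

  <!-- Step 2: emit a named anchor for each term -->
  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{generate-id(term)}">
          <xsl:value-of select="term"/>
          <xsl:text>: </xsl:text>
        </a>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

  <!-- Step 3: link each reference to the anchor of the matching term -->
  <xsl:template match="refterm">
    <a href="#{generate-id(key('terms', .))}">
      <xsl:value-of select="."/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

There is no template for <defn>: the built-in rules copy its text through and hand each <refterm> child to the template in step 3.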

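The generated HTML contains cross-references similar to those in our earlier stylesheets. The exact ID strings (N11 and N53 here) depend on the XSLT processor; what matters is that each href matches the name of the anchor it points to: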

    <h1>Glossary Listing: applet - wildcard character</h1>
    <p>
        <b><a name="N11">applet: </a></b>
  An application program,
  written in the Java programming language, that can be 
  retrieved from a web server and executed by a web browser. 
  A reference to an applet appears in the markup for a web 
  page, in the same way that a reference to a graphics
  file appears; a browser retrieves an applet in the same 
  way that it retrieves a graphics file. 
  For security reasons, an applet's access rights are limited
  in two ways: the applet cannot access the file system of the 
  client upon which it is executing, and the applet's 
  communication across the network is limited to the server 
  from which it was downloaded. 
  Contrast with <a href="#N53">servlet</a>.
</p>
...
    <p>
        <b><a name="N53">servlet: </a></b>
  An application program, written in the Java programming language, 
  that is executed on a web server. A reference to a servlet 
  appears in the markup for a web page, in the same way that a 
  reference to a graphics file appears. The web server executes
  the servlet and sends the results of the execution (if there are
  any) to the web browser. Contrast with <a href="#N11">applet</a>.
</p>

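Using the key() and generate-id() functions, we've been able to create IDs and references automatically. This approach isn't perfect: the key lookup is an exact, case-sensitive string comparison, so the text of the <refterm> element must match the text of the <term> exactly, including any whitespace. If nothing matches, key() returns an empty node-set, generate-id() returns an empty string, and the link silently points nowhere.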
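One way to soften the exact-match requirement is sketched here, as a variation rather than part of the original stylesheet. It assumes that stray leading, trailing, or repeated whitespace is the most likely mismatch, so it normalizes both sides of the lookup with normalize-space(). It also assumes that an unmatched <refterm> should be written as plain text with a warning instead of as a dead link. The variable name target is made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Index terms by their whitespace-normalized text -->
  <xsl:key name="terms" match="term" use="normalize-space(.)"/>

  <!-- The glentry template shown earlier goes here unchanged -->

  <xsl:template match="refterm">
    <!-- Normalize the lookup value the same way the key was built -->
    <xsl:variable name="target"
      select="key('terms', normalize-space(.))"/>
    <xsl:choose>
      <!-- A non-empty node-set is true: a matching term exists -->
      <xsl:when test="$target">
        <a href="#{generate-id($target)}">
          <xsl:value-of select="."/>
        </a>
      </xsl:when>
      <!-- No match: warn, and write the text without a link -->
      <xsl:otherwise>
        <xsl:message>
          <xsl:text>No term matches refterm: </xsl:text>
          <xsl:value-of select="."/>
        </xsl:message>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This does not help with misspellings or differences in case; those still have to be fixed in the source document, but the xsl:message output at least reports them.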

This example, like all of the examples we've shown so far, uses a single input file. A more likely scenario is that we have one XML document that contains terms, and we want to reference definitions in a second XML document that contains definitions, but no IDs. We can combine the technique we've described here with the document() function to import a second XML document and generate links between the two. We'll talk about the document() function in a later chapter; for now, just remember that there are ways to use more than one XML input document in your transformations.
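As a preview, the following sketch shows how the same lookup might work against a second document. Everything specific in it is an assumption: the filename glossary.xml is hypothetical, the second file is assumed to have the same structure as our sample glossary, and the variable names are made up. In XSLT 1.0, key() searches only the document that contains the context node, so the sketch uses xsl:for-each to switch the context to the second document before calling key().

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:key name="terms" match="term" use="."/>

  <!-- Hypothetical second input, resolved relative to the stylesheet -->
  <xsl:variable name="glossary" select="document('glossary.xml')"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- The main document first, then the entries it links to -->
        <xsl:apply-templates/>
        <xsl:apply-templates select="$glossary/glossary/glentry"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="glentry">
    <p>
      <b>
        <a name="{generate-id(term)}">
          <xsl:value-of select="term"/>
          <xsl:text>: </xsl:text>
        </a>
      </b>
      <xsl:apply-templates select="defn"/>
    </p>
  </xsl:template>

  <xsl:template match="refterm">
    <!-- Save the text before the context node changes -->
    <xsl:variable name="wanted" select="string(.)"/>
    <!-- Make the second document the context so key() searches it -->
    <xsl:for-each select="$glossary">
      <a href="#{generate-id(key('terms', $wanted))}">
        <xsl:value-of select="$wanted"/>
      </a>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Both documents are written into a single HTML page here because generate-id() values are only guaranteed to be consistent within one transformation; IDs generated in separate runs may not line up.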
