6.6. Transformation and Presentation
The markup in a typical XML document describes the
document's structure, but it tends not to describe
the document's presentation. That is, it says how
the document is organized but not how it looks. Although XML
documents are text, and a person could read them in native form if
they really wanted to, much more commonly an XML document is rendered
into some other format before being presented to a human audience.
One of the key ideas of markup languages in general and XML in
particular is that the input format need not be the same as the
output format. To put it another way, what you see is not what you
get, nor is it what you want to get. The input markup language is
designed for the convenience of the writer. The output language is
designed for the convenience of the reader.
Of course this requires a means of transforming the input format into
the output format. Most XML documents undergo some kind of
transformation before being presented to the reader. The
transformation may be to a different XML vocabulary like XHTML or
XSL-FO, or it may be to a non-XML format like PostScript or RTF.
XML's semiofficial transformation language is Extensible Stylesheet
Language Transformations (XSLT). An XSLT document contains a
list of template rules. Each template rule has a pattern noting which
elements and other nodes it matches. An XSLT processor reads the
input document. When it finds a node in the input document that
matches the pattern of a template rule in the stylesheet, it outputs
that rule's template. Part of the template is normally an
instruction that tells the processor to include content from the
input in the output. This allows, for example, the text of the output
document to stay the same as the input's while all the markup is changed. For
instance, you could write a stylesheet that would transform DocBook
documents into TEI documents. XSLT will be discussed in much more
detail in Chapter 8.
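The template-rule idea can be loosely imitated in ordinary code. The following is a minimal Python sketch, not XSLT itself: it assumes a hypothetical rule table that maps a few DocBook-style element names to TEI-style ones, and it copies the text through unchanged while renaming the markup. The rule table, the apply_rules function, and the sample document are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: input element name -> output element name.
# A real DocBook-to-TEI conversion would need far more rules than this.
RULES = {"article": "text", "title": "head", "para": "p", "emphasis": "hi"}

def apply_rules(node):
    """Return a copy of node, renamed per RULES, with text and children kept."""
    out = ET.Element(RULES.get(node.tag, node.tag))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(apply_rules(child))
    return out

source = ET.fromstring(
    "<article><title>Greeting</title>"
    "<para>Hello, <emphasis>XML</emphasis> world.</para></article>"
)
result = apply_rules(source)
print(ET.tostring(result, encoding="unicode"))
# prints <text><head>Greeting</head><p>Hello, <hi>XML</hi> world.</p></text>
```

Each entry in the table plays roughly the role of a pattern, and the recursive call plays roughly the role of the instruction that pulls input content into the output.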
However, XSLT is not the only transformation language you can use
with your XML documents. Other stylesheet languages such as the
Document Style Semantics and Specification Language (DSSSL,
http://www.jclark.com/dsssl/) are also available. So are a variety of
proprietary tools like OmniMark (http://www.omnimark.com/). Most
of these have particular strengths and weaknesses for particular
kinds of documents. Custom programs written in a variety of
programming languages, such as Java, C++, Perl, and Python, can use a
plethora of APIs, such as SAX, DOM, and JDOM, to transform documents.
This is sometimes useful when you need something more than a mere
transformation--for instance, interpreting certain elements as
database queries and actually inserting the results of those queries
into the output document, or asking the user to answer questions in
the middle of the transformation. However, the biggest single factor
when choosing which tool to use is simply which language and syntax
you're most comfortable with. De linguis non disputandum est.
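As a rough illustration of a custom program doing more than a mere transformation, here is a minimal Python sketch that treats certain elements as queries and inserts the results into the output. The query element, its key attribute, the value element, and the FAKE_DB dictionary standing in for a real database are all assumptions invented for this example.

```python
import xml.etree.ElementTree as ET

# Stand-in for a real database; the keys and values are made up.
FAKE_DB = {"price:widget": "9.95", "price:gadget": "24.50"}

def expand_queries(root):
    """Replace each hypothetical <query key="..."/> with a <value> element."""
    # Snapshot the elements first so the tree can be edited safely.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "query":
                value = ET.Element("value")
                # Unknown keys yield an empty result rather than an error.
                value.text = FAKE_DB.get(child.get("key"), "")
                value.tail = child.tail  # keep the text that followed the query
                parent[index] = value
    return root

doc = ET.fromstring(
    '<report><item>Widget: <query key="price:widget"/> USD</item></report>'
)
expand_queries(doc)
print(ET.tostring(doc, encoding="unicode"))
# prints <report><item>Widget: <value>9.95</value> USD</item></report>
```

In a real program the dictionary lookup would be replaced by an actual database call, which is exactly the kind of step that a general-purpose language makes easy.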
There are many different choices for the output format from a
transformation. A PostScript file can be printed on paper, overhead
transparencies, slides, or even T-shirts. A PDF document can be
printed in all these ways and viewed on the screen as well. However,
for screen display, PDF is vastly inferior to simple HTML, which has
the advantages of being very broadly accessible across platforms and
being very easy to generate via XSLT from source XML documents.
Generating a PDF or a PostScript file normally requires an additional
conversion step in which special software converts an intermediate XML
format like XSL-FO to what you actually want.
An alternative to a transformation-based presentation is to provide a
descriptive stylesheet that simply states how each element in the
original document should be formatted. This is the realm of
Cascading Style Sheets (CSS).
This works particularly well for narrative documents where all
that's needed is a list of the fonts, styles, sizes,
and so on to apply to the content of each element. The key is that
when all markup is stripped from the document, what remains is more
or less a plain-text version of what you want to see. No reordering
or rearrangement is necessary. This approach works less well for
data-oriented documents where the raw content may be nothing more
than an undifferentiated mass of numbers, dates, or other information
that's hard to understand without the context and
annotations provided by the markup. However, in this case a
combination of the two approaches works well. First a transformation
can produce a new document containing rearranged and annotated
information. Then a CSS stylesheet can apply style rules to the
elements in this transformed document.
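The first of those two steps might look something like the following Python sketch, which rearranges and annotates a small data-oriented document so that a descriptive stylesheet would have something sensible to style. The element names, the date attribute, and the " C" unit label are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

def annotate(readings):
    """Build a new document: readings sorted by date, each in labeled elements."""
    report = ET.Element("report")
    for r in sorted(readings, key=lambda e: e.get("date")):
        row = ET.SubElement(report, "row")
        ET.SubElement(row, "date").text = r.get("date")
        # The unit is an assumption; the raw data does not state one.
        ET.SubElement(row, "temperature").text = r.text + " C"
    return report

raw = ET.fromstring(
    '<readings><r date="2002-01-02">4</r><r date="2002-01-01">7</r></readings>'
)
report = annotate(raw)
print(ET.tostring(report, encoding="unicode"))
```

Stripping the markup from the input leaves only "47", which is close to meaningless. The transformed document instead carries dates and labeled values in reading order, so CSS rules for the row, date, and temperature elements would be enough to present it.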
Copyright © 2002 O'Reilly & Associates. All rights reserved.