1.8. What XML Are We Talking About?
Over the past few years, there has been explosive growth
in the number of XML-related standards. Talking about XML has
become confusing, because those three letters can mean so many different
things. Some people actually mean what I've called
"Greater XML." Think of it this way:
Boston is a significant city, but people who don't live there
often say "Boston" to refer to other nearby towns (Arlington,
Cambridge, and so on). What they're really talking about is the
"Greater Boston Metropolitan Area," or sometimes even just
"Eastern Massachusetts."
In much the same way, many people now talk about "XML"
when they really mean one of dozens of related
technologies built around the nucleus of XML. Some of these
may even be part of the original XML vision as "SGML for the
Web." Using XML to write documents with a DTD like DocBook
(http://www.docbook.org)
is clearly part of that original open systems vision.
However, it's also been trendy to market "new and improved!"
software as based on XML. Such ambiguities can be confusing and can even implicitly promote vendor lock-in, rather than liberate customer data from vendor control. The simplicity at the core of XML isn't friendly to lock-in strategies, but complex application layers on top of XML can certainly create closed systems.
So when someone says that SAX is a great API for
XML processing, exactly which part of Greater XML do they mean?
Briefly, the parts built on the "core" XML specifications.
The following list shows the parts that this book uses in most of its examples.
-
XML 1.0 (Second Edition)
-
http://www.w3.org/TR/REC-xml
This text document format is the core of XML.
SAX2 parsers work with this format and turn it into a
stream of
events that present the XML Infoset.
However, as we'll see, SAX can
be quite useful without even parsing XML text.
(The second edition incorporates a variety of bug fixes
and a few functional changes, which were previously
published as a separate list of errata.)
XML includes Document Type
Definitions,
or DTDs. These provide several processing facilities, most
of which you can rely on even when you don't use a
validating parser. All XML parsers must support DTDs;
they're what "schema" technologies attempt to improve on.
Unicode support has been part of XML from the
earliest days.
Java programmers may tend to overlook the significance of
that fact, since it's always been part of Java too.
But it's actually a big deal that XML moves web
technologies firmly away from ASCII toward Unicode,
in all programming environments (not just Java) -- not
everyone needs to be a native English speaker to make
best use of Internet technologies.
XML has even been called a "virus for Unicode."
-
XML Infoset
-
http://www.w3.org/TR/xml-infoset/
The Infoset is best explained
as an abstract model for what XML represents:
information like elements, attributes, and character data.
The Infoset exposes XML structure, not meaningful data. Applications transform Infoset data into forms that are suited to their particular tasks, normally behind a veil of application objects, unless they manipulate the text like a text editor.
The SAX2 event APIs present Infoset-level data;
the lower-level alternative is to work directly with
text.
(See Appendix B, "SAX2 and the XML Infoset" for details about
Infoset support in SAX2.)
Other XML infrastructure, such as
XInclude, generally transforms
or augments Infoset data.
Higher-level APIs generally hide such XML structures.
-
XML Namespaces
-
http://www.w3.org/TR/REC-xml-names/
Namespaces are an optional convention
for XML 1.0 documents.
Namespaces distinguish elements and attributes
so that names can
be reused when necessary. For example, in document
markup a
<table> probably refers to a tabular
presentation of data, but in a furniture catalog it might
also refer to something rather different.
XML namespaces distinguish those cases with name prefixes;
unlike "straight XML" with DTDs, those prefixes are
expected to change in different contexts
(such as different parts of that furniture catalog).
This makes combining namespaces and DTDs complicated.
One of the most visible differences between SAX1
and SAX2 is that SAX2 has integrated support for XML
namespaces to promote their widespread adoption.
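The furniture-catalog case can be sketched in a few lines of Java. This is only a minimal illustration, assuming a JAXP-provided SAX2 parser; the class name NamespaceEvents, the helper startElements, and the namespace URI http://example.com/furniture are all made up for this example. It records what each startElement event reports, so you can see that two different prefixes bound to the same URI identify the same kind of element, while an unprefixed table stays distinct.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceEvents {

    // Hypothetical helper: parse XML text and record one line per
    // startElement event, as "{namespace-uri}localName written as qName".
    static List<String> startElements(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // JAXP factories are not namespace-aware unless asked to be.
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    seen.add("{" + uri + "}" + localName
                            + " written as " + qName);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // The prefixes "f" and "furn" differ, but both map to the same
        // (made-up) furniture namespace; the last <table/> has no namespace.
        String xml =
              "<catalog xmlns:f='http://example.com/furniture'>"
            + "<f:table/>"
            + "<section xmlns:furn='http://example.com/furniture'>"
            + "<furn:table/>"
            + "</section>"
            + "<table/>"
            + "</catalog>";
        for (String line : startElements(xml)) {
            System.out.println(line);
        }
    }
}
```

Code that compares the namespace URI and local name, rather than the prefixed name as written, keeps working when a document switches prefixes partway through.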
Over time, some other simple layers (and conventions)
may become appropriate to view as part of the core of XML.
The XML Base specification
(http://www.w3.org/TR/xml-base/)
might be an example of such a facility; it explains how to
use an xml:base attribute to augment
normal processing of relative URIs found in text.[8]
Various internationalization rules and policies are also
likely to fit into that core.
One example is W3C work on the Character Model for the World Wide Web
(http://www.w3.org/TR/charmod/),
which promotes uniform handling of sequences used to
represent some non-ASCII characters.
Another is currently called XML Blueberry,
which will modify XML 1.0 to allow use of new Unicode characters
in element and attribute names.
Those characters support languages not previously
supported (before Unicode 3.1) and also
improve support for languages such as Japanese.
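To make the XML Base idea a bit more concrete, here is a rough sketch of the URI arithmetic involved, using only java.net.URI. It is not an XML Base processor: the class XmlBaseSketch, its resolve method, and the example.com addresses are assumptions made for illustration, and a real implementation would track xml:base attributes on nested elements as they come into and go out of scope.

```java
import java.net.URI;

class XmlBaseSketch {

    // Hypothetical helper: resolve a relative reference found in a document.
    // documentUri is where the document was loaded from; xmlBase is the value
    // of an xml:base attribute in scope, or null if there is none.
    static String resolve(String documentUri, String xmlBase, String reference) {
        URI base = URI.create(documentUri);
        if (xmlBase != null) {
            // xml:base may itself be relative; it is resolved against
            // the base URI that was in effect before it.
            base = base.resolve(xmlBase);
        }
        return base.resolve(reference).toString();
    }

    public static void main(String[] args) {
        String doc = "http://example.com/docs/book.xml";
        // Without xml:base, "ch1.xml" is a sibling of the document.
        System.out.println(resolve(doc, null, "ch1.xml"));
        // With xml:base="chapters/", the same reference moves into
        // the chapters/ directory.
        System.out.println(resolve(doc, "chapters/", "ch1.xml"));
    }
}
```

The first call yields http://example.com/docs/ch1.xml and the second yields http://example.com/docs/chapters/ch1.xml, which is the augmentation of "normal" relative URI processing that the specification describes.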
Many of the increasingly substantial layers over XML, such as
schemas (there are many schema approaches, with one from W3C),
schema APIs and tools (which may focus on non-XML data models,
distant from "downtown XML"),
Remote Procedure Calls ("RPCs"; again, many approaches
including one from W3C),
XPath (and its outgrowths),
and XSLT
are prime examples of technologies that deserve to be viewed
as technology choices in their own right.
They are other cities in the
metropolis of Greater XML, satellites of the original village
that leverage the original civic infrastructure.
Some of those layers may even reflect different fundamental
goals and requirements from those that originally drove the
creation and adoption of XML.
That doesn't mean that you won't put SAX interfaces on them
(or at least SAX-friendly ones), but because they are data
layers over the core of XML, they may involve API layers too.
If you look at Java implementations of other technologies in
Greater XML, you'll probably find SAX not far from the surface.
This book identifies a number of such SAX-based tools and
shows SAX events used as a framework to efficiently
integrate these different technologies.
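As a small taste of SAX events acting as glue, the following sketch produces events directly from program data, with no XML text being parsed, and hands them to a JAXP TransformerHandler that turns them back into text. It assumes the platform's TransformerFactory supports the SAX extensions (the one bundled with the JDK does); the class EventsWithoutParsing, its methods, and the item/price vocabulary are invented for this example.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class EventsWithoutParsing {

    // Hypothetical producer: describes one in-memory record as SAX events.
    // Any ContentHandler can consume these, whether it writes text,
    // builds a tree, or feeds another processing stage.
    static void emit(ContentHandler out, String name, String price)
            throws SAXException {
        out.startDocument();
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "price", "price", "CDATA", price);
        out.startElement("", "item", "item", atts);
        out.characters(name.toCharArray(), 0, name.length());
        out.endElement("", "item", "item");
        out.endDocument();
    }

    // Serialize the events by sending them to an identity TransformerHandler.
    static String toXml(String name, String price) {
        try {
            SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer()
                   .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            handler.setResult(new StreamResult(text));
            emit(handler, name, price);
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize events", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("oak table", "9.50"));
    }
}
```

Because emit only knows about the ContentHandler interface, the same producer could just as easily drive a validator, a stylesheet, or a custom handler instead of a serializer.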
Copyright © 2002 O'Reilly & Associates. All rights reserved.