1.8. What XML Are We Talking About?
Over the past few years, there has been explosive growth in the number of XML-related standards. Talking about XML has become confusing, because those three letters can mean so many different things. Some people actually mean what I've called "Greater XML." Think of it this way: Boston is a significant city, but people who don't live there often say "Boston" when they mean other nearby towns (Arlington, Cambridge, and so on). What they're really talking about is the "Greater Boston Metropolitan Area," or sometimes even just "Eastern Massachusetts."
In much the same way, many people now talk about "XML" when they really mean one of dozens of related technologies built around the nucleus of XML. Some of these may even be part of the original XML vision as "SGML for the Web." Using XML to write documents against a DTD such as DocBook (http://www.docbook.org) is clearly part of that original open systems vision. However, it's also been trendy to market "new and improved!" software as based on XML. Such ambiguities can be confusing and can even implicitly promote vendor lock-in, rather than liberate customer data from vendor control. The simplicity at the core of XML isn't friendly to lock-in strategies, but complex application layers on top of XML can certainly produce closed systems.
So when someone says that SAX is a great API for XML processing, exactly which part of Greater XML does that mean? Briefly, the parts built with the "core" XML specifications. The following list shows the parts that this book uses in most of its examples.
Over time, some other simple layers (and conventions) may become appropriate to view as part of the core of XML. The XML Base specification (http://www.w3.org/TR/xml-base/) might be an example of such a facility; it explains how to use an xml:base attribute to augment normal processing of relative URIs found in text. Various internationalization rules and policies are also likely to fit into that core. One example is W3C work on the Character Model for the World Wide Web (http://www.w3.org/TR/charmod/), which promotes uniform handling of sequences used to represent some non-ASCII characters. Another is currently called XML Blueberry, which will modify XML 1.0 to allow use of new Unicode characters in element and attribute names. Those characters support languages not previously supported (before Unicode 3.1) and also improve support for languages such as Japanese.
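To make the xml:base idea concrete, here is a minimal sketch of how a SAX handler might track the base URI in effect for each element and resolve relative references against it. The class name XmlBaseSketch, the method resolveHrefs, the unqualified href attribute, and the example.com URIs are all assumptions made for illustration; they come from neither SAX nor the XML Base specification, and a production handler would need fuller error handling.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects every "href" attribute, resolved
// against the xml:base value in scope for its element.
class XmlBaseSketch {
    // The namespace that the reserved "xml" prefix is always bound to.
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    static List<String> resolveHrefs(String xml, String documentUri)
            throws Exception {
        final List<String> resolved = new ArrayList<>();
        final Deque<URI> bases = new ArrayDeque<>();
        bases.push(URI.create(documentUri)); // base URI of the document itself

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local,
                                     String qName, Attributes atts) {
                URI base = bases.peek();
                // An xml:base value may itself be relative to the parent's base.
                String xmlBase = atts.getValue(XML_NS, "base");
                if (xmlBase != null) {
                    base = base.resolve(xmlBase);
                }
                bases.push(base);
                // "href" is an assumed attribute name for this sketch.
                String href = atts.getValue("", "href");
                if (href != null) {
                    resolved.add(base.resolve(href).toString());
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                bases.pop(); // leave this element's base URI scope
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed to look up xml:base by namespace
        factory.newSAXParser()
               .parse(new InputSource(new StringReader(xml)), handler);
        return resolved;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><a href='intro.xml'/>"
                   + "<sec xml:base='images/'><img href='logo.png'/></sec></doc>";
        for (String s : resolveHrefs(xml, "http://example.com/docs/index.xml")) {
            System.out.println(s);
        }
    }
}
```

With the sample input in main, the first reference resolves against the document's own URI and the second against the images/ base set on its parent, giving http://example.com/docs/intro.xml and http://example.com/docs/images/logo.png. Keeping the bases on a stack is one simple way to mirror element nesting; other designs are possible.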
Many of the increasingly substantial layers over XML deserve to be viewed as technology choices in their own right. Prime examples include schemas (there are many schema approaches, with one from W3C), schema APIs and tools (which may focus on non-XML data models, distant from "downtown XML"), Remote Procedure Calls ("RPCs"; again, many approaches including one from W3C), XPath (and its outgrowths), and XSLT. They are other cities in the metropolis of Greater XML, satellites of the original village that leverage the original civic infrastructure. Some of those layers may even reflect different fundamental goals and requirements from those that originally drove the creation and adoption of XML. That doesn't mean that you won't put SAX interfaces on them (or at least SAX-friendly ones), but because they are data layers over the core of XML, they may involve API layers too.
If you look at Java implementations of other technologies in Greater XML, you'll probably find SAX not far from the surface. This book identifies a number of such SAX-based tools and shows SAX events used as a framework to efficiently integrate these different technologies.
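As one illustration of SAX events acting as the glue between technologies, the sketch below feeds a SAX parser's event stream directly into an XSLT engine through the JAXP TransformerHandler interface, so no intermediate tree has to be built by the application. The class name SaxToXsltSketch, the method transform, and the sample stylesheet are assumptions made for this example. It also assumes that the TransformerFactory in use supports the SAXTransformerFactory feature, which the code checks at run time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: pushes SAX events from a parser into an XSLT
// transformation and returns the transformation's output as a string.
class SaxToXsltSketch {
    static String transform(String xml, String stylesheet) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAX-driven transforms unsupported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // A TransformerHandler is a SAX ContentHandler backed by a stylesheet.
        TransformerHandler xslt = stf.newTransformerHandler(
                new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        xslt.setResult(new StreamResult(out));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processing relies on namespaces
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(xslt); // parser events flow into the stylesheet
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample stylesheet (an assumption): counts <item> elements as text.
        String stylesheet =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:value-of select='count(//item)'/>"
          + "</xsl:template></xsl:stylesheet>";
        String xml = "<list><item/><item/><item/></list>";
        System.out.println(transform(xml, stylesheet));
    }
}
```

Because the handler is an ordinary ContentHandler, the same connection point could accept events from any SAX producer, not only a parser reading text.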
Copyright © 2002 O'Reilly & Associates. All rights reserved.