1.6. Some Popular SAX2 Parser Distributions
Today a variety of high-quality SAX2 parsers are available.
Increasingly, they are packaged with Java programming environments,
so you may not need to fetch one yourself unless you need upgrades
(or bug fixes), or are constructing such a programming environment
yourself (perhaps packaging an embedded system or a standalone
application).
You should be able to bootstrap any SAX parser.
As a rule, if an XML parser is part of your Java programming
environment, it already supports SAX and probably SAX2.
The documentation should say whether SAX2 is supported.
If it only mentions SAX1, you can upgrade to get most of
the core SAX2 features; see Section 5.2, "SAX1 Support", in
Chapter 5, "Other SAX Classes", for more information.
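As a rough illustration of that upgrade path, the sketch below wraps a SAX1 org.xml.sax.Parser in the standard org.xml.sax.helpers.ParserAdapter class so it can be driven through the SAX2 XMLReader interface. It assumes a JAXP-capable environment and gets its SAX1 parser from SAXParser.getParser(); the class name Sax1Upgrade and its two methods are hypothetical names used only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.ParserAdapter;

// Hypothetical example class; not part of any parser distribution.
@SuppressWarnings("deprecation") // the SAX1 Parser interface is deprecated
class Sax1Upgrade {

    // Assumption: the environment's JAXP parser can expose a SAX1 view of itself.
    // ParserAdapter then presents that SAX1 parser as a SAX2 XMLReader.
    static XMLReader adaptSax1() throws Exception {
        Parser sax1 = SAXParserFactory.newInstance().newSAXParser().getParser();
        return new ParserAdapter(sax1);
    }

    // Parses the given text and counts SAX2 startElement() callbacks.
    static int countElements(XMLReader reader, String xml) throws Exception {
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements(adaptSax1(), "<a><b/><b/></a>"));
    }
}
```

The adapter supplies the core SAX2 callbacks, including namespace processing, but it cannot add anything the underlying SAX1 parser never reports, which is why only "most of" the core features are available this way.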
If your programming environment doesn't include a SAX parser,
you'll need to get and install one.
This section provides a brief summary of some of the
most widely available open source SAX2 parsers.[5]
These packages all include SAX2, DOM Level 2, and JAXP 1.1 support,
and can validate XML for you.
They also have full support for the standard SAX2 extensions.
If the distribution you download doesn't include the SAX2
documentation, it will be available from the same site as the parser.
All of these perform well in most applications, as long as you
avoid the memory penalties of DOM.
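One way to check extension support for yourself is to register an extension handler and see whether the parser accepts it. The sketch below does this for org.xml.sax.ext.LexicalHandler, using the standard lexical-handler property ID. It assumes the parser comes from JAXP; ExtensionProbe and countComments() are hypothetical names, and returning -1 for an unsupported property is just a convention chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical example class; not part of any parser distribution.
class ExtensionProbe {
    // Standard SAX2 property ID for the LexicalHandler extension.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Returns the number of comments reported, or -1 if the parser
    // does not support the LexicalHandler extension at all.
    static int countComments(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final int[] comments = {0};
        LexicalHandler handler = new LexicalHandler() {
            public void comment(char[] ch, int start, int length) { comments[0]++; }
            public void startDTD(String name, String publicId, String systemId) { }
            public void endDTD() { }
            public void startEntity(String name) { }
            public void endEntity(String name) { }
            public void startCDATA() { }
            public void endCDATA() { }
        };
        try {
            reader.setProperty(LEXICAL_HANDLER, handler);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return -1; // extension unavailable in this parser
        }
        reader.parse(new InputSource(new StringReader(xml)));
        return comments[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countComments("<doc><!-- one -->text<!-- two --></doc>"));
    }
}
```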
Current versions of all these parsers do quite well on the
open source SAX/XML conformance tests, available at
http://xmlconf.sourceforge.net/java/.
Those tests verify that these processors report essential information
required of a SAX1 processor, and evaluate how
well they support the XML 1.0 specification.
SAX2 conformance testing isn't yet as far along, though some tests are now available.
In addition to a SAX2 parser, you will likely want to have
some SAX2/XML utilities that are layered on top of that parser.
The packages described here include a DOM implementation, which is
normally provided as a clean layer over SAX2.
You might also consider other more Java-friendly packages such as
DOM4J (http://www.dom4j.org)
or
JDOM (http://www.jdom.org),
both of which are layered over SAX2, as well as
other APIs that provide more data-structure options.
When you're learning SAX, having access to the source code
of tools and applications built with SAX can help you
learn the API, at least if it's high-quality source that
uses the SAX APIs correctly.
1.6.1. Ælfred2
One of the original XML parsers mentioned earlier,
Ælfred, has long been recognized for its simplicity,
small size, and good performance.
As XML parsers go, it is easy to read and understand.
With a different maintainer (your humble author),
this parser was updated
to be the first with full native SAX2 support, and to
substantially improve its conformance to the XML specification.
This updated version is called Ælfred2, and versions have been
incorporated in a variety of applications where its simplicity,
size, and conformance are compelling features.
It is now part of the GNU Classpath Extensions project
and forms the core of the GNU JAXP library.
The updated version has taken SAX2 further
than most other parsers. It has a highly modular structure;
the reference distribution is able to use an optional
"stream validator" that uses the SAX2 events.
The model of an XML pipeline of such events is a
natural and powerful way to think about SAX; the SAX2 pipeline
package in this distribution lets applications compose arbitrary
processing modules in series or parallel.
This style of SAX2 processing is emphasized in this book,
and some of the examples show how to use these advanced
components.
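To give a feel for the pipeline idea without depending on the Ælfred2 distribution, here is a minimal sketch that chains two stages in series using only the standard org.xml.sax.helpers.XMLFilterImpl class. It is not the pipeline package described above, and it does not show parallel composition. The class names PipelineSketch, DropDebug, and UpperCaseNames, and the "debug" element name, are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical example class built on standard SAX2 filters.
class PipelineSketch {

    // Stage 1: drops every "debug" element along with its content.
    static class DropDebug extends XMLFilterImpl {
        private int depth = 0; // > 0 while inside a dropped subtree
        DropDebug(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (depth > 0 || "debug".equals(qName)) { depth++; return; }
            super.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    // Stage 2: upper-cases element names before passing events along.
    static class UpperCaseNames extends XMLFilterImpl {
        UpperCaseNames(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            super.startElement(uri, localName.toUpperCase(Locale.ROOT),
                               qName.toUpperCase(Locale.ROOT), atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(uri, localName.toUpperCase(Locale.ROOT),
                             qName.toUpperCase(Locale.ROOT));
        }
    }

    // Wires parser -> DropDebug -> UpperCaseNames -> collecting handler.
    static List<String> run(String xml) throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader pipeline = new UpperCaseNames(new DropDebug(parser));
        final List<String> seen = new ArrayList<>();
        pipeline.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.add(qName);
            }
        });
        pipeline.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a><debug><x/></debug><b/></a>"));
    }
}
```

Each stage sees only SAX2 events, so stages can be reordered, removed, or reused with any SAX2 parser at the head of the chain.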
Validation and DOM support remain completely modular, and use
SAX event pipelines, so Ælfred can still be distributed as a
lightweight nonvalidating parser without those components.
Likewise, the validation and DOM support don't need Ælfred to work.
The current version of Ælfred is licensed under the
GNU General Public License (GPL), with the
"library exception" clause to ensure that it can be used in
proprietary applications (notably, embedded systems)
that aren't themselves licensed under the GPL.
That license is used with many GNU libraries, such as the
GCC Java (GCJ) runtime libraries.
Ælfred includes a gnujaxp.jar
file that needs installation.
See http://www.gnu.org/software/classpathx/jaxp/
for information about the current distribution of Ælfred.
1.6.3. Xerces
Xerces is a family of XML parsers in the Apache XML
project; in this book, we refer only to the Java version,
not the C/C++ version.
It has evolved from the second generation of
IBM's XML for Java (XML4J) parser, and much of its
development and maintenance is still handled by IBM.
It is relatively large, and is monolithic rather than
modular. It also supports many nonstandard extensions.
For example, validation against W3C's XML schemas is
part of the parser, rather than a layered feature.
Xerces v2 is a third-generation
project. Goals of that project include a
more maintainable and modular design.
It includes an internal XML event pipeline model, which is
strikingly similar to that used in Ælfred to layer validation
and DOM support, except that it doesn't use SAX2 to
represent the XML Infoset data.
Xerces is licensed under the Apache Software License.
This book describes Xerces Version 1.4.3, dated August 2001,
which includes a xerces.jar file that needs installation.
See http://xml.apache.org/
for information about this distribution.
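Once one of these JAR files is on your class path, you can ask for that parser by class name. The sketch below tries a list of driver class names and falls back to the environment's default parser if none can be loaded. The two names shown are, to the best of my knowledge, the SAX2 driver classes in the GNU JAXP and Xerces distributions; treat them as assumptions and check the documentation for the version you install. ParserChooser and choose() are hypothetical names, and recent Java releases mark XMLReaderFactory as deprecated in favor of JAXP.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of any parser distribution.
@SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
class ParserChooser {

    // Assumed driver class names; verify against your installed version.
    static final String[] CANDIDATES = {
        "gnu.xml.aelfred2.XmlReader",          // Ælfred2 (GNU JAXP)
        "org.apache.xerces.parsers.SAXParser"  // Xerces
    };

    // Returns the first parser that can be loaded by name, or else
    // whatever default parser the environment provides.
    static XMLReader choose(String... classNames) throws SAXException {
        for (String name : classNames) {
            try {
                return XMLReaderFactory.createXMLReader(name);
            } catch (SAXException e) {
                // Class missing or unusable; try the next candidate.
            }
        }
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println(choose(CANDIDATES).getClass().getName());
    }
}
```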
Copyright © 2002 O'Reilly & Associates. All rights reserved.