It's good policy to reuse parsers, rather than
constantly discard and recreate them.
Some parsers are more expensive to create than others,
so such reuse can improve performance if you parse
many documents. Similarly, factory approaches add some fixed
costs to achieve vendor neutrality, and those costs can add up.
In contexts like servlets, where any number of
threads may need to parse XML concurrently, parsers are often pooled so those
bootstrapping costs won't increase per-request service times.
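A minimal sketch of such pooling follows. It assumes a fixed-size pool built on java.util.concurrent's ArrayBlockingQueue; the class and method names (ParserPool, acquire, release, countElements) are hypothetical, not part of SAX or JAXP. It relies on the SAX2 rule that an XMLReader may be reused once a parse has finished, but never by two threads at once. XMLReaderFactory is deprecated as of Java 9 but still works there.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical fixed-size pool; each parser is used by one thread at a time.
public class ParserPool {
    private final BlockingQueue<XMLReader> idle;

    public ParserPool(int size) throws SAXException {
        idle = new ArrayBlockingQueue<>(size);
        // Pay the bootstrapping cost once, up front, for every pooled parser.
        for (int i = 0; i < size; i++) {
            idle.add(XMLReaderFactory.createXMLReader());
        }
    }

    // Blocks until some parser is free.
    public XMLReader acquire() throws InterruptedException {
        return idle.take();
    }

    // Returns a parser to the pool; the caller must not use it afterwards.
    public void release(XMLReader parser) {
        idle.add(parser);
    }

    // Example use: count the elements in one document with a pooled parser.
    public int countElements(String xml) throws Exception {
        XMLReader parser = acquire();
        try {
            int[] count = {0};
            // Install fresh handlers on every use so no state leaks between requests.
            parser.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            parser.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } finally {
            release(parser);
        }
    }

    public static void main(String[] args) throws Exception {
        ParserPool pool = new ParserPool(2);
        System.out.println(pool.countElements("<a><b/><c/></a>"));
        System.out.println(pool.countElements("<x/>")); // reuses a pooled parser
    }
}
```

A servlet would typically create one such pool at startup and call acquire/release around each request.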
3.2.3. Using JAXP
Sun's JAXP 1.1 supports yet another way to bootstrap SAX parsers.
It's a more complex process, taking several steps instead of
just one:

1. First, get a javax.xml.parsers.SAXParserFactory.
2. Tell it to return parsers that will do the kind of processing
   needed by your application.
3. Ask it to give you a JAXP parser of type
   javax.xml.parsers.SAXParser.
4. Finally, ask the JAXP parser to give you the XMLReader that is
   normally lurking inside of it.
Conceptually this is like the no-parameters
XMLReaderFactory.createXMLReader() method,
except it's complicated by expecting the factory to return
preconfigured parsers.[16]
Configuring the parser using the SAX2 flags and properties
directly is preferable; the API "surface area" is smaller.
Other than having different default namespace-processing modes,
the practical difference is primarily availability: many implementations ensure that a JAXP system
default is always accessible, but they haven't paid the same
attention to providing the default SAX2 parser.
(Current versions of the SAX2 classes make that
easier, but you might not be using such versions.)
The code to use
the JAXP bootstrap API to get a SAX2 parser looks like this:
    import org.xml.sax.*;
    import javax.xml.parsers.*;

    XMLReader parser;

    try {
        SAXParserFactory factory;

        factory = SAXParserFactory.newInstance ();
        // JAXP factories don't process namespaces by default;
        // ask for the SAX2 default behavior instead
        factory.setNamespaceAware (true);
        parser = factory.newSAXParser ().getXMLReader ();
        // success!

    } catch (FactoryConfigurationError err) {
        // no JAXP SAXParserFactory implementation could be located
        System.err.println ("can't create JAXP SAXParserFactory, "
            + err.getMessage ());
    } catch (ParserConfigurationException err) {
        // the factory can't supply a parser with the requested settings
        System.err.println ("can't create XMLReader with namespaces, "
            + err.getMessage ());
    } catch (SAXException err) {
        System.err.println ("Hmm, SAXException, " + err.getMessage ());
    }
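Since the practical difference between the two bootstrap routes is mostly availability, one way to combine them is sketched below: try the SAX2 bootstrap first, fall back to the JAXP system default only if no default SAX2 parser is configured, and then set the SAX2 feature flags directly either way. The class name BootstrapFallback and method createParser are hypothetical; the fallback policy is one reasonable choice, not something SAX2 or JAXP prescribes.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class BootstrapFallback {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    // Prefer the SAX2 bootstrap; use JAXP only when no SAX2 default is available.
    static XMLReader createParser() throws SAXException {
        XMLReader parser;
        try {
            parser = XMLReaderFactory.createXMLReader();
        } catch (SAXException noSax2Default) {
            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                // JAXP defaults differ from SAX2, so request namespace processing.
                factory.setNamespaceAware(true);
                parser = factory.newSAXParser().getXMLReader();
            } catch (ParserConfigurationException e) {
                throw new SAXException("no usable SAX2 parser", e);
            }
        }
        // Whichever route produced it, state the SAX2 defaults explicitly.
        parser.setFeature(NAMESPACES, true);
        parser.setFeature(PREFIXES, false);
        return parser;
    }

    public static void main(String[] args) throws SAXException {
        XMLReader parser = createParser();
        System.out.println(parser.getClass().getName());
    }
}
```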
Rather than calling newInstance(),
you can hardcode the constructor for a particular factory,
probably using one of the classes listed in
Table 3-2.
It's better, though, to treat implementation preferences as
configuration issues rather than hardwiring them into source code.
For situations where you may have several parsers in your
class path (or a tree of class loaders, as found in many
recent servlet engines),
JAXP offers several methods to configure such preferences.
You can associate the factory class name with the key
javax.xml.parsers.SAXParserFactory either by
setting a system property of that name (which sets the default parser
for your JVM instance) or by adding that key and value to the
$JAVA_HOME/jre/lib/jaxp.properties
property file (which sets the default policy for that
JVM implementation).
I prefer the jaxp.properties solution;
with the other method the default parser is a function
of your class path settings and even the names assigned
to various JAR files. You can also embed this preference in your
application's JAR files as a META-INF/services/... file, but that solution is similarly sensitive to class loader
configuration issues.
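The system property lookup can be sketched as follows. The property is set in code only so the example is self-contained; normally it would come from a -D option on the java command line, since setting it in source hardwires the choice again. The class name FactoryByProperty is hypothetical, and the factory class used here is an assumption: it is the internal SAXParserFactory implementation bundled with OpenJDK-based runtimes, chosen only because it is present without extra JARs. In practice you would name a class from Table 3-2 that is on your class path.

```java
import javax.xml.parsers.SAXParserFactory;

public class FactoryByProperty {
    // Assumption: OpenJDK's bundled factory; substitute a class from Table 3-2.
    static final String FACTORY =
        "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";

    public static void main(String[] args) {
        // Equivalent to: java -Djavax.xml.parsers.SAXParserFactory=... FactoryByProperty
        System.setProperty("javax.xml.parsers.SAXParserFactory", FACTORY);
        // newInstance() consults the system property before other lookups.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println(factory.getClass().getName());
    }
}
```

The jaxp.properties equivalent is a single line in key=value form, such as javax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl.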
Table 3-2. JAXP SAXParserFactory implementation classes
Parser   | JAXP factory class name
Ælfred   | gnu.xml.aelfred2.JAXPFactory
Crimson  | org.apache.crimson.jaxp.SAXParserFactoryImpl
Xerces   | org.apache.xerces.jaxp.SAXParserFactoryImpl
If you're using JAXP to bootstrap a SAX2
parser, rather than the SAX2 APIs, the default setting for
namespace processing is different: JAXP parsers don't process
namespaces by default, while SAX2 parsers do.
SAX2 normally removes all
xmlns* attributes, reports namespace scope
events, and may hide the namespace prefixes actually used
by element and attribute names.
JAXP does none of that unless you make it; in fact,
the default parser mode for some current implementations is
the illegal SAX2 mode described in the previous chapter.
The example code in this section made the JAXP factory follow
SAX2 defaults.
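To see the difference in defaults, the sketch below parses the same one-element document with a JAXP parser left in its default mode and with one made namespace-aware, reporting what startElement() receives for the root element. The class and method names (NamespaceDefaults, describeRoot, jaxpReader) are hypothetical. With a Xerces-based parser, such as the one bundled with current JDKs, the default mode should report an empty namespace URI and the xmlns:p declaration as an ordinary attribute, while the namespace-aware mode should report the urn:example URI and drop the xmlns:p attribute; other implementations may differ, as noted above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceDefaults {
    static final String DOC = "<p:root xmlns:p='urn:example'/>";

    // Parses DOC and describes the root element as the parser reported it.
    static String describeRoot(XMLReader reader) throws Exception {
        StringBuilder out = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.append("uri=").append(uri)
                   .append(" qName=").append(qName)
                   .append(" attributes=").append(atts.getLength());
            }
        });
        reader.parse(new InputSource(new StringReader(DOC)));
        return out.toString();
    }

    // Bootstraps through JAXP, optionally requesting namespace processing.
    static XMLReader jaxpReader(boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("JAXP default:  " + describeRoot(jaxpReader(false)));
        System.out.println("SAX2 defaults: " + describeRoot(jaxpReader(true)));
    }
}
```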
This book encourages you to use SAX2 directly,
rather than through the JAXP factory mechanism.
Even if JAXP is available, it's more complex to use.
Also, the resulting parser is configured differently,
so many of the examples in this book would break.