Bootstrapping an XMLReader (SAX2)

import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader parser = null;

try {
    parser = XMLReaderFactory.createXMLReader ();
    // success!
} catch (SAXException e) {
    System.err.println ("Can't get default parser: " + e.getMessage ());
}

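import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader parser = null;
String className = ...;

try {
    parser = XMLReaderFactory.createXMLReader (className);
    // success!
} catch (SAXException e) {
    // report which class failed; this is not the default-parser path
    System.err.println ("Can't get parser " + className + ": "
        + e.getMessage ());
}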

Table 3-1. SAX2 XMLReader implementation classes

Parser (and type)	Class name
Ælfred (nonvalidating)	gnu.xml.aelfred2.SAXDriver
Ælfred (optionally validating)	gnu.xml.aelfred2.XmlReader
Crimson (optionally validating)	org.apache.crimson.parser.XMLReaderImpl
Xerces (optionally validating)	org.apache.xerces.parsers.SAXParser
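
When you can't be sure which of these parsers is installed, one option is to try several class names in turn and fall back to the system default. The following is only a sketch of that idea: the ReaderBootstrap class and its firstAvailable method are hypothetical names, not part of SAX2, and the candidate list in main is just an example ordering.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper for illustration; not part of the SAX2 distribution.
class ReaderBootstrap {

    // Tries each candidate class name in order, then the system default.
    // XMLReaderFactory is deprecated on Java 9 and later, but still present.
    @SuppressWarnings("deprecation")
    static XMLReader firstAvailable(String... classNames) {
        for (String className : classNames) {
            try {
                return XMLReaderFactory.createXMLReader(className);
            } catch (SAXException e) {
                // That parser can't be loaded here; move on to the next one.
                System.err.println("Skipping " + className + ": " + e.getMessage());
            }
        }
        try {
            return XMLReaderFactory.createXMLReader();
        } catch (SAXException e) {
            throw new IllegalStateException("No SAX2 parser available", e);
        }
    }

    public static void main(String[] args) {
        XMLReader parser = firstAvailable(
            "gnu.xml.aelfred2.XmlReader",
            "org.apache.xerces.parsers.SAXParser");
        System.out.println("Using " + parser.getClass().getName());
    }
}
```

Keeping the candidate names in a configuration file rather than in source code would fit better with the advice elsewhere in this section.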

3.2.3. Using JAXP

Sun's JAXP 1.1 supports yet another way to bootstrap SAX parsers. It's a more complex process, taking several steps instead of just one:

First, get a javax.xml.parsers.SAXParserFactory.

Tell it to return parsers that will do the kind of processing needed by your application.

Ask it to give you a JAXP parser of type javax.xml.parsers.SAXParser.

Finally, ask the JAXP parser to give you the XMLReader that is normally lurking inside of it.

Conceptually this is like the no-parameters XMLReaderFactory.createXMLReader() method, except it's complicated by expecting the factory to return preconfigured parsers.[16] Configuring the parser using the SAX2 flags and properties directly is preferable; the API "surface area" is smaller. Other than having different default namespace-processing modes, the practical difference is primarily availability: many implementations ensure that a JAXP system default is always accessible, but they haven't paid the same attention to providing the default SAX2 parser. (Current versions of the SAX2 classes make that easier, but you might not be using such versions.)

[16]You can also look at this as choosing between parsers. For example, JAXP 1.2 will probably say how to request that schema validation be done. That's most naturally done as a layer on top of SAX, with a parser filter postprocessing the output of some other SAX parser.
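
For comparison, configuring a reader directly through SAX2 feature flags might look like the sketch below. The two feature identifiers are the standard SAX2 namespace flags; the DirectConfig class and its method names are hypothetical, and whether a given parser accepts a non-default value for a flag is up to that parser.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper for illustration only.
class DirectConfig {
    static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";
    static final String NS_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Gets the default SAX2 parser, then sets its flags explicitly.
    @SuppressWarnings("deprecation")
    static XMLReader configuredReader() {
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            reader.setFeature(NAMESPACES, true);   // the SAX2 default
            reader.setFeature(NS_PREFIXES, true);  // also report xmlns* attributes
            return reader;
        } catch (SAXException e) {
            // Covers SAXNotRecognizedException and SAXNotSupportedException too.
            throw new IllegalStateException("Can't configure parser", e);
        }
    }

    // Reads a flag back, treating "not recognized" as false.
    static boolean isSet(XMLReader reader, String featureId) {
        try {
            return reader.getFeature(featureId);
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        XMLReader reader = configuredReader();
        System.out.println("namespaces: " + isSet(reader, NAMESPACES));
        System.out.println("namespace-prefixes: " + isSet(reader, NS_PREFIXES));
    }
}
```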

The code to use the JAXP bootstrap API to get a SAX2 parser looks like this:

import org.xml.sax.*;
import javax.xml.parsers.*;

XMLReader        parser;

try {
    SAXParserFactory factory;

    factory = SAXParserFactory.newInstance ();
    factory.setNamespaceAware (true);
    parser = factory.newSAXParser ().getXMLReader ();
    // success!

} catch (FactoryConfigurationError err) {
    System.err.println ("can't create JAXP SAXParserFactory, "
        + err.getMessage ());
} catch (ParserConfigurationException err) {
    System.err.println ("can't create XMLReader with namespaces, "
        + err.getMessage ());
} catch (SAXException err) {
    System.err.println ("Hmm, SAXException, " + err.getMessage ());
}

Rather than calling newInstance(), you can hardcode the constructor for a particular factory, probably using one of the classes listed in Table 3-2. It's better to keep implementation preferences as configuration issues though, and not hardwire them into source code.

For situations where you may have several parsers in your class path (or a tree of class loaders, as found in many recent servlet engines), JAXP offers several methods to configure such preferences. Each one associates the factory class name with the key javax.xml.parsers.SAXParserFactory. You can use the key to name a system property (which sets the default parser for your JVM instance), or you can put it in the $JAVA_HOME/jre/lib/jaxp.properties property file (which sets the default policy for that JVM implementation). I prefer the jaxp.properties solution; with the other method the default parser is a function of your class path settings and even the names assigned to various JAR files. You can also embed this preference in your application's JAR files as a META-INF/services/... file, but that solution is similarly sensitive to class loader configuration issues.

Table 3-2. JAXP SAXParserFactory implementation classes

JAXP factory	Class name
Ælfred	gnu.xml.aelfred2.JAXPFactory
Crimson	org.apache.crimson.jaxp.SAXParserFactoryImpl
Xerces	org.apache.xerces.jaxp.SAXParserFactoryImpl
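
The system property route can also be exercised from code, which is sometimes handy for testing. The sketch below is an assumption-laden illustration rather than a recommended pattern: FactoryPreference and withPreference are hypothetical names, and the class name passed in main is simply one entry from Table 3-2 that may or may not be on your class path. If the preferred factory can't be loaded, the sketch restores the earlier setting and falls back to whatever JAXP would have chosen.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper for illustration only.
class FactoryPreference {
    static final String KEY = "javax.xml.parsers.SAXParserFactory";

    // Names a preferred factory through the system property, falling back
    // to the previous configuration if that class can't be loaded.
    static SAXParserFactory withPreference(String factoryClassName) {
        String previous = System.getProperty(KEY);
        System.setProperty(KEY, factoryClassName);
        try {
            return SAXParserFactory.newInstance();
        } catch (FactoryConfigurationError err) {
            System.err.println("Can't use " + factoryClassName
                + ": " + err.getMessage());
            // Put the property back the way we found it.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
            return SAXParserFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory =
            withPreference("org.apache.xerces.jaxp.SAXParserFactoryImpl");
        System.out.println("Using " + factory.getClass().getName());
    }
}
```

Setting the property on the command line (with -D) or in jaxp.properties has the same effect without touching source code, which is closer to the configuration-over-code advice above.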

If you're using JAXP to bootstrap a SAX2 parser, rather than the SAX2 APIs, the default setting for namespace processing is different: JAXP parsers don't process namespaces by default, while SAX2 parsers do. SAX2 normally removes all xmlns* attributes, reports namespace scope events, and may hide the namespace prefixes actually used by element and attribute names. JAXP does none of that unless you make it; in fact, the default parser mode for some current implementations is the illegal SAX2 mode described in the previous chapter. The example code in this section made the JAXP factory follow SAX2 defaults.
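
One way to see the difference in defaults is to parse the same small document twice, once with an untouched JAXP factory and once after calling setNamespaceAware(true), and compare what the handler is told about the root element. This is a minimal sketch: the NamespaceDefaults class, the rootNamespace method, and the urn:example namespace are all made up for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demonstration class.
class NamespaceDefaults {
    static final String DOC = "<p:doc xmlns:p='urn:example'/>";

    // Returns the namespace URI reported for the root element of DOC.
    static String rootNamespace(boolean namespaceAware) {
        final String[] seen = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    if (seen[0] == null) {
                        seen[0] = uri;  // remember only the root element
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(DOC)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("JAXP default: [" + rootNamespace(false) + "]");
        System.out.println("SAX2-style:   [" + rootNamespace(true) + "]");
    }
}
```

With namespace processing enabled, the handler should see urn:example for the root element; with the JAXP default it typically sees an empty string and has only the qualified name p:doc to work with.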

This book encourages you to use SAX2 directly, rather than through the JAXP factory mechanism. Even if JAXP is available, it's more complex to use. Also, the resulting parser is configured differently, so many of the examples in this book would break.
