Configuring XMLReader Behavior (SAX2)

3.3.1. XMLReader Properties

SAX2 defines two XMLReader calls for accessing named property objects. One of the most common uses for such objects is to install non-core event handlers. Accessing properties is like accessing feature flags, except that the values associated with these names are objects rather than Booleans:

XMLReader       producer ...;
String          uri = ...;
Object          value = ...;

// Try getting and setting the property
try {
    System.out.println ("Initial property setting: "
	+ producer.getProperty (uri);
    // if we get here, the property is supported

    producer.setProperty (uri, value);
    // if we get here, the parser set the property 

} catch (SAXNotSupportedException e) {
    // bad value for property ... maybe wrong type, or parser state
    System.out.println ("Can't set property: "
	+ e.getMessage ());
    System.exit (1);

} catch (SAXNotRecognizedException e) {
    // property not supported by this parser
    System.out.println ("Doesn't understand property: "
	+ e.getMessage ());
    System.exit (1);
}

You'll notice the URIs for these standard properties happen to have a common prefix. This means that you can declare the prefix (http://xml.org/sax/properties/) as a constant string and construct the identifiers by string catenation.

Here are the standard properties:

http://xml.org/sax/properties/declaration-handler

This property holds an implementation of org.xml.sax.ext.DeclHandler, used for reporting the DTD declarations that aren't reported through org.xml.sax.DTDHandler callbacks or for the root element name declaration, org.xml.sax.ext.LexicalHandler callbacks. This handler is presented in Section 4.3.1, "The DeclHandler Interface ".

Ælfred, Crimson, and Xerces support this property. In fact, all JAXP-compliant processors must do so.

http://xml.org/sax/properties/dom-node

Only specialized parsers will support this property: parsers that traverse DOM document nodes to produce streams of corresponding SAX events. (Typical SAX2 parsers parse XML text instead of DOM content.) When read, this property returns the DOM node corresponding to the current SAX2 callback. The property can only be written before a parse, to specify that the DOM node beginning and ending the SAX event stream need not be a org.w3c.dom.Document. This type of parser is presented later in this chapter, in Section 3.5.1, "DOM-to-SAX Event Production (and DOM4J, JDOM)".

One example of such a parser is gnu.xml.util.DomParser, which is currently packaged along with the Ælfred parser. At this time, neither Crimson nor Xerces include such functionality.

http://xml.org/sax/properties/lexical-handler

This property holds an implementation of org.xml.sax.ext.LexicalHandler, used for reporting various events mostly (but not exclusively) relating to details of XML text that have no semantic or structural meaning, such as comments. This handler is presented in Chapter 4, "Consuming SAX2 Events" in Section 4.2, "The LexicalHandler Interface ".

Ælfred, Crimson, and Xerces support this property. In fact, all JAXP-compliant processors must do so.

http://xml.org/sax/properties/xml-string

This property returns a literal string of characters associated with the current parser callback event. Exactly which characters are returned isn't specified by SAX2. An example would be returning all the characters in the start tag of an element, including unexpanded entity and character references as well as excess whitespace and the exact type of quote characters (single, double) used to delimit attribute values. (This feature is intended to be of use when constructing certain kinds of XML editors, or DTD analyzers, that are willing to re-parse this data.)

No widely available open source SAX2 parser currently supports this property.

Applications may find it useful to define their own types of handler interfaces, assembling sequences of SAX event "atoms" into higher-level event "molecules" that incorporate essential application-level semantics (and probably some procedural validation). This is the same kind of process model used by W3C's XML schema processing model: the Post-Schema-Validation Infoset (PSVI) additions incorporate semantics suited to processing with that kind of schema. Most applications need to associate even more semantics with data than are easily captured by such simple rules (including DTDs and all types of schema). Those semantics would likely not be understood by any common XMLReader, but other kinds of SAX processing components can help manage such application-level handlers. You can see an example of this technique in Example 6-3.

3.3. Configuring XMLReader Behavior

3.3.1. XMLReader Properties

3.3.2. XMLReader Feature Flags