Other SAX Classes (SAX2)

The preceding chapters have addressed all of the most important SAX2 classes and interfaces. You may need to use a handful of other classes, including simple implementations of a few more interfaces and SAX1 support. This chapter briefly presents those remaining classes and interfaces.

Your parser distribution should include SAX2 support, with complete javadoc for these classes. Consult that documentation if you need more information than this book provides. The API summary in Appendix A, "SAX2 API Summary", should also help.

5.1. Helper Classes

There are several classes in the org.xml.sax.helpers package that you will probably find useful from time to time.

5.1.1. The AttributesImpl Class

This is a general-purpose implementation of the SAX2 Attributes interface. As well as reading attribute information (as defined in the interface), you can write and modify it. This class is quite handy when your application code is producing SAX2 events, perhaps because it is converting data structures to a SAX event stream.
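As a sketch of the "producing SAX2 events" case, the hypothetical EmitPrice class below turns a currency and an amount into a price element for whatever ContentHandler it is handed. The element and attribute names are made up for illustration. It shows the write side of AttributesImpl: addAttribute() to build the list, then setValue() and removeAttribute() to modify it before the event is sent.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical producer: turns a (currency, amount) pair into SAX2 events
// for whatever ContentHandler it is given.
class EmitPrice {
    static void emit(ContentHandler out, String currency, String amount)
            throws SAXException {
        AttributesImpl atts = new AttributesImpl();
        // arguments: namespace URI, local name, qName, type, value
        // ("" as the URI means the attribute is in no namespace)
        atts.addAttribute("", "currency", "currency", "CDATA", currency);
        atts.addAttribute("", "note", "note", "CDATA", "draft");

        // the list is writable: change one value, then drop an attribute
        atts.setValue(atts.getIndex("currency"), currency.toUpperCase());
        atts.removeAttribute(atts.getIndex("note"));

        out.startElement("", "price", "price", atts);
        char[] text = amount.toCharArray();
        out.characters(text, 0, text.length);
        out.endElement("", "price", "price");
    }
}
```

A handler receiving these events sees one attribute, currency="USD", on the price element, followed by the character data "9.99" when called as emit(handler, "usd", "9.99").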

Remember that the attributes provided to the ContentHandler.startElement() event callback are valid only for the duration of that call. If you need a copy of those attributes for later use, it's simplest to use this class; just create a new instance using the copy constructor. Apart from the Attributes methods themselves, that copy constructor is one of the most widely used APIs in this class.
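Here is a minimal sketch of the copy-constructor idiom. It assumes the JDK's built-in SAX parser, obtained through JAXP's SAXParserFactory, and the class name AttributeSaver is illustrative. Each startElement() call stores a private AttributesImpl copy rather than the parser's own Attributes object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Keeps a private copy of every element's attributes.  The Attributes
// object passed to startElement() may be reused by the parser once the
// callback returns, so storing that reference directly is unsafe.
class AttributeSaver extends DefaultHandler {
    final List<Attributes> saved = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        saved.add(new AttributesImpl(atts));   // copy constructor
    }

    static AttributeSaver parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        AttributeSaver handler = new AttributeSaver();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

After AttributeSaver.parse("<a x='1'><b y='2' z='3'/></a>"), the second saved entry still reports two attributes, even though the parser has long since moved on.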

It's often handy to keep a stack around to track the currently open elements and attributes. If you support xml:base, you'll also want to track base URIs for the document and for any external parsed entities. This is easy to implement using another key method provided by this class, addAttribute(). Example 5-1 shows how to maintain such a stack with xml:base support. It shows full support for XML namespaces, unlike Example 2-2, which is simple and attribute-free (shown in Chapter 2, "Introducing SAX2" in Section 2.3, "Basic ContentHandler Events").

Example 5-1. Maintaining an element and attribute stack

import java.io.IOException;
import java.net.URL;
import java.util.Hashtable;
import org.xml.sax.*;
import org.xml.sax.ext.*;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class XStack extends DefaultHandler
    implements LexicalHandler, DeclHandler
{
    static class StackEntry
    {
	final String	 nsURI, localName;
	final String	 qName;
	final Attributes atts;
	final StackEntry parent;

	StackEntry (
	    String namespace, String local,
	    String name,
	    Attributes	attrs,
	    StackEntry	next
	) {
	    this.nsURI = namespace;
	    this.localName = local;
	    this.qName = name;
	    this.atts = new AttributesImpl (attrs);
	    this.parent = next;
	}
    }

    private Locator		locator;
    private StackEntry		current;
    private Hashtable		extEntities = new Hashtable ();

    private static final String	xmlNamespace
	= "http://www.w3.org/XML/1998/namespace";

    private void addMarker (String label, String uri)
    throws SAXException
    {
	AttributesImpl	atts = new AttributesImpl ();

	if (locator != null && locator.getSystemId () != null)
	    uri = locator.getSystemId ();

	// guard against InputSource objects without system IDs
	if (uri == null)
	    throw new SAXParseException ("Entity URI is unknown", locator);

	// guard against illegal relative URIs (Xerces)
	try { new URL (uri); }
	catch (IOException e) {
	    throw new SAXParseException ("parser bug: relative URI", 
                     locator);
	}

	atts.addAttribute (xmlNamespace, "base", "xml:base", "CDATA", uri);
	current = new StackEntry ("", "", label, atts, current);
    }

    // walk up stack to get values for xml:space, xml:lang, and so on
    public String getInheritedAttribute (String uri, String name)
    {
	String		retval = null;
	boolean		useNS = (uri != null && uri.length () != 0);

	for (StackEntry here = current;
		retval == null && here != null;
		here = here.parent) {
	    if (useNS)
		retval = here.atts.getValue (uri, name);
	    else
		retval = here.atts.getValue (name);
	}
	return retval;
    }

    // knows about XML Base recommendation, and xml:base attributes
    // can be used in callbacks for elements, PIs, comments,
    // characters, ignorable whitespace, and so on.
    public URL getBaseURI ()
    throws IOException
    {
	return getBaseURI (current);
    }

    private URL getBaseURI (StackEntry here)
    throws IOException
    {
	String		uri = null;

	while (uri == null && here != null) {
	    uri = here.atts.getValue (xmlNamespace, "base");
	    if (uri != null)
		break;
	    here = here.parent;
	}

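	// no xml:base anywhere: called before startDocument() or
	// after endDocument(), so there is no base URI to report
	if (here == null)
	    throw new IOException ("no base URI is known");

	// marker for document or entity boundary?  absolute.
	if (here.qName.charAt (0) == '#')
	    return new URL (uri);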

	// else it might be a relative uri.
	int		offset = uri.indexOf (":/");

	if (offset == -1 || uri.indexOf (':') < offset)
	    return new URL (getBaseURI (here.parent), uri);
	else
	    return new URL (uri);
    }

    // from ContentHandler interface
    public void startElement (
	String		namespace,
	String		local,
	String		name,
	Attributes	attrs
    ) throws SAXException
	{ current = new StackEntry (namespace, local, name, attrs, 
                   current); }

    public void endElement (String namespace, String local, String name)
    throws SAXException
	{ current = current.parent; }

    public void setDocumentLocator (Locator l)
	{ locator = l; }

    public void startDocument ()
    throws SAXException
	{ addMarker ("#DOCUMENT", null); }

    public void endDocument ()
	{ current = null; }

    // DeclHandler interface

    public void externalEntityDecl (String name, String publicId, 
           String systemId)
    throws SAXException
    {
	if (name.charAt (0) == '%')
	    return;
	// absolutize URL
	try {
	    URL	url = new URL (locator.getSystemId ());
	    systemId = new URL (url, systemId).toString ();
	} catch (IOException e) {
	    // what could we do?
	}
	extEntities.put (name, systemId);
    }

    public void elementDecl (String name, String model) { }
    public void attributeDecl (String element, String name,
    		String type, String mode, String defaultValue) {}
    public void internalEntityDecl (String name, String value) { }

    // LexicalHandler interface
    public void startEntity (String name)
    throws SAXException
    {
	String	uri = (String) extEntities.get (name);
	if (uri != null)
	    addMarker ("#ENTITY", uri);
    }

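    public void endEntity (String name)
    throws SAXException
    {
	// pop only the markers that startEntity() pushed; internal
	// entities and the "[dtd]" pseudo-entity have none
	if (extEntities.get (name) != null)
	    current = current.parent;
    }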
    
    public void startDTD (String root, String publicId, String systemId) {}
    public void endDTD () {}
    public void startCDATA () {}
    public void endCDATA () {}
    public void comment (char buf[], int off, int len) {}
}

With such a stack of attributes, it's easy to find the current values of inherited attributes like xml:space, xml:lang, xml:base, and their application-specific friends. For example, an application might have a policy that all unspecified attributes with #IMPLIED default values are inherited from some ancestor element's value or are calculated using data found in such a context stack.
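Example 5-1 already offers getInheritedAttribute(). The following is a smaller, self-contained sketch of the same lookup that ignores xml:base and entity markers. It keeps a stack of copied attributes and searches it from the innermost open element outward for xml:lang. The class name LangTracker and its langs list are hypothetical, and the JDK's namespace-aware JAXP parser is assumed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

class LangTracker extends DefaultHandler {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";
    private final Deque<Attributes> stack = new ArrayDeque<>();
    final List<String> langs = new ArrayList<>();   // "qName=lang" per element

    // ArrayDeque iterates from the head, i.e. the innermost open element
    String inherited(String uri, String local) {
        for (Attributes a : stack) {
            String v = a.getValue(uri, local);
            if (v != null)
                return v;
        }
        return null;
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        stack.push(new AttributesImpl(atts));
        langs.add(qName + "=" + inherited(XML_NS, "lang"));
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        stack.pop();
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        LangTracker handler = new LangTracker();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)), handler);
        return handler.langs;
    }
}
```

For <doc xml:lang='en'><p>hi</p><p xml:lang='fr'><q/></p></doc>, the first p inherits "en" from doc, while q inherits "fr" from its own parent.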

Notice how this code added marker entries to the stack, with synthetic xml:base attributes holding the true base URIs for the document and for external general entities. That information is needed to implement the XML Base recommendation correctly, and it lets getBaseURI() work entirely from this stack. If you need such functionality often, you might want to provide a more general API rather than packaging it inside one handler implementation.
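To see how nested xml:base values combine, the hypothetical BaseChain helper below resolves a chain of them, outermost first, against a document URI. Example 5-1 computes the same result recursively from the innermost entry using java.net.URL. This sketch swaps in java.net.URI.resolve() and ignores entity boundaries, which Example 5-1 handles with its marker entries.

```java
import java.net.URI;

// Resolves xml:base values from the outermost element inward.  Each
// relative value is resolved against the base in effect at its parent;
// an absolute value simply replaces it.
class BaseChain {
    static URI resolve(String documentUri, String... xmlBases) {
        URI base = URI.create(documentUri);
        for (String b : xmlBases)
            base = base.resolve(b);
        return base;
    }
}
```

With a document at http://example.com/docs/book.xml, an element carrying xml:base="chapters/" whose child carries xml:base="ch1.xml" ends up with the base http://example.com/docs/chapters/ch1.xml.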

5.1.2. The LocatorImpl Class

This is a general-purpose implementation of the Locator interface. As well as reading location properties (as defined in the interface), you can write and modify them. It's part of SAX1 and is still useful in SAX2.
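The write side of LocatorImpl is useful when your own code is the event source. In the following sketch, a hypothetical converter from CSV rows to SAX events reports a bad row with a location it fills in itself. The CsvLocation class, its badRow() method, and the file URI are all illustrative. SAXParseException has a constructor that takes a message and a Locator.

```java
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.LocatorImpl;

// Hypothetical: code converting CSV rows into SAX events can still
// report errors with a location by filling in a LocatorImpl itself.
class CsvLocation {
    static SAXParseException badRow(String fileUri, int row, int column,
                                    String message) {
        LocatorImpl loc = new LocatorImpl();
        loc.setSystemId(fileUri);
        loc.setLineNumber(row);
        loc.setColumnNumber(column);
        return new SAXParseException(message, loc);
    }
}
```

The same LocatorImpl could also be passed to a consumer's setDocumentLocator() and updated as each row is emitted.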

The locator provided through ContentHandler.setDocumentLocator() can be used during any event callback, but the values it returns change over time. If you need a copy of those values for later use, it's simplest to use this class; just create a new instance using the copy constructor. More typically, you will pass the locator to the constructor of a SAXParseException, or just save the current system ID to use as the base for relative URIs you find in element (or attribute) content.
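Here is a minimal sketch of taking snapshots, assuming the JDK's JAXP parser. The class name WhereIsIt is illustrative. The live Locator is saved once, and a frozen LocatorImpl copy is taken at each startElement(). Parsers usually report the position at the end of the start tag, so exact column values vary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.LocatorImpl;

class WhereIsIt extends DefaultHandler {
    private Locator locator;                         // live object; values change
    final List<LocatorImpl> starts = new ArrayList<>();

    @Override
    public void setDocumentLocator(Locator l) { locator = l; }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if (locator != null)
            starts.add(new LocatorImpl(locator));    // frozen copy of current values
    }

    static WhereIsIt run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        WhereIsIt handler = new WhereIsIt();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

For "<a>\n<b/>\n</a>", the two snapshots keep line numbers 1 and 2 after parsing ends, whatever the live locator would report by then.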

5.1.3. The NamespaceSupport Class

When your code needs to track namespaces or their prefixes, use this SAX2 class. One audience for this class is authors of XML parsers; that's probably not you. More likely you're writing code that, like XPath or W3C XML Schema, needs to interpret prefixed names found in attribute values or element content; this class can help. Or you may be writing code to select or generate element or attribute name prefixes for output. (If you only need prefixes for element and attribute names themselves, you should be able to package that work in an event filter component that postprocesses your output and ensures that its namespace declarations and prefixes conform to the rules of XML 1.0 and Namespaces in XML.)

What this class does is maintain a stack of namespace contexts, in which each context holds a set of prefix-to-URI mappings; the contexts normally correspond to elements. This is the right model to use when you're writing an XML parser. If you try to use this class in a layer on top of a SAX2 parser, you'll notice a slight mismatch: all the prefix-mapping events for an element's namespace context precede the startElement() event for that element. That is, you'll need to create and populate a new context before you see the element that signifies it.[23] One simple way to work around this is with a Boolean flag indicating whether the new context is active yet.

[23]This is true unless xmlns* attributes get reported with startElement(), and you only use that form of the prefix-mapping events.
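The mismatch is easy to observe. The following sketch logs event order from the JDK's namespace-aware JAXP parser. The class name EventOrder is illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Records the order in which a namespace-aware parser reports events.
class EventOrder extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping " + prefix);
    }
    @Override public void endPrefixMapping(String prefix) {
        events.add("endPrefixMapping " + prefix);
    }
    @Override public void startElement(String uri, String local,
                                       String qName, Attributes atts) {
        events.add("startElement " + qName);
    }
    @Override public void endElement(String uri, String local, String qName) {
        events.add("endElement " + qName);
    }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        EventOrder handler = new EventOrder();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }
}
```

For <a xmlns:x='urn:x'><x:b/></a>, the log begins with startPrefixMapping x before startElement a and ends with endPrefixMapping x after endElement a.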

To use this class with a SAX2 parser that's set to report namespace prefix mappings, you have to modify some of your ContentHandler callbacks to maintain that stack of contexts. The same steps apply if you produce those callbacks yourself:

Instantiate a NamespaceSupport object using its default constructor (the only one). A good time to do this is when you start your event stream, at the ContentHandler.startDocument() event callback. When you do this, set a Boolean contextActive flag to false, so that you'll create a new context for the root element.
When you get (or make) a ContentHandler.startPrefixMapping(prefix,uri) event, see if contextActive is true. If not, call pushContext() and set that flag to true. Then call declarePrefix(prefix,uri). (It returns false if you give it illegal inputs.)
At the end of any ContentHandler.startElement() event, see if contextActive is true. If not, call pushContext(). Then set that flag to false, forcing any child elements' namespace declarations to create a new context.
Finally, at the end of any ContentHandler.endElement() event, call popContext().
Call reset() to forcibly reset all state before you reuse the object. Doing this at the end of the ContentHandler.endDocument() callback should work.
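The steps above can be sketched as a handler. It assumes the JDK's JAXP parser, and it assumes that QName-valued attributes are named "type". Both the attribute name and the class name QNameResolver are hypothetical. The handler treats each such value like an element name, so an unprefixed value picks up the default namespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

class QNameResolver extends DefaultHandler {
    private NamespaceSupport ns;
    private boolean contextActive;
    final List<String> resolved = new ArrayList<>();

    @Override public void startDocument() {
        ns = new NamespaceSupport();
        contextActive = false;              // root element needs a new context
    }

    @Override public void startPrefixMapping(String prefix, String uri) {
        if (!contextActive) {               // first mapping for this element
            ns.pushContext();
            contextActive = true;
        }
        ns.declarePrefix(prefix, uri);      // false means illegal input
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        String value = atts.getValue("type");   // hypothetical QName attribute
        if (value != null) {
            String[] parts = ns.processName(value, new String[3], false);
            resolved.add(parts == null ? "undeclared: " + value
                                       : "{" + parts[0] + "}" + parts[1]);
        }
        if (!contextActive)                 // element declared no prefixes
            ns.pushContext();
        contextActive = false;              // children's mappings get a new context
    }

    @Override public void endElement(String uri, String local, String qName) {
        ns.popContext();
    }

    @Override public void endDocument() { ns.reset(); }

    static List<String> run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        QNameResolver handler = new QNameResolver();
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)), handler);
        return handler.resolved;
    }
}
```

Names are resolved before the empty context is pushed. Pushing an empty context changes no bindings, so the order doesn't matter there, and it keeps the code in the same order as the steps.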

If you follow these rules, you can use processName() to interpret element and attribute names that you find according to the current prefix bindings, or you can use getPrefix() to choose a prefix given a particular namespace URI:

String[] processName(String qName, String[] parts, boolean isAttribute)

Use this method to find the namespace name corresponding to a qualified element or attribute name (perhaps as found inside an attribute value or element content). Parameters are:

String qName: This is the qualified name, such as units:currency or fare, that is being examined.
String[] parts: This is an array of at least three elements, supplied by the caller. If this method succeeds in processing the name, the first array element will hold the namespace URI, the second will hold the local (unprefixed) name, and the third will hold the qName you passed in. The first string (the namespace URI) will be empty if the qName has no prefix and no default namespace URI applies.
boolean isAttribute: Pass true if the qName parameter identifies an attribute; otherwise, pass false. This information is needed because unprefixed element names are interpreted using any default namespace URI, but attribute names are not.

If this method succeeds, the parts parameter is filled out and returned. Otherwise the name includes a reference to an undeclared prefix, and null will be returned.
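A small sketch of processName() outside any parser follows. It builds one context by hand, with namespace URIs invented for the example, and interprets names in it. The class and method names are hypothetical.

```java
import org.xml.sax.helpers.NamespaceSupport;

class ProcessNameDemo {
    // Returns {namespace URI, local name, qName}, or null when the
    // name uses an undeclared prefix.
    static String[] interpret(String qName, boolean isAttribute) {
        NamespaceSupport ns = new NamespaceSupport();
        ns.pushContext();
        ns.declarePrefix("units", "urn:example:units");
        ns.declarePrefix("", "urn:example:default");   // default namespace
        return ns.processName(qName, new String[3], isAttribute);
    }
}
```

Here interpret("fare", false) yields the default namespace URI, while interpret("fare", true) yields an empty URI, because attribute names never take the default namespace. A name like nope:x yields null.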

String getPrefix(String uri)

Use this method to choose a prefix when constructing a qualified name. It returns a currently defined prefix associated with the specified namespace URI, or null if no such prefix is defined; it never returns the empty (default) prefix. Even when no prefix is defined, the default namespace URI (associated with element names that have no prefixes) might still be appropriate. To check, call getURI(""), which returns the current default namespace URI, or null if none is declared.
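The following sketch combines getPrefix() and getURI("") to build an element name. The PrefixChooser class and its qualify() method are hypothetical. A null result means the caller must first declare a prefix. The default-namespace fallback applies to element names only.

```java
import org.xml.sax.helpers.NamespaceSupport;

class PrefixChooser {
    // Builds an element qName for (uri, local) in the current context,
    // or returns null if no usable binding is in scope.
    static String qualify(NamespaceSupport ns, String uri, String local) {
        String prefix = ns.getPrefix(uri);      // never the "" prefix
        if (prefix != null)
            return prefix + ":" + local;
        String defaultUri = ns.getURI("");      // null: no default namespace
        if (uri.equals(defaultUri == null ? "" : defaultUri))
            return local;                       // unprefixed name works
        return null;
    }
}
```

If "u" is bound to urn:u and the default namespace is urn:d, then qualify(ns, "urn:u", "a") gives "u:a", qualify(ns, "urn:d", "b") gives "b", and any other URI gives null.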

Consult the class documentation (javadoc) for full details about the methods on this class.
