
2.6. Namespaces and SAX2

But first, just what are namespaces supposed to do? Usually, they identify some particular technical vocabulary. People often reuse words rather than create new ones, and those words acquire context-specific meanings and nuances that can be extremely important. A namespace can distinguish whether a word like "bill" refers to part of a bird, a now-archaic weapon, part of a hat, a legislative act, or a number of other things. So a <bill length='45cm'/> element might be associated with a namespace, which provides context that should help applications interpret the element. A processor for "Birder's Markup Language" could know to reject (or ignore) markup intended for legislative or financial uses, even if they all use "bill" elements.

XML defines a way to declare namespaces as needed, using attributes. Namespaces are usually indicated by a prefix, which can serve as a qualifying adjective: "the bird's bill" might be bird:bill while "the consultant's bill" might be consultant:bill. You can also set up a default element namespace so that an unadorned bill element might indicate, for example, a weapon.

2.6.2. Element and Attribute Naming with Namespaces

The direct impact of XML namespaces on your SAX2 application code is to give you a second way to identify elements and attributes. Documents will normally use only one identification style for a given element or attribute. These identification styles are distinct from the two models for using such names, described earlier:

Qualified names: These are exactly as found in the XML text. Examples include para and, with a prefix, xhtml:p. (XML documents that don't use namespaces, and some namespace-style documents, won't use colons.)
Universal names: These consist of two separate strings: a "local name" from the XML text (removing any namespace prefix) and a "namespace name" (always a URI) from namespace declarations. For the qualified name xhtml:p, the local name is p, and the namespace name is the URI that a namespace declaration binds to the prefix xhtml. Such names are in a sense "universalized" by addition of a suitable URI.

Note that the XML Namespaces specification only standardizes the "qualified name" (qName) terminology; it doesn't standardize terminology for universal names. Because of this, you will also see other terms, such as "expanded names" (the term used by XPath) or "namespace-style names" (used to talk about that style of naming).

Since ContentHandler.startElement() callbacks now have to deal with three different kinds of name strings, the code can get rather complicated. Plus, even if you're expecting only universal names, you'll need to notice when elements or attributes don't have universal names and use qualified names to work with them. Element names are identified in method parameters (the same as in ContentHandler.endElement()), while attribute names show up in accessor methods for Attributes objects. We'll use the following XML text to illustrate these different types of names:

<big:animals  xmlns="http://www.example.com/dog"
              xmlns:big="http://www.example.com/big">
    <wolfhound cat='no' big:dog='yes' />
    <greyhound big:dog='yes' xmlns=""/>
</big:animals>

SAX2 calls names in XML text "Qualified Names." These are the same thing as "XML 1.0 names" except that XML 1.0 names have no restrictions on the use of colons. When you disable namespace processing in a SAX2 parser, it will deliver "qualified names" that are really XML 1.0 names, without those restrictions. With namespace processing enabled, many qualified names (including every name with a prefix) will correspond to a namespace-style name.

Element names without a prefix might not have a corresponding universal name. Unprefixed attribute names will never have a universal name. In those cases, applications must use the qualified name along with non-namespace context, such as the enclosing element, to figure out what the name is supposed to mean. There are no universally accepted policies for such cases. Yes, all that confuses other people as well.
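As a rough illustration of the rule above, the following sketch parses the sample document with namespace processing enabled and builds one identifier per element: the universal name when there is a namespace URI, and the qualified name otherwise. The class name ElementNameDump, the "{uri}localName" notation, and the use of the JAXP SAXParserFactory to obtain the parser are assumptions made for this example, not part of SAX2 itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementNameDump {
    // The sample document from this section, as a string.
    static final String SAMPLE =
        "<big:animals xmlns='http://www.example.com/dog'"
        + " xmlns:big='http://www.example.com/big'>"
        + "<wolfhound cat='no' big:dog='yes'/>"
        + "<greyhound big:dog='yes' xmlns=''/>"
        + "</big:animals>";

    // Returns one identifier per element, in document order:
    // "{uri}localName" when the element has a namespace URI,
    // otherwise the bare qualified name.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // JAXP factories default to false
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Only trust localName when a namespace URI is present.
                names.add(uri.isEmpty() ? qName : "{" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

Calling ElementNameDump.elementNames(ElementNameDump.SAMPLE) should yield universal names for big:animals and wolfhound, and only a qualified name for greyhound, whose xmlns="" declaration removes it from the default element namespace.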

2.6.2.1. Element naming

The identifiers for the element names are the first three parameters of void startElement(String namespaceURI, String localName, String qName, Attributes atts). Table 2-1 shows the values of the element names for the previous example, as reported by a SAX2 parser in its default mode. Notice particularly that the namespace URI is empty except when a namespace declaration applies to that element name, and that if there's a nonempty namespace URI, there might not be a value for qName. That's not just for element names using namespace prefixes; for element names, a default element namespace declaration will apply if it's within scope. (Remember that empty strings aren't the same as nulls.)

Table 2-1. ContentHandler.startElement() parameters for element names

namespaceURI	localName	qName
http://www.example.com/big	animals	empty or `big:animals`
http://www.example.com/dog	wolfhound	empty or `wolfhound`
empty	empty	`greyhound`

You could end up with lots of code like the decision tree in Example 2-8 in your SAX event handlers. Or, you may prefer to factor it as a table lookup (maybe using application-specific types of handler objects) rather than as a tree of comparisons. Notice that for elements without a namespace URI, the qName is checked, but if there's a namespace URI, then localName is used. Also, all unrecognized elements are reported as a kind of validity error. You may well need to have more context-dependent logic too, if elements may only show up in appropriate contexts. Such contexts often need different decision trees.

Example 2-8. Decision tree in startElement( )

// errorHandler and locator are fields of the handler class (not shown)
public void
startElement (String uri, String localName, String qName, Attributes atts)
throws SAXException
{
    // elements outside of any namespace?
    if ("".equals (uri)) {
        if ("greyhound".equals (qName)) {
            // ... handle
            return;
        }
        // ... else handle N other elements; return on success

        // no recognized element: a validity error
        errorHandler.error (new SAXParseException (
                "Unrecognized element: " + qName,
                locator
                ));
        // if that doesn't abort the parse:
        return;

    // in the "big" namespace?
    } else if ("http://www.example.com/big".equals (uri)) {
        if ("animals".equals (localName)) {
            // ... handle
            return;
        }
        // ... handle "islands" and N other big things; return on success
        // FALLTHROUGH for unrecognized elements

    // in the "dog" namespace?
    } else if ("http://www.example.com/dog".equals (uri)) {
        if ("wolfhound".equals (localName)) {
            // ... handle
            return;
        }
        // ... handle "terrier", "collie" and so on; return on success
        // FALLTHROUGH for unrecognized elements
    }

    // ... and so on for other namespaces

    // element not in a namespace we recognize: a validity error
    errorHandler.error (new SAXParseException (
            "Unrecognized element: " + uri + " (" + localName + ")",
            locator
            ));
    // returns if that doesn't abort the parse
}
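
The table-lookup alternative mentioned before Example 2-8 could be sketched roughly as follows. This is only one possible factoring: the TableDrivenHandler class, its ElementAction interface, and the "{uri}localName" key format are hypothetical names invented for this illustration. The lookup key follows the same rule as the decision tree: qName when there is no namespace URI, localName otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class TableDrivenHandler extends DefaultHandler {
    // Hypothetical application-specific handler type (not part of SAX2).
    interface ElementAction {
        void start(Attributes atts) throws SAXException;
    }

    private final Map<String, ElementAction> actions = new HashMap<>();
    private Locator locator;   // stays null if the parser supplies none

    // Builds the lookup key: qName when there is no namespace URI,
    // otherwise "{uri}localName".
    static String key(String uri, String localName, String qName) {
        return uri.isEmpty() ? qName : "{" + uri + "}" + localName;
    }

    // Registers an action; pass "" as the URI for elements in no namespace.
    void register(String uri, String name, ElementAction action) {
        actions.put(key(uri, name, name), action);
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        String name = key(uri, localName, qName);
        ElementAction action = actions.get(name);
        if (action == null) {
            // no recognized element: report a validity error
            error(new SAXParseException("Unrecognized element: " + name,
                                        locator));
            return;   // reached unless error() is overridden to throw
        }
        action.start(atts);
    }
}
```

An application would call register() once per recognized element before parsing, for example register("http://www.example.com/dog", "wolfhound", ...) and register("", "greyhound", ...). Context-dependent logic could then be handled by swapping in a different table for each context.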

Most SAX2 parsers provide qualified names in all cases, but you shouldn't rely on their availability unless the parser is configured to provide namespace prefix information (which also causes namespace-declaration attributes to be "un-hidden"). You should probably avoid using the qName, even for diagnostics, when there's a nonempty namespaceURI.

2.6.2.2. Attribute naming

The identifiers for the attribute names are accessed using Attributes methods such as getQName(), getLocalName(), and getURI() when you iterate over an element's attributes with a "for" loop. You can access attribute values directly if you use either XML 1.0-style names (qName) or XML Namespace-style names (namespaceURI and localName).

SAX2 parsers handle attribute names from the example text as shown in Table 2-2. This table shows the "mixed mode" behavior, described later; in the default SAX2 parser mode, the xmlns and xmlns:big attributes won't appear. You'd have to set the namespace-prefixes feature flag (as described later in this chapter, in Section 2.6.3, "Namespace Feature Flags") to see these attributes. Note that according to the namespaces specification there is no such thing as a default namespace for attribute names, so that namespace declaration attributes don't go into any namespace.

Table 2-2. Attributes methods to access attribute names

`getURI()`	`getLocalName()`	`getQName()`
empty	empty	`xmlns`
empty	empty	`xmlns:big`
empty	empty	`cat`
http://www.example.com/big	dog	empty or `big:dog`
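
To see rows like those in Table 2-2 for yourself, a loop over the Attributes object along the following lines could be used. This is a sketch under assumptions: the class name AttributeNameDump and the "qName | uri | localName" row format are made up for the example, and the parser is obtained through JAXP. The feature identifier http://xml.org/sax/features/namespace-prefixes is the standard SAX2 one. What a given parser reports in the "empty or" cells may differ from parser to parser, so the sketch simply records whatever it is given.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AttributeNameDump {
    // The sample document from this section, as a string.
    static final String SAMPLE =
        "<big:animals xmlns='http://www.example.com/dog'"
        + " xmlns:big='http://www.example.com/big'>"
        + "<wolfhound cat='no' big:dog='yes'/>"
        + "<greyhound big:dog='yes' xmlns=''/>"
        + "</big:animals>";

    // Returns one "qName | uri | localName" row per attribute, for every
    // element in the document. With mixedMode set, namespace declaration
    // attributes are reported too.
    static List<String> attributeNames(String xml, boolean mixedMode)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(
            "http://xml.org/sax/features/namespace-prefixes", mixedMode);

        final List<String> rows = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    rows.add(atts.getQName(i) + " | " + atts.getURI(i)
                             + " | " + atts.getLocalName(i));
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return rows;
    }
}
```

Comparing attributeNames(SAMPLE, false) with attributeNames(SAMPLE, true) shows the difference between the default mode and the mixed mode: only the latter includes the xmlns and xmlns:big declarations.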

So if you wanted to write some code that ignored elements without a big:dog attribute (that is, the URI is http://www.example.com/big and the local name is dog) with value "yes", it might look like this:

public void startElement (String uri, String local, String qName,
        Attributes atts)
throws SAXException
{
    String    value;

    // look the attribute up by universal name; null if it's absent
    value = atts.getValue ("http://www.example.com/big", "dog");
    if (!"yes".equals (value)) {
        // arrange to ignore text and elements until this finishes
        return;
    }

    // ... process the element
}

2.6.2.3. Things to keep in mind

To avoid confusing things, the previous code didn't illustrate two somewhat perverse cases. First, if the big prefix were redefined for some element, the same qualified name could correspond to different universal names, with the same local name but different namespace URIs. That's one reason the previous code doesn't check for a qName of big:dog. Using a qName of big:dog might make sense if you were working with XML 1.0 without using XML namespaces. Second, if the URI used with the big prefix were associated with a second prefix, different qualified names could correspond to the same universal name. That's another reason the previous code doesn't check for a qName of big:dog. If you are writing namespace-aware code, use only namespace-style name testing in your code to avoid such problems. That makes your code work correctly even when it deals with documents that use namespace declarations in ways you didn't expect.
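
Both cases can be tried out with a small experiment such as the one sketched below. The class name PrefixIndependence, the lookups() helper, and the one-element test documents are invented for this illustration. The sketch looks up the same attribute twice, once by universal name and once by the qualified name big:dog, so the two results can be compared.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class PrefixIndependence {
    static final String BIG = "http://www.example.com/big";

    // Parses a one-element document and returns two lookups on its
    // attributes: [0] by universal name, [1] by the qualified name
    // "big:dog". Either entry is null if that lookup finds nothing.
    static String[] lookups(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final String[] found = new String[2];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                found[0] = atts.getValue(BIG, "dog");   // namespace-style
                found[1] = atts.getValue("big:dog");    // XML 1.0-style
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }
}
```

For a document such as <a xmlns:huge='http://www.example.com/big' huge:dog='yes'/>, the universal-name lookup still finds the attribute while the qName lookup cannot. For <a xmlns:big='http://www.example.com/small' big:dog='yes'/>, the universal-name lookup correctly finds nothing, even though the qualified name looks the same as before.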

By default, SAX2 XML parsers provide universal names for elements and attributes that have namespaces (they'll have nonempty localName and namespaceURI strings) or qualified names for elements and attributes that don't, and will remove the namespace declaration attributes from the Attributes object provided in the ContentHandler.startElement() event. Unless a default element namespace declaration is in scope, an element whose XML 1.0-style name has no prefix won't have a namespace-style identifier. Attributes with unprefixed names work differently, since default element namespace declarations never apply to attribute names.

If you work with both SAX2 and DOM Level 2, you need to be aware of the differences in how these APIs expose namespaces. The terminology is similar but not identical; SAX2 talks about "URI" while DOM Level 2 talks about "NamespaceURI," and SAX2 uses "QName" not "Name"; but both APIs talk about the "LocalName." When using element or attribute construction methods in the org.w3c.dom.Document class, you will notice that DOM uses two different APIs in places in which SAX2 provides just one callback (in three different modes, as discussed in the next section). You are most likely to trip over different ways to tell whether an element or attribute has no namespace URI: SAX2 uses an empty string (length zero), while DOM Level 2 uses null. You may also notice that while SAX2 follows the XML Namespaces specification with regard to the attributes that define namespaces, DOM does not. In SAX2, those attributes have no URIs, but DOM assigns http://www.w3.org/2000/xmlns/ as their namespace URI.
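
The DOM side of these differences can be observed with a short sketch like the following, which parses the sample document into a namespace-aware DOM tree. The class name DomNamespaceCheck and the saxStyleUri() helper are hypothetical, introduced only to show one way of bridging the null versus empty-string conventions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNamespaceCheck {
    // The sample document from this section, as a string.
    static final String SAMPLE =
        "<big:animals xmlns='http://www.example.com/dog'"
        + " xmlns:big='http://www.example.com/big'>"
        + "<wolfhound cat='no' big:dog='yes'/>"
        + "<greyhound big:dog='yes' xmlns=''/>"
        + "</big:animals>";

    // Builds a namespace-aware DOM tree from a string.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // JAXP factories default to false
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Converts DOM's "no namespace" (null) to SAX2's convention ("").
    static String saxStyleUri(Node node) {
        String uri = node.getNamespaceURI();
        return uri == null ? "" : uri;
    }
}
```

With a tree from parse(SAMPLE), calling getNamespaceURI() on the greyhound element returns null where a SAX2 handler would see an empty string, and the xmlns:big attribute node on the root element reports http://www.w3.org/2000/xmlns/ as its namespace URI.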