2.6. Namespaces and SAX2

However you use XML namespaces with SAX, you need to understand the core concepts discussed in this section. Namespaces can be confusing; they're more complex than perhaps they ought to be. In part this is because of how they interact (or don't interact) with other parts of Greater XML; in part it's because everyone has different ways to determine what words mean, and XML names are kinds of words. We'll look at some of those complexities first, and then at the mechanisms SAX2 has to help you deal with them.

But first, just what are namespaces supposed to do? Usually, they identify some particular technical vocabulary. People often reuse words rather than create new ones, and those words acquire context-specific meanings and nuances that can be extremely important. A namespace can distinguish whether a word like "bill" refers to part of a bird, a now-archaic weapon, part of a hat, legislative acts, or a number of other things. So a <bill length='45cm'/> element might be associated with a namespace, which provides context that should help applications interpret the element. A processor for "Birder's Markup Language" could know to reject (or ignore) markup intended for legislative or financial uses, even if they all use "bill" elements.

XML defines a way to declare namespaces as needed, using attributes. Namespaces are usually indicated by a prefix, which can serve as a qualifying adjective: "the bird's bill" might be bird:bill while "the consultant's bill" might be consultant:bill. You can also set up a default element namespace so that an unadorned bill element might indicate, for example, a weapon.

2.6.1. What Namespaces Do to XML

XML namespaces are a convention for using attributes to associate URIs with some element and attribute names. Since not all legal XML documents follow this convention, the XML Namespaces specification effectively specifies a dialect of XML. SAX2 supports both dialects: strict XML and XML plus namespaces.
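To make the bird:bill and consultant:bill idea concrete, here is a minimal sketch that uses the JAXP SAXParserFactory bundled with the JDK to report the namespace URI of every element whose local name is bill. The namespace URIs (http://www.example.com/bird and so on) are invented for illustration. Note that a JAXP factory must be told explicitly to be namespace-aware; a reader obtained from a factory that isn't namespace-aware does not start in the SAX2 default mode.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class BillNamespaces {
    /** Reports "namespaceURI|localName" for every element whose local name is "bill". */
    static List<String> findBills(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories default to non-namespace-aware
        XMLReader reader = factory.newSAXParser().getXMLReader();
        List<String> bills = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("bill".equals(localName)) {
                    bills.add(uri + "|" + localName);
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return bills;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical namespace URIs, invented for this example.
        String xml =
              "<notes xmlns:bird='http://www.example.com/bird'"
            + "       xmlns:consultant='http://www.example.com/consultant'>"
            + "  <bird:bill length='45cm'/>"
            + "  <consultant:bill/>"
            + "  <bill xmlns='http://www.example.com/weapon'/>"
            + "</notes>";
        for (String bill : findBills(xml)) {
            System.out.println(bill);
        }
    }
}
```

Running main prints one line per bill element. All three share the local name bill but carry different namespace URIs; the unprefixed one picks up the default element namespace declared on it.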
By default, SAX2 parsers expect the namespaces dialect. In most cases you'll be able to ignore the difference between those two XML dialects, since documents that use XML in namespace-incompatible ways aren't common. Even apart from the two-dialects issue, the use of namespaces with XML complicates XML programming. There are two models for using element and attribute names in XML:
If you're working with or designing XML structures with context-dependent names, then namespaces add new kinds of context and hence new ways to cause confusion. SAX2 gives you the tools to track all the context, but you'll have to record it yourself (probably with some kind of stack), since startElement() parameters will no longer give all the context you need. There are also some conflicts between the element-naming approach of the XML Namespaces specification and DTD validity as defined in the XML specification. They may not affect your SAX2 programs, but they can affect the systems you're implementing with XML and SAX2. The issue is basically that DTDs expect everything to be declared once up front (like import statements in Java), while the namespace mechanism provides lexical scoping (like declaring variables that live on the execution stack) that's flexible about what a given prefix indicates. You can make namespace-correct documents that are DTD-valid, but then you can't change the prefixes bound to namespaces.[11] Namespace-aware DTDs will often define default element namespaces for element names.
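One way to record that context is a stack of open element names, pushed in startElement() and popped in endElement(). The sketch below is illustrative rather than prescriptive: the class name and the {uri}local notation for universal names are choices made for this example, and it falls back to the qName for elements that have no namespace URI.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ContextStackHandler extends DefaultHandler {
    // Open elements, innermost on top. Each entry is "{uri}local", or just the
    // qName when the element has no namespace URI (notation chosen for this sketch).
    private final Deque<String> open = new ArrayDeque<>();
    // The full context path of each element, in document order.
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.push("".equals(uri) ? qName : "{" + uri + "}" + localName);
        StringBuilder path = new StringBuilder();
        // descendingIterator() walks from the outermost element to the innermost
        for (Iterator<String> it = open.descendingIterator(); it.hasNext(); ) {
            path.append('/').append(it.next());
        }
        paths.add(path.toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop(); // this element's context ends here
    }

    static List<String> pathsOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        ContextStackHandler handler = new ContextStackHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a:root xmlns:a='urn:a'><item><a:item/></item></a:root>";
        pathsOf(xml).forEach(System.out::println);
    }
}
```

The two item elements share a local name but get different paths, because only one of them is in the urn:a namespace.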
If you are designing a namespace and want to use the URI to publish information describing the namespace, rather than just using it as a unique identifier, then RDDL (http://www.rddl.org) is probably a good resource. RDDL defines an XHTML-based document syntax that can be viewed or mechanically processed. It lets you find some of the resources that might be important when working with the namespace -- for example, different stylesheets and schemas and documentation in various languages. The RDDL web site includes SAX support for accessing this data.

2.6.2. Element and Attribute Naming with Namespaces

The direct impact of XML namespaces on your SAX2 application code is to give you a second way to identify elements and attributes. Documents will normally use only one identification style for a given element or attribute. These identification styles are distinct from the two models for using such names, described earlier:
Note that the XML Namespaces specification only standardizes the "qualified name" (qName) terminology; it doesn't standardize terminology for universal names. Because of this, you will also see other terms, such as "expanded names" (the term used by XPath) or "namespace-style names" (used to talk about that style of naming).

Since ContentHandler.startElement() callbacks now have to deal with three different kinds of name strings, the code can get rather complicated. Plus, even if you're expecting only universal names, you'll need to notice when elements or attributes don't have universal names and use qualified names to work with them. Element names are identified in method parameters (the same as in ContentHandler.endElement()), while attribute names show up in accessor methods for Attributes objects. We'll use the following XML text to illustrate these different types of names:

    <big:animals xmlns="http://www.example.com/dog"
            xmlns:big="http://www.example.com/big">
        <wolfhound cat='no' big:dog='yes' />
        <greyhound big:dog='yes' xmlns=""/>
    </big:animals>

SAX2 calls names in XML text "Qualified Names." These are the same thing as "XML 1.0 names" except that XML 1.0 names have no restrictions on the use of colons. When you disable namespace processing in a SAX2 parser, it will deliver "qualified names" that are really XML 1.0 names, without those restrictions. With namespace processing enabled, many qualified names (including every name with a prefix) will correspond to a namespace-style name. Element names without a prefix might not have a corresponding universal name, and unprefixed attribute names will never have one. In those cases, applications must use the qualified name along with non-namespace context, such as the enclosing element, to figure out what the name is supposed to mean. There are no universally accepted policies for such cases. Yes, all that confuses other people as well.

2.6.2.1. Element naming

The identifiers for the element names are the first three parameters of void startElement(String namespaceURI, String localName, String qName, Attributes atts). Table 2-1 shows the values of the element names for the previous example, as reported by a SAX2 parser in its default mode. Notice particularly that the namespace URI is empty except when a namespace declaration applies to that element name, and that if there's a nonempty namespace URI, there might not be a value for qName. That's not just for element names using namespace prefixes; for element names, a default element namespace declaration will apply if it's within scope. (Remember that empty strings aren't the same as nulls.)

Table 2-1. ContentHandler.startElement() parameters for element names
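To see what a particular parser reports for these element names, you can run the example text through it in the SAX2 default mode and print the three name parameters. This sketch assumes the JAXP parser bundled with the JDK; other parsers may differ in whether qName is filled in when namespaceURI is nonempty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementNames {
    // The example text from this section.
    static final String EXAMPLE =
          "<big:animals xmlns='http://www.example.com/dog'\n"
        + "        xmlns:big='http://www.example.com/big'>\n"
        + "    <wolfhound cat='no' big:dog='yes' />\n"
        + "    <greyhound big:dog='yes' xmlns=''/>\n"
        + "</big:animals>\n";

    /** One row per element: the three name parameters passed to startElement(). */
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // SAX2 default mode: namespaces on, prefixes off
        XMLReader reader = factory.newSAXParser().getXMLReader();
        List<String> rows = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                rows.add("uri='" + uri + "' localName='" + localName
                         + "' qName='" + qName + "'");
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return rows;
    }

    public static void main(String[] args) throws Exception {
        elementNames(EXAMPLE).forEach(System.out::println);
    }
}
```

In the output, the greyhound element should show an empty namespace URI, because its xmlns="" attribute undoes the default element namespace that applies to wolfhound.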
Example 2-8 shows a decision tree for startElement(). You could end up with lots of code like this in your SAX event handlers, or you may prefer to factor it as a table lookup (maybe using application-specific types of handler objects) rather than as a tree of comparisons. Notice that for elements without a namespace URI, the qName is checked, but if there's a namespace URI, then localName is used. Also, all unrecognized elements are reported as a kind of validity error. You may well need more context-dependent logic too, if elements may only show up in appropriate contexts. Such contexts often need different decision trees.

Example 2-8. Decision tree in startElement()

    // errorHandler and locator are assumed to be fields of this handler
    public void startElement (String uri, String localName,
            String qName, Attributes atts)
    throws SAXException
    {
        // elements outside of any namespace?
        if ("".equals (uri)) {
            if ("greyhound".equals (qName)) {
                // ... handle
                return;
            }
            // ... else handle N other elements; return on success

            // no recognized element: a validity error
            errorHandler.error (new SAXParseException (
                    "Unrecognized element: " + qName, locator));
            // if that doesn't abort the parse:
            return;

        // in the "big" namespace?
        } else if ("http://www.example.com/big".equals (uri)) {
            if ("animals".equals (localName)) {
                // ... handle
                return;
            }
            // ... handle "islands" and N other big things; return on success
            // FALLTHROUGH for unrecognized elements

        // in the "dog" namespace?
        } else if ("http://www.example.com/dog".equals (uri)) {
            if ("wolfhound".equals (localName)) {
                // ... handle
                return;
            }
            // ... handle "terrier", "collie" and so on; return on success
            // FALLTHROUGH for unrecognized elements
        }
        // ... and so on for other namespaces

        // unrecognized namespace, or unrecognized element in a known
        // namespace: a validity error
        errorHandler.error (new SAXParseException (
                "Unrecognized element: " + uri + " (" + localName + ")",
                locator));
        // returns if that doesn't abort the parse
    }

Most SAX2 parsers provide qualified names in all cases, but you shouldn't rely on their availability unless the parser is configured to provide namespace prefix information (which also causes namespace-declaration attributes to be "un-hidden"). You should probably avoid using the qName, even for diagnostics, when there's a nonempty namespaceURI.

2.6.2.2. Attribute naming

The identifiers for the attribute names are accessed using Attributes methods such as getQName(int), getLocalName(int), and getURI(int) when you iterate over an element's attributes with a "for" loop. You can also access attribute values directly, using either XML 1.0-style names (getValue(String qName)) or XML Namespace-style names (getValue(String uri, String localName)). SAX2 parsers handle attribute names from the example text as shown in Table 2-2. This table shows the "mixed mode" behavior, described later; in the default SAX2 parser mode, the xmlns and xmlns:big attributes won't appear. You'd have to set the namespace-prefixes feature flag (as described later in this chapter, in Section 2.6.3, "Namespace Feature Flags") to see these attributes. Note that according to the namespaces specification there is no such thing as a default namespace for attribute names, so namespace declaration attributes don't go into any namespace.

Table 2-2. Attributes methods to access attribute names
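Iterating over the Attributes object in mixed mode shows how each attribute name is reported. The sketch below, again assuming the JDK's bundled JAXP parser, turns on the namespace-prefixes flag so the namespace declarations appear alongside ordinary attributes; the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class AttributeNames {
    // The example text from this section.
    static final String EXAMPLE =
          "<big:animals xmlns='http://www.example.com/dog'\n"
        + "        xmlns:big='http://www.example.com/big'>\n"
        + "    <wolfhound cat='no' big:dog='yes' />\n"
        + "    <greyhound big:dog='yes' xmlns=''/>\n"
        + "</big:animals>\n";

    /** One row per attribute, as reported in "mixed mode". */
    static List<String> attributeNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Keep namespace processing, but also report xmlns and xmlns:* attributes.
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
        List<String> rows = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    rows.add(qName + ": qName='" + atts.getQName(i)
                             + "' uri='" + atts.getURI(i)
                             + "' localName='" + atts.getLocalName(i) + "'");
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return rows;
    }

    public static void main(String[] args) throws Exception {
        attributeNames(EXAMPLE).forEach(System.out::println);
    }
}
```

Expect big:dog to carry the URI http://www.example.com/big and the local name dog, while cat has an empty URI: the default element namespace doesn't apply to attributes.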
So if you wanted to write some code that ignored elements without a big:dog attribute (that is, the URI is http://www.example.com/big and the local name is dog) with value "yes", it might look like this:

    public void startElement (String uri, String local, String qName,
            Attributes atts)
    throws SAXException
    {
        String value;

        // look the attribute up by its universal name, not its qName
        value = atts.getValue ("http://www.example.com/big", "dog");
        if (!"yes".equals (value)) {
            // arrange to ignore text and elements until this finishes
            return;
        }
        // ... process the element
    }

2.6.2.3. Things to keep in mind

To avoid confusing things, the previous code didn't illustrate two somewhat perverse cases. First, if the big prefix were redefined for some element, the same qualified name could correspond to a different universal name, with the same local name but a different namespace URI. That's one reason the previous code doesn't check for a qName of big:dog. (Checking for a qName of big:dog might make sense if you were working with XML 1.0 without using XML namespaces.) Second, if the URI used with the big prefix were also associated with a second prefix, different qualified names could correspond to the same universal name. That's another reason the previous code doesn't check for a qName of big:dog. If you are writing namespace-aware code, use only namespace-style name testing in your code to avoid such problems. That makes your code work correctly even when it deals with documents that use namespace declarations in ways you didn't expect.

By default, SAX2 XML parsers provide universal names for elements and attributes that have namespaces (they'll have nonempty localName and namespaceURI strings) or qualified names for elements and attributes that don't, and will remove the namespace declaration attributes from the Attributes object provided in the ContentHandler.startElement() event. Unless a default element namespace declaration is in scope, an element whose XML 1.0-style name has no prefix won't have a namespace-style identifier.
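The big:dog fragment earlier in this section leaves "arrange to ignore text and elements" unspecified. A minimal sketch, assuming you only need to skip whole subtrees, is a depth counter: set it when an element is rejected, increment and decrement it for nested elements, and drop character data while it is nonzero. The class name and the processed and text fields are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DogFilter extends DefaultHandler {
    static final String BIG = "http://www.example.com/big";

    private int skipDepth = 0;                          // > 0 inside an ignored subtree
    final List<String> processed = new ArrayList<>();   // local names of accepted elements
    final StringBuilder text = new StringBuilder();     // character data of accepted elements

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (skipDepth > 0) {          // nested inside an ignored element
            skipDepth++;
            return;
        }
        // Test by universal name, never by the qName "big:dog".
        if (!"yes".equals(atts.getValue(BIG, "dog"))) {
            skipDepth = 1;            // ignore this element and everything inside it
            return;
        }
        processed.add(local);         // ... process the element
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (skipDepth > 0) {
            skipDepth--;              // reaches 0 at the end of the ignored element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    static DogFilter run(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DogFilter filter = new DogFilter();
        reader.setContentHandler(filter);
        reader.parse(new InputSource(new StringReader(xml)));
        return filter;
    }

    public static void main(String[] args) throws Exception {
        DogFilter f = run("<big:animals xmlns:big='" + BIG + "' big:dog='yes'>"
                + "<wolfhound big:dog='yes'/><cat><tabby big:dog='yes'/></cat>"
                + "</big:animals>");
        System.out.println(f.processed);
    }
}
```

Note that tabby is skipped even though it carries big:dog='yes', because its parent cat was rejected; whether that is the right policy depends on your vocabulary.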
Attributes with unprefixed names work differently from unprefixed element names: default element namespace declarations never apply to attribute names, so an unprefixed attribute never has a namespace-style identifier.

If you work with both SAX2 and DOM Level 2, you need to be aware of the differences in how these APIs expose namespaces. The terminology is similar but not identical; SAX2 talks about "URI" while DOM Level 2 talks about "NamespaceURI," and SAX2 uses "QName" not "Name"; but both APIs talk about the "LocalName." When using element or attribute construction methods in the org.w3c.dom.Document class, you will notice that DOM uses two different APIs (for example, createElement() and createElementNS()) in places in which SAX2 provides just one callback (in three different modes, as discussed in the next section). You are most likely to trip over the different ways to tell whether an element or attribute has no namespace URI: SAX2 uses an empty string (length zero), while DOM Level 2 uses null. You may also notice that while SAX2 follows the XML Namespaces specification with regard to the attributes that define namespaces, DOM does not. In SAX2, those attributes have no URIs, but DOM assigns http://www.w3.org/2000/xmlns/ as their namespace URI.

2.6.3. Namespace Feature Flags

SAX2 controls its namespace-processing support through two feature flags, which can be tested and changed using the setFeature() and getFeature() methods described earlier in this chapter in Section 2.4.1, "SAX2 Feature Flags". The two flags are http://xml.org/sax/features/namespaces (namespaces), which controls whether parsers handle namespace declarations, and http://xml.org/sax/features/namespace-prefixes (namespace-prefixes), which controls whether applications can see the underlying XML syntax. All SAX2 parsers support both flags, although their values might be read-only. Given two flags, there are four possible combinations. Only three are legal. It's easiest to understand what the flags do by considering them as each controlling a small processing task layered over a core that just parses XML text.
The SAX2 defaults are set so both tasks are performed.
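The following sketch exercises the three legal combinations, counting how many namespace declaration attributes reach startElement() in each. It assumes the JDK's bundled JAXP parser supports all three modes, and it sets namespace-prefixes before namespaces so that the reader never passes through the illegal fourth state; rejecting that state with an exception is a choice made for this sketch, not something SAX2 requires.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceModes {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    /** Counts the xmlns and xmlns:* attributes a parser reports in the given mode. */
    static int countDeclarations(boolean namespaces, boolean prefixes, String xml)
            throws Exception {
        if (!namespaces && !prefixes) {
            // This sketch refuses the illegal fourth combination outright.
            throw new IllegalArgumentException("namespaces and namespace-prefixes both false");
        }
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);          // start from the SAX2 default mode
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // Set namespace-prefixes first, so switching to XML 1.0 mode goes
        // default -> mixed -> XML 1.0 and never through false/false.
        reader.setFeature(PREFIXES, prefixes);
        reader.setFeature(NAMESPACES, namespaces);

        int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i);
                    if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                        count[0]++;
                    }
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String doc = "<r xmlns='urn:d' xmlns:p='urn:p'><p:c/></r>";
        System.out.println("default: " + countDeclarations(true, false, doc));
        System.out.println("mixed:   " + countDeclarations(true, true, doc));
        System.out.println("XML 1.0: " + countDeclarations(false, true, doc));
    }
}
```

With the sample document, the default mode should hide both declarations, while mixed mode and XML 1.0 mode should each report the xmlns and xmlns:p attributes.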
The fourth combination of flags, disabling both namespace support and namespace prefix reporting, would be meaningless, and so it is an illegal parser state. Don't set this mode; parsers might not detect that you've put them into an illegal mode and may react unintelligently (such as by entering "XML 1.0 mode"). Unfortunately, it's easy to set this mode if you just set the namespaces flag to false without first setting the namespace-prefixes flag to true (entering mixed mode).

I tend to prefer the mixed mode over the SAX2 default mode. Enabling it is simple: just set the namespace-prefixes flag to true after setting up a parser for the SAX2 defaults. This mode provides better support for the XML Infoset, since it doesn't discard information about the prefixes. You won't see implementation-dependent behaviors in exposing either type of name. Certain kinds of XML processing will work better. In particular, algorithms working near the XML syntax level -- such as writing out XML text or performing consumer-side DTD validation -- will then work without needing to guard against discarded prefixes and without re-creating namespace declaration attributes. Discarding or changing prefixes, in particular, can cause confusion when people need to look at the XML output. The only real impact on applications is having to ignore xmlns and xmlns:* attributes, which isn't hard.

Few, if any, applications really need to work with documents that use colons in ways the XML Namespaces specification doesn't allow, leaving a small performance impact as the primary reason to care about the pure XML 1.0 mode. Even applications that don't use namespaces usually won't see colons used in interesting ways (like nested:contexts:for:names). While most SAX2 XML parsers support all three of these modes, they are only required to support the SAX2 default mode.

2.6.4. ContentHandler and Prefix Mappings

Sometimes XML needs to handle "meta-level" processing, in which XML talks about XML.
In such processing, namespace URIs are sometimes referenced implicitly through prefixes found in places no XML parser will look: CDATA attributes (which can contain anything) and character content found within elements. For example, XPath expressions include prefixes, and they are found in XSLT template attributes. The W3C XML Schema Datatypes (XSD) specification defines a QName datatype that formalizes such usage. When you need to work with those types of XML text, you'll find two particular ContentHandler event callbacks helpful: startPrefixMapping(String prefix, String uri), reported immediately before the startElement() of the element whose declaration creates the mapping, and endPrefixMapping(String prefix), reported immediately after the corresponding endElement(). They provide the same information found in xmlns and xmlns:* attributes, relieving your application code of the responsibility of correctly applying the XML Namespaces specification. For example, your code won't need to know how a default element namespace declaration can be explicitly undone by xmlns="" attributes or by ending the lexical scope of that attribute.
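As a sketch of how these callbacks support meta-level processing, the handler below keeps one stack of URIs per prefix and uses it to resolve a QName-valued attribute. The attribute name type is hypothetical, and resolving unprefixed values against the default namespace follows the XSD QName convention; XPath 1.0, by contrast, does not apply the default namespace to unprefixed names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class QNameResolver extends DefaultHandler {
    // One stack of namespace URIs per prefix ("" is the default namespace).
    // A new mapping hides the previous one only until its scope ends.
    private final Map<String, Deque<String>> bindings = new HashMap<>();
    // Result of resolving each element's "type" attribute, in document order.
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(uri);
    }

    @Override
    public void endPrefixMapping(String prefix) {
        bindings.get(prefix).pop(); // uncover any outer mapping for this prefix
    }

    /** Resolves "prefix:local" or "local" to "{uri}local"; null if the prefix is unbound. */
    String resolve(String value) {
        int colon = value.indexOf(':');
        String prefix = colon < 0 ? "" : value.substring(0, colon);
        String local = value.substring(colon + 1);
        Deque<String> stack = bindings.get(prefix);
        String uri = (stack == null || stack.isEmpty()) ? "" : stack.peek();
        if (colon >= 0 && uri.isEmpty()) {
            return null; // undeclared prefix (the built-in "xml" prefix isn't handled here)
        }
        return "{" + uri + "}" + local;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // "type" is a hypothetical unprefixed attribute holding a QName value.
        String type = atts.getValue("", "type");
        if (type != null) {
            resolved.add(resolve(type));
        }
    }

    static List<String> resolveAll(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // prefix-mapping events need namespace processing
        XMLReader reader = factory.newSAXParser().getXMLReader();
        QNameResolver handler = new QNameResolver();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.resolved;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r xmlns:a='urn:one' type='a:x'>"
                   + "<s xmlns:a='urn:two' type='a:y'/>"
                   + "<t type='a:z'/>"
                   + "<u xmlns='urn:d' type='w'/>"
                   + "<v type='q:nope'/>"
                   + "</r>";
        resolveAll(xml).forEach(System.out::println);
    }
}
```

Because startPrefixMapping() arrives before the element's own startElement(), a declaration on an element is already in effect when that element's attributes are resolved, and the inner a mapping on s no longer applies by the time t is seen.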
You'd normally ignore these two calls, unless you use them to maintain some data structure that tracks active namespace prefixes. It would have to be a stacklike data structure, since one mapping for a prefix only temporarily hides a previous mapping for the same prefix. This is the notion of lexical scope, which you are familiar with from most programming languages. SAX2 includes a helper class to handle this for you: NamespaceSupport, discussed in Section 5.1.3, "The NamespaceSupport Class" in Chapter 5, "Other SAX Classes". Then when you parse the meta-level content, you can use those data structures to interpret prefix references and handle other namespace-related work.

Copyright © 2002 O'Reilly & Associates. All rights reserved.