2.4. Producer-Side Validation

All uses of SAX2 parsers involve extending and customizing the basic scenario we saw earlier. Our next example illustrates two basic configuration mechanisms: error handling options, which let you apply the appropriate policy when you see errors, and parser configuration through feature flags, which let you control some details of how the parser works. (Some event handlers are managed with a configuration mechanism that is quite similar to the feature flag mechanism.) The example also shows how SAX2 parsers expose the core XML notion of DTD-based validation.
You will often tell XML parsers to validate XML as they produce events. Because SAX2 provides access to most of the data in XML documents, including declarations from DTDs, it also supports performing such validation on the event consumer side, possibly with a cached DTD or schema. (The consumer side is the only place to perform procedural validation.) Such consumer-side validation can be important when you're trying to make your program output meet the constraints of a particular information interchange agreement; just add a streaming validation stage to your output processing. This approach can also be used for DOM revalidation and similar purposes. Here, we look at how to validate data that is already in the form of XML text.

Keep in mind that some important DTD-related processing does not involve validation. Documents with DTDs can use entity substitution for document modularity and text portability, and can have attributes defaulted and normalized. Validation with DTDs only involves checking a set of rules. Disabling DTD validation turns off only the rule checks, not the processing for entities and attributes.

2.4.1. SAX2 Feature Flags

SAX2 exposes many parser behaviors, including DTD validation, using a "feature flag" mechanism. These flags are Boolean settings, which may have values or be unspecified. Parsers can have up to four different modes for any feature flag. For example, with the validation flag SAX2 implies four kinds of XML parsers:
Later in this chapter, we look at the feature flags used to characterize namespace processing. Those flags are not optional, so fewer potential parser modes are possible. All the standardized feature flags are detailed in Section 3.3.2, "XMLReader Feature Flags" in Chapter 3, "Producing SAX2 Events".

In SAX, URIs identify feature flags. These are used purely as unique identifiers. This is the same approach used in XML namespaces: don't use these URIs to retrieve data, even if they do look like URLs you could type into a browser. The URI http://xml.org/sax/features/validation identifies the flag controlling validation.
To check how a given XML parser handles validation, use code similar to Example 2-5. Code for any other kind of parser feature will look much the same, as long as you use the correct ID for the feature flag; you'll see the same exception types working in the same way. (The same is true for parser "properties," which you'll see in Section 3.3.1, "XMLReader Properties" in Chapter 3, "Producing SAX2 Events".)

Example 2-5. Checking for validation support

    XMLReader producer;
    String    uri = "http://xml.org/sax/features/validation";

    // ... get the parser

    // Try getting and setting the flag
    try {
        System.out.println ("Initial validation setting: "
            + producer.getFeature (uri));

        // if we get here, validation behavior is known
        producer.setFeature (uri, true);

        // if we get here, the parser either validates by
        // default or is optionally validating

    } catch (SAXNotSupportedException e) {
        // value not supported; parser is nonvalidating
        System.out.println ("Can't enable validation: "
            + e.getMessage ());
        System.exit (1);

    } catch (SAXNotRecognizedException e) {
        // feature not understood; parser has weak SAX2 support.
        // maybe it's a SAX1 parser inside a ParserAdapter
        System.out.println ("Doesn't understand validation: "
            + e.getMessage ());
        System.exit (1);
    }

As a rule, programs will probably set the validation flag to true only when they really need reports of validity errors. (Why? As we'll see in a moment, it's natural to ignore reports of validity errors when they're not important, so it doesn't much matter if you validate when you don't need to.) The skeleton program in Example 2-1 really just needs a setFeature() call and a small update to the diagnostic message, to be sure it's always validating. (The diagnostics could be more precise using some more-specialized exceptions that we haven't discussed yet.)
    // Get an instance of the default XML parser class
    try {
        producer = XMLReaderFactory.createXMLReader ();
        producer.setFeature (
            "http://xml.org/sax/features/validation",
            true);
    } catch (SAXException e) {
        System.err.println (
            "Can't get validating parser, check configuration: "
            + e.getMessage ());
        return;
    }

The validation feature flag is probably the most widely used, with the possible exception of the flags controlling namespace handling. Most parsers leave validation off by default to save some minor parsing overhead.

2.4.2. Handling Validity Errors

If you modify the skeleton program to set the parser's validation flag and then run it on a well-formed but invalid document (perhaps one without a DTD), you will probably be surprised to discover that it doesn't seem to report any errors. That's exactly what should happen, since it's the default behavior specified by SAX. To make validity errors cause anything interesting to happen, you have to change how they're handled. If you don't change this handling, you won't be able to tell a validating parser apart from a nonvalidating one!

The simplest way to change the handling of validity errors is to make them work just like well-formedness errors: by aborting the parse. This uses the ErrorHandler interface that we look at later in this chapter, in Section 2.5.2, "ErrorHandler Interface", but for now it's simpler to focus on one method. In terms of the skeleton program shown earlier, such a change can be an update to just one line, using an anonymous inner class to make the code look simple. (Of course, avoid using anonymous classes for anything complex; they can make code hard to maintain.)

    // Get a consumer for all the parser events
    consumer = new DefaultHandler () {
        public void error (SAXParseException e)
        throws SAXException
            { throw e; }
    };

XML parsers call ErrorHandler.error() whenever they find a validity error, or when they see certain other nonfatal errors.
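To see the difference between the default handling and the rethrowing policy, the following is a minimal, self-contained sketch rather than part of the skeleton program. The class name ValidityErrorDemo, the methods parses() and strictHandler(), and the sample document INVALID_DOC are hypothetical names introduced for illustration. As a further assumption, the sketch obtains its XMLReader through JAXP's SAXParserFactory instead of XMLReaderFactory, and it relies on the bundled parser accepting the validation flag.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the skeleton program.
final class ValidityErrorDemo {
    static final String VALIDATION =
        "http://xml.org/sax/features/validation";

    // Well-formed but invalid: the DTD allows only <to> inside <note>.
    static final String INVALID_DOC =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "]>\n"
        + "<note><from>nobody</from></note>\n";

    // A consumer that treats validity errors like fatal errors.
    static DefaultHandler strictHandler() {
        return new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e;
            }
        };
    }

    // Returns true if the parse ran to completion, false if a
    // SAXParseException aborted it.
    static boolean parses(String xml, DefaultHandler consumer) {
        try {
            // Assumption: JAXP is used here to obtain the XMLReader.
            XMLReader producer = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
            producer.setFeature(VALIDATION, true);
            producer.setContentHandler(consumer);
            producer.setErrorHandler(consumer);
            producer.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            // Configuration or I/O trouble, not a parse error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default handler completes: "
            + parses(INVALID_DOC, new DefaultHandler()));
        System.out.println("strict handler completes:  "
            + parses(INVALID_DOC, strictHandler()));
    }
}
```

With a parser that honors the validation flag, the plain DefaultHandler should let the parse of the invalid document run to completion, while the strict handler should abort it at the first reported validity error.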
The custom handler above adopts the policy that whenever it sees such an error, it aborts the parse by throwing the exception reported to it. Later in this chapter we look at some alternative policies.

When your callback detects serious application-level errors, you can throw a SAXException from any SAX event handler callback to abort parsing. That doesn't have to be done only from an ErrorHandler. For example, when input data is valid XML but doesn't meet essential semantic requirements of the application, report it using some kind of SAXException. If your code only knows how to process shipping invoices, then greeting cards should be rejected immediately.

Copyright © 2002 O'Reilly & Associates. All rights reserved.