Producer-Side Validation (SAX2)

2.4.1. SAX2 Feature Flags

SAX2 exposes many parser behaviors, including DTD validation, through a "feature flag" mechanism. Each flag is a Boolean setting that a given parser may let you change, may fix at one value, or may not recognize at all. That gives a parser up to four different modes for any feature flag. With the validation flag, for example, SAX2 implies four kinds of XML parsers:

Optionally validating parsers: The feature flag is read/write and can be either true or false. If it's set to false, few nonfatal errors will be reported and parsing will be a bit faster (maybe 5 or 10 percent of the cost of parsing XML, which is usually negligible to start with).
Nonvalidating parsers: The feature flag is read-only and always false. Some nonfatal errors might be reported (the XML specification demands them in some cases).
Always validating parsers: The feature flag is read-only and always true. Validity errors are always reported as nonfatal. (By default, such errors are ignored; see Section 2.4.2, "Handling Validity Errors" later in this chapter.)
Unknown validation behavior: The feature flag is not recognized, so its value can't be determined. (This mode is uncommon for the SAX2 validation flag, but you'll see it with other feature flags.)

Later in this chapter, we'll look at the feature flags used to characterize namespace processing. Those flags are not optional, so fewer potential parser modes are possible. All the standardized feature flags are detailed in Section 3.3.2, "XMLReader Feature Flags" in Chapter 3, "Producing SAX2 Events".

In SAX, URIs identify feature flags. These are used purely as unique identifiers. This is the same approach used in XML namespaces: don't use these URIs to retrieve data, even if they do look like URLs you could type into a browser. The URI http://xml.org/sax/features/validation identifies the flag controlling validation.

URIs = URLs + URNs

The use of URIs in XML namespaces has been confusing, and since SAX2 also uses URIs to identify parser feature flags and properties, the same sort of confusion can show up. Think of URIs as names: you can talk about "Fred" even if he's not there, or about "Godot" even if he may not exist, and "the third house on the left" probably makes sense to someone standing at your side.

Classically, a Uniform Resource Identifier (URI) is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). Both types of URIs are represented as strings. You're used to seeing URLs in web browsers; they serve as detailed addresses. They often look like http://www.example.com/ but they may use other URI schemes, such as https:, ftp:, and file:. The scheme indicates the way to access the resource. URNs use URI schemes that start with urn:. You probably have not seen many URNs; one example is urn:uuid:221ffe10-ae3c-11d1-b66c-00805f8a2676. URN schemes (like uuid in this example) describe what the resource is, more than how to access it.
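
As a rough illustration of that distinction, the following sketch uses the standard java.net.URI class to look at the scheme of a few URI strings. The class name UriKinds and its kind() method are made up for this example, and sorting URIs by scheme alone is a simplification rather than a formal test.

```java
import java.net.URI;

class UriKinds {
    // Sorts a URI string into one of three rough buckets, by scheme only.
    static String kind(String text) {
        String scheme = URI.create(text).getScheme();
        if (scheme == null) {
            return "relative";      // no scheme at all
        }
        if (scheme.equalsIgnoreCase("urn")) {
            return "URN";           // names what the resource is
        }
        return "URL-style";         // scheme suggests how to access it
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.example.com/",
            "urn:uuid:221ffe10-ae3c-11d1-b66c-00805f8a2676",
            "http://xml.org/sax/features/validation"
        };
        for (String s : samples) {
            System.out.println(kind(s) + ": " + s);
        }
    }
}
```

Note that the SAX2 validation flag lands in the "URL-style" bucket even though nothing is ever fetched from it; the syntax of a URI says nothing about how it's meant to be used.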

Filenames are never URIs, but you can convert a filename into a URL (hence URI) that works on systems where the original filename was legal. Just to be confusing, there are also "relative URIs," which often look like POSIX-style filenames. Like filenames, relative URIs should never be handed directly to a SAX parser or be used as namespace identifiers.
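
One way to do that conversion in Java is sketched here, using the toURI() method of java.io.File. The FileToUri class and its toSystemId() helper are names invented for this example, and the filename data/invoice.xml is hypothetical; the file doesn't need to exist for the conversion to work.

```java
import java.io.File;
import java.net.URI;

class FileToUri {
    // Turns a local filename (relative or absolute) into an absolute
    // file: URI string, suitable for handing to a parser.
    static String toSystemId(String filename) {
        return new File(filename).toURI().toString();
    }

    public static void main(String[] args) {
        String filename = "data/invoice.xml";   // hypothetical filename
        String systemId = toSystemId(filename);

        // The raw filename parses as a relative URI: no scheme.
        System.out.println(filename + " absolute? "
            + URI.create(filename).isAbsolute());

        // The converted form has a file: scheme and a full path.
        System.out.println(systemId + " absolute? "
            + URI.create(systemId).isAbsolute());
    }
}
```

The exact text of the resulting URI depends on the current directory and the operating system, which is why the sketch prints it instead of assuming a particular value.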

With XML namespaces and SAX2, the term URI is used to emphasize that the string is being used as a pure identifier: it's more like a URN than a URL, even when the URI is syntactically a URL. It's explicitly irrelevant whether any resource is actually associated with the URI. Don't assume you can fetch resources using those URIs.

To check how a given XML parser handles validation, use code similar to Example 2-5. Code for any other kind of parser feature will look much the same, as long as you use the correct ID for the feature flag; you'll see the same exception types working in the same way. (The same is true for parser "properties," which you'll see in Section 3.3.1, "XMLReader Properties" in Chapter 3, "Producing SAX2 Events".)

Example 2-5. Checking for validation support

XMLReader       producer;
String          uri = "http://xml.org/sax/features/validation";

// ... get the parser

// Try getting and setting the flag
try {
    System.out.println ("Initial validation setting: "
	+ producer.getFeature (uri));
    // if we get here, validation behavior is known

    producer.setFeature (uri, true);
    // if we get here, the parser either validates by
    // default or is optionally validating

} catch (SAXNotSupportedException e) {
    // value not supported; parser is nonvalidating
    System.out.println ("Can't enable validation: "
	+ e.getMessage ());
    System.exit (1);

} catch (SAXNotRecognizedException e) {
    // feature not understood; parser has weak SAX2 support.
    // maybe it's a SAX1 parser inside a ParserAdapter
    System.out.println ("Doesn't understand validation: "
	+ e.getMessage ());
    System.exit (1);
}
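
The four parser modes listed earlier for the validation flag can be told apart with the same two calls. The following sketch is one way to do it and is not part of SAX2 itself: the FeatureProbe class, its Mode enum, and the classify() and newReader() helpers are names invented for this example, and http://example.org/no-such-feature is a made-up flag ID. The sketch gets its parser through JAXP's SAXParserFactory, an assumption made so that it runs on a stock JDK; any XMLReader would do. For simplicity it also treats "recognized, but the value can't be reported right now" the same as "unknown".

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

class FeatureProbe {
    enum Mode { OPTIONAL, ALWAYS_ON, ALWAYS_OFF, UNKNOWN }

    // Probes one feature flag and reports which mode the parser is in.
    static Mode classify(XMLReader reader, String featureUri) {
        boolean initial;
        try {
            initial = reader.getFeature(featureUri);
        } catch (SAXException e) {
            // not recognized (or not readable now): value can't be determined
            return Mode.UNKNOWN;
        }
        try {
            reader.setFeature(featureUri, !initial);
        } catch (SAXException e) {
            // the flag is readable but can't be flipped: read-only
            return initial ? Mode.ALWAYS_ON : Mode.ALWAYS_OFF;
        }
        try {
            reader.setFeature(featureUri, initial);   // put it back
        } catch (SAXException e) {
            // couldn't restore; the flag was still writable once
        }
        return Mode.OPTIONAL;
    }

    // Assumption: bootstrap through JAXP instead of XMLReaderFactory.
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        XMLReader producer = newReader();
        String[] flags = {
            "http://xml.org/sax/features/validation",
            "http://example.org/no-such-feature"      // made-up flag ID
        };
        for (String uri : flags) {
            System.out.println(uri + ": " + classify(producer, uri));
        }
    }
}
```

Because classify() takes the flag's URI as a parameter, the same probe works for any other feature flag; only the ID changes.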

As a rule, programs will probably set the validation flag to true only when they really need reports of validity errors. (Why? As we'll see in a moment, it's natural to ignore reports of validity errors when they're not important, so it doesn't much matter if you validate when you don't need to.) The skeleton program in Example 2-1 really just needs a setFeature() call and a small update to the diagnostic message, to be sure it's always validating. (The diagnostics could be more precise using some more-specialized exceptions that we haven't discussed yet.)

// Get an instance of the default XML parser class
try {
    producer = XMLReaderFactory.createXMLReader ();
    producer.setFeature (
	"http://xml.org/sax/features/validation",
	true);
} catch (SAXException e) {
    System.err.println (
	  "Can't get validating parser, check configuration: "
	+ e.getMessage ());
    return;
}

The validation feature flag is probably the most widely used, with the possible exception of the flags controlling namespace handling. Most parsers leave validation off by default to save some minor parsing overhead.
