Event Consumer Issues (SAX2)

B.2. Event Consumer Issues

You really shouldn't care, but a Java String can't hold more than about two billion characters (its length is an int), and strings are used to pass certain document data to applications, such as attribute values and processing instruction data. So there's a chance that some documents could cause trouble by overflowing that limit. If you encounter such a document, consult a pathologist. There really isn't much you can do about this.

B.2.1. Structural Issues

The [children] properties are arbitrarily sized, ordered sequences of information items, which SAX2 event callbacks present in document order. Most other properties that hold collections of information items, such as [notations], [unparsed entities], and [attributes], are unordered. Only [children] properties would need to be stored in order-preserving data structures.
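As a minimal sketch of that storage choice, the hypothetical `ElementItem` class below keeps [children] in a `List`, which preserves insertion (and therefore document) order, and [attributes] in a `HashMap`, which does not. Keying attributes by qualified name is a simplifying assumption; a namespace-aware application would more likely key them by namespace URI and local name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory element item; not part of the SAX2 API. */
class ElementItem {
    final String name;
    // [children]: ordered, so use an order-preserving List (character data kept as String).
    final List<Object> children = new ArrayList<>();
    // [attributes]: unordered, so a hash map keyed by qualified name is enough.
    final Map<String, String> attributes = new HashMap<>();

    ElementItem(String name) {
        this.name = name;
    }

    // SAX2 reports children in document order, so appending preserves that order.
    void appendChild(Object child) {
        children.add(child);
    }
}
```

A SAX2 handler building such items would call `appendChild()` from `startElement()` and `characters()`, and copy each `Attributes` entry into the map.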

While most information items are reported through a single callback, some of the more complex ones are reported through matched pairs of calls that start and end the item; except in one case, these pairs are cleanly nested. Such items include the Document itself (startDocument() and endDocument()), its Document Type Declaration (startDTD() and endDTD()), Elements (startElement() and endElement()), and Namespace Information (startPrefixMapping() and endPrefixMapping()). To track those items, applications typically maintain some kind of context stack.
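A minimal sketch of such a context stack for elements, using the JAXP `SAXParserFactory` that ships with the JDK. The class name `ContextTracker` and its slash-separated path format are hypothetical; the sketch assumes the factory's default non-namespace-aware mode, so it reads the `qName` argument. Tracking the DTD or prefix mappings would follow the same push and pop pattern.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that keeps a context stack of open elements. */
class ContextTracker extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>();
    // Root-first path of each element, recorded when its start tag is reported.
    final List<String> paths = new ArrayList<>();

    @Override
    public void startDocument() {
        stack.clear();
        paths.clear();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.push(qName);
        paths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Element events are cleanly nested, so the closing element is always on top.
        stack.pop();
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        // descendingIterator() walks from the bottom (root) to the top (innermost element).
        for (Iterator<String> it = stack.descendingIterator(); it.hasNext(); ) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    boolean isBalanced() {
        return stack.isEmpty();
    }

    static ContextTracker parse(String xml) {
        ContextTracker handler = new ContextTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Keep the sketch short: surface parser, configuration, and I/O failures unchecked.
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

For example, `ContextTracker.parse("<a><b/><c><d/></c></a>").paths` would hold `/a`, `/a/b`, `/a/c`, and `/a/c/d`, and the stack is empty again once the document ends.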

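The [parent] properties of some information items are encoded implicitly by this nesting of SAX2 event reports: an item's parent is whatever item is open when the item is reported. Applications often push a stack entry when startElement() is called and pop it when endElement() is called, so the entry on top of the stack is the current parent. Only items that can be direct children of the Document or Document Type Declaration Information Items need other handling, since no element is open when they are reported.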
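As a sketch of deriving [parent] from the stack, the hypothetical `ParentTracker` below records, for each processing instruction, the element on top of the stack, or a made-up `#document` label when no element is open. It assumes the JDK's default JAXP parser and uses PI targets as map keys, so repeated targets would overwrite each other. PIs inside a DTD would also land under `#document` here; telling them apart would need the `startDTD()` and `endDTD()` callbacks of `org.xml.sax.ext.LexicalHandler`, which this sketch omits.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that derives the [parent] of each processing instruction. */
class ParentTracker extends DefaultHandler {
    // Hypothetical label standing in for the Document Information Item.
    static final String DOCUMENT = "#document";

    private final Deque<String> open = new ArrayDeque<>();
    // PI target -> name of its parent item, in the order the PIs were reported.
    final Map<String, String> piParent = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void processingInstruction(String target, String data) {
        // With no element open, the PI is a direct child of the Document.
        piParent.put(target, open.isEmpty() ? DOCUMENT : open.peek());
    }

    static ParentTracker parse(String xml) {
        ParentTracker handler = new ParentTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Keep the sketch short: surface parser, configuration, and I/O failures unchecked.
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

Parsing `<?before x?><root><?inside y?><child><?deep z?></child></root><?after w?>` would map `before` and `after` to `#document`, `inside` to `root`, and `deep` to `child`.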

The children of the Document and Document Type Declaration Information Items have curious restrictions: they don't always match the actual text structure. For example, the information items for notations and unparsed entities belong to the Document Information Item, even though they're textually part of the Document Type Declaration; and comments inside the DTD are not represented at all. If the descriptive Infoset structure seems awkward, you can use more natural structures in your applications.
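A minimal sketch of following the Infoset here: the hypothetical `DocumentModel` below stores notations and unparsed entities at document level, even though they arrive through `DTDHandler` callbacks while the DTD is being read. It relies on the fact that `DefaultHandler` implements `DTDHandler` and that `SAXParser.parse(InputSource, DefaultHandler)` registers it as one. Storing only the notation's system identifier and the entity's notation name is a simplification.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical document-level model holding [notations] and [unparsed entities]. */
class DocumentModel extends DefaultHandler {
    // Notation name -> system identifier (null if only a public identifier was given).
    final Map<String, String> notations = new HashMap<>();
    // Unparsed entity name -> name of the notation given in its NDATA clause.
    final Map<String, String> unparsedEntities = new HashMap<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notations.put(name, systemId);
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId, String systemId,
                                   String notationName) {
        unparsedEntities.put(name, notationName);
    }

    static DocumentModel parse(String xml) {
        DocumentModel handler = new DocumentModel();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Keep the sketch short: surface parser, configuration, and I/O failures unchecked.
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

Note that a parser may report system identifiers already resolved to absolute URIs, so applications shouldn't count on getting back the literal text from the declaration.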

Other complex information items are implicitly decoded from DTD declarations. To track such items, applications must save declarations during DTD processing, to ensure that they can be correlated with information in the body of a document. Examples of such items include [notation] properties for Unparsed Entities and processing instructions, most properties for Unexpanded Entity References, and [references] properties of attributes.
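As a sketch of that correlation for one case, the [notation] property of processing instructions, the hypothetical `NotationCorrelator` below saves notation declarations during DTD processing and checks each PI target against them later. It assumes the JDK's default JAXP parser and keys results by PI target; the Infoset's rule that a non-unique notation name yields no value is not handled.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler correlating PI targets with saved notation declarations. */
class NotationCorrelator extends DefaultHandler {
    // Saved during DTD processing; consulted later while reading the document body.
    private final Map<String, String> notationSystemIds = new HashMap<>();
    // For each PI reported: target -> whether a notation with that name was declared.
    final Map<String, Boolean> piHasNotation = new LinkedHashMap<>();

    @Override
    public void notationDecl(String name, String publicId, String systemId) {
        notationSystemIds.put(name, systemId);
    }

    @Override
    public void processingInstruction(String target, String data) {
        // A PI's [notation] is the notation named by its target, if one was declared.
        piHasNotation.put(target, notationSystemIds.containsKey(target));
    }

    static NotationCorrelator parse(String xml) {
        NotationCorrelator handler = new NotationCorrelator();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Keep the sketch short: surface parser, configuration, and I/O failures unchecked.
            throw new IllegalStateException(e);
        }
        return handler;
    }
}
```

Attribute [references] could be handled in a similar spirit: `Attributes.getType()` reports declared types such as `ENTITY`, `ENTITIES`, `IDREF`, `IDREFS`, and `NOTATION` when the parser has read the attribute declaration, and the values can then be matched against saved entity, ID, or notation information.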
