B.3. Document Information Item
The Document Information Item is the root
of the information found in an XML document.
There is only one such root item.
This information item begins with the
ContentHandler.startDocument() call and
ends with the ContentHandler.endDocument()
call. Many SAX2 event calls are used to construct its children
or constituents.
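As a rough illustration of how those calls bracket the item, here is a minimal sketch of a handler that records the startDocument() and endDocument() boundaries and collects the top-level events reported between them. The class name DocumentItemSketch, its fields, and the parse() helper are hypothetical names invented for this example. Comments and the document type declaration are left out because reporting them requires a LexicalHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of SAX2): gathers top-level [children]
// seen between startDocument() and endDocument().
class DocumentItemSketch extends DefaultHandler {
    final List<String> children = new ArrayList<>();
    String documentElement;   // name of the single top-level element
    boolean started;
    boolean ended;
    private int depth;        // current element nesting depth

    @Override
    public void startDocument() { started = true; }

    @Override
    public void endDocument() { ended = true; }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (depth++ == 0) {
            // Only the outermost element is a child of the document item.
            documentElement = local.isEmpty() ? qName : local;
            children.add("element:" + documentElement);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) { depth--; }

    @Override
    public void processingInstruction(String target, String data) {
        // PIs inside elements belong to those elements, so skip them here.
        // Caveat: PIs inside a DTD would also arrive at depth 0; telling
        // them apart needs LexicalHandler.startDTD()/endDTD().
        if (depth == 0) children.add("pi:" + target);
    }

    // Convenience helper for the sketch; wraps parser errors unchecked.
    static DocumentItemSketch parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            DocumentItemSketch handler = new DocumentItemSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Tracking the nesting depth is one simple way to separate the document's own children from events that belong to descendant elements; other designs (such as a stack of open items) would work equally well.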
Property | Callbacks | Explanation |
[children] | | See the sections for each type of Information Item: Document Type Declaration (one, if present), Element (one), Processing Instruction (possibly many), Comment (possibly many). |
[document element] | | This is the element in the [children] property. |
[notations] | | See the section on Notation Information Items. (Unordered.) |
[unparsed entities] | | See the section on Unparsed Entity Information Items. (Unordered.) |
[base URI] | Locator.getSystemId(), or XMLReader.parse() | Locator may be used during the startDocument() callback (and earlier callbacks, unless they were made in the context of an external parameter entity). Alternatively, for any parsers that don't provide a Locator, applications using an XMLReader are responsible for providing this information (if it exists) to the parse() method. This is passed directly as the string parameter or indirectly as the systemId property of an InputSource. |
[character encoding scheme] | unavailable; or InputSource.getEncoding() | Normally this property is unavailable; it won't affect the interpretation of character data in Java. However, applications will in rare cases provide this to the parser when they call XMLReader.parse(InputSource) to start parsing. It's likely that an upcoming extension API will provide this information. |
[standalone] | XMLReader.getFeature() | It's likely that an upcoming extension API will provide this information using an is-standalone feature flag. |
[version] | unavailable | You can probably assume the value of this property is "1.0" for now. It's likely that an upcoming extension API will provide this information. |
[all declarations processed] | ContentHandler.skippedEntity(); LexicalHandler.endDTD() | When endDTD() is invoked, the value of this property is known. If no external parameter entities are reported as skipped, then the value is true. If the parser doesn't support the lexical handler, then the later call to startElement() may be used instead of endDTD(). |
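The [base URI] and [all declarations processed] rows can be sketched in code as follows. This is only an illustration under stated assumptions: the class DocumentPropertiesSketch, its fields, and its parse() helper are hypothetical names. Treating a skipped "[dtd]" (the name SAX2 uses for the external DTD subset) the same way as a skipped parameter entity is this sketch's reading of the rule, not something the table states.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler (not part of SAX2) tracking two document properties.
class DocumentPropertiesSketch extends DefaultHandler2 {
    private Locator locator;
    private boolean skippedExternal;
    String baseUri;                     // stays null if no Locator was given
    Boolean allDeclarationsProcessed;   // stays null until the value is known

    @Override
    public void setDocumentLocator(Locator locator) { this.locator = locator; }

    @Override
    public void startDocument() {
        // During startDocument() the Locator still describes the document entity.
        if (locator != null) baseUri = locator.getSystemId();
    }

    @Override
    public void skippedEntity(String name) {
        // SAX2 reports parameter entity names with a leading '%'.
        // Assumption: a skipped external subset ("[dtd]") counts as well.
        if (name.startsWith("%") || name.equals("[dtd]")) skippedExternal = true;
    }

    @Override
    public void endDTD() { settle(); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // Fallback when endDTD() was never delivered (or there was no DTD).
        settle();
    }

    private void settle() {
        if (allDeclarationsProcessed == null) allDeclarationsProcessed = !skippedExternal;
    }

    // Convenience helper for the sketch; wraps parser errors unchecked.
    static DocumentPropertiesSketch parse(String xml, String systemId) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DocumentPropertiesSketch handler = new DocumentPropertiesSketch();
            reader.setContentHandler(handler);
            try {
                reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            } catch (SAXException e) {
                // No lexical handler support: startElement() settles the flag.
            }
            InputSource source = new InputSource(new StringReader(xml));
            source.setSystemId(systemId);   // indirect way to supply the base URI
            reader.parse(source);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the parser never supplies a Locator, baseUri stays null and the application would fall back to the system identifier it passed to parse().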
Because text in Java is always accessed using UTF-16
character strings or arrays,
most applications won't need to worry about encoding issues;
the SAX2 parser handles that. However, there are cases when
encoding may matter:
- Input normalization
Some recent XML standards require that
text be normalized.
For example, XML Canonicalization (as used in digital
signature applications) requires the use of Unicode
Normalization Form C; some other W3C specifications
have the same requirement.
Text originally represented in UTF-8 or UTF-16
might need further normalization to remove some
deprecated character codes that can be represented
using those encodings.
Such encoding data is required on a per-entity basis,
not a per-document basis as implied by the Infoset
specification. And for internal entity expansions or
defaulted attributes, you'll need to normalize if the
encoding associated with the original definition
supported denormalized text.
- Output encoding
When using an output encoding that is not
based on the Unicode character set, you may not be able
to represent XML names that use particular characters.
For example, ASCII cannot handle element or attribute names
using accented characters (used in Europe and Latin
America) or using ideographic characters (used in Asia).
The preferred encoding solution is to always use UTF-8
or UTF-16 when outputting XML, so that such problems cannot
occur and so that all XML processors can work with such
output. Similar logic applies to display systems like
window systems: prefer font rendering systems that use
Unicode over those tied to some specific encoding.
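Both cases above can be checked with standard Java library calls. The following is a minimal sketch, assuming the application already holds the text or name as a Java string; the class EncodingSketch and its method names are hypothetical. It uses java.text.Normalizer for Normalization Form C and CharsetEncoder.canEncode() to test whether a name survives a given output encoding.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
class EncodingSketch {
    // Returns the text in Unicode Normalization Form C, leaving
    // already-normalized text untouched.
    static String toNfc(String text) {
        return Normalizer.isNormalized(text, Normalizer.Form.NFC)
                ? text
                : Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Reports whether every character of an XML name can be represented
    // in the given output encoding.
    static boolean canWriteName(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }
}
```

For example, an element name containing an accented character fails the canWriteName() check for US-ASCII but passes for UTF-8, which is one reason to prefer UTF-8 or UTF-16 for output.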
Copyright © 2002 O'Reilly & Associates. All rights reserved.