3.4. The EntityResolver Interface
As mentioned earlier, this interface is used when a parser
needs to access and parse external entities in the DTD or
document content.
It is not used to access the document entity itself.
Cases where an EntityResolver should be used include:
- Applications that handle documents with DTDs should
  plan to use an EntityResolver so they
  work robustly in the face of partial network failures,
  and so they avoid placing excessive loads on remote servers.
  That is, they should try to access local copies of DTD data
  even when the document specifies a remote one.
  There are many examples of sloppily written applications
  that broke when a remote system administrator moved a DTD
  file. Examples range from purely informative services like most
  RSS feeds to fee-based services like some news syndication
  protocols.
  You can implement a useful resolver with a data structure as
  simple as a hash table that maps identifiers to URIs.
  There is normally no reason to have different parsers
  use different entity resolvers; documents shouldn't use the
  same public or (absolute) system identifiers to denote different
  entities. You'll normally just have one resolver, and
  it could adaptively cache entities if you like.
- More complex catalog facilities may be used by applications
  that follow the SGML convention that public identifiers are
  Formal Public Identifiers (FPIs). FPIs serve the role
  that Uniform Resource Names (URNs) serve
  for Internet-oriented systems.
  Such mappings can also be used with URIs, if the entity text
  associated with URIs is as stable as an FPI.
  (Such stability is one of the goals of URNs.)
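The hash-table resolver mentioned in the first case above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from any library: the class name TableResolver and its put() method are hypothetical, and a single table holds both public identifiers and absolute system identifiers as keys.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/**
 * Hypothetical minimal resolver: one hash table maps public or
 * absolute system identifiers to the URIs of local copies.
 */
public class TableResolver implements EntityResolver {
    private final Map<String, String> table = new HashMap<>();

    /** Map an identifier (public ID or absolute system ID) to a local URI. */
    public void put(String id, String localUri) {
        table.put(id, localUri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Prefer the public identifier, which does not depend on location.
        String uri = (publicId != null) ? table.get(publicId) : null;
        // Otherwise try the system identifier itself.
        if (uri == null) {
            uri = table.get(systemId);
        }
        // Returning null means "no substitution": the parser uses systemId.
        return (uri != null) ? new InputSource(uri) : null;
    }
}
```

Because one table serves every parser, the same TableResolver instance can be shared across an application, which matches the advice above about normally having just one resolver.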
Applications pass objects that implement the
EntityResolver interface to the
XMLReader.setEntityResolver() method.
The parser will then use the resolver with all external
parsed entities.
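A minimal sketch of that registration, assuming a JAXP parser obtained through javax.xml.parsers.SAXParserFactory (as bundled with the JDK). ResolverDemo, the DTD URI, and the entity it declares are all hypothetical. Since EntityResolver has a single method, a Java lambda can serve as the resolver; here it answers for one system identifier with DTD text held in memory, so the parser never contacts example.com.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ResolverDemo {
    // Hypothetical DTD location; the resolver below answers for it.
    public static final String DTD_URI = "http://example.com/dtds/greeting.dtd";

    /** Parse xml and return all of its character data, concatenated. */
    public static String parse(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser().getXMLReader();

            // Substitute an in-memory DTD for the remote one.
            reader.setEntityResolver((publicId, systemId) -> {
                if (DTD_URI.equals(systemId)) {
                    InputSource dtd = new InputSource(
                            new StringReader("<!ENTITY greeting 'hello'>"));
                    dtd.setSystemId(systemId);
                    return dtd;
                }
                return null; // anything else: let the parser fetch it
            });

            // Collect character data so the effect is observable.
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return text.toString();
    }
}
```

Parsing a document whose DOCTYPE names DTD_URI and whose content is just &greeting; should yield the text hello, since that entity is declared only in the substituted DTD.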
The EntityResolver interface has only one method, which can throw a java.io.IOException as well as the org.xml.sax.SAXException that most other callbacks throw.
InputSource resolveEntity(String publicId, String systemId)
Parsers invoke this method to map
entity identifiers either to other identifiers or to data
that they will parse.
See the discussion in Section 3.1.2, "The InputSource Class", earlier in this chapter, for information about how the
InputSource class is used.
If null is returned, then the
parser will resolve the systemId
without additional assistance.
To avoid parsing an entity, return a value that
encapsulates a zero-length text entity.
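The two return conventions just described might look like this in practice. SkippingResolver is a hypothetical name, and matching system identifiers against a fixed set is only one possible policy: listed entities are replaced by a zero-length text entity, and everything else is left to the parser by returning null.

```java
import java.io.StringReader;
import java.util.Set;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver showing "empty entity" versus "no opinion". */
public class SkippingResolver implements EntityResolver {
    private final Set<String> skipped;

    public SkippingResolver(Set<String> skippedSystemIds) {
        this.skipped = skippedSystemIds;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (skipped.contains(systemId)) {
            // A zero-length text entity: the parser reads nothing from it.
            InputSource empty = new InputSource(new StringReader(""));
            empty.setSystemId(systemId);
            return empty;
        }
        // null: the parser resolves systemId without further help.
        return null;
    }
}
```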
The systemId will always
be present and will be a fully resolved URI.
The publicId may be
null. If it's not null, it will have been normalized by
mapping sequences of consecutive whitespace characters
to a single space character.
Example 3-3
shows a simple resolver that substitutes
for a web-based time service running on the local machine,
interprets a private URI scheme, and
maps public identifiers to alternative URIs using a dictionary
that's externally maintained somehow. (For example, you might prime a hashtable with the public IDs for the XHTML 1.0, XHTML 1.1, and DocBook 4.0 XML DTDs to point to local files.) It delegates to another resolver for other cases.
Example 3-3. Entity resolver, with chaining
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Date;
import java.util.Dictionary;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MyResolver implements EntityResolver
{
    private EntityResolver next;
    private Dictionary<String, String> map;

    // n -- optional resolver to consult on failure
    // m -- mapping public ids to preferred URLs
    public MyResolver (EntityResolver n, Dictionary<String, String> m)
        { next = n; map = m; }

    public InputSource resolveEntity (String publicId, String systemId)
    throws SAXException, IOException
    {
        // magic URL? Supply the current date as the entity text.
        if ("http://localhost/xml/date".equals (systemId)) {
            InputSource retval = new InputSource (systemId);
            Reader date = new StringReader (new Date ().toString ());
            retval.setCharacterStream (date);
            return retval;
        }

        // nonstandard URI scheme? Storage is an application-specific
        // class (not shown) that returns the bytes stored under a key.
        if (systemId.startsWith ("blob:")) {
            InputSource retval = new InputSource (systemId);
            String key = systemId.substring (5);
            byte data [] = Storage.keyToBlob (key);
            retval.setByteStream (new ByteArrayInputStream (data));
            return retval;
        }

        // use table to map public id to local URL?
        if (map != null && publicId != null) {
            String url = map.get (publicId);
            if (url != null)
                return new InputSource (url);
        }

        // chain to next resolver?
        if (next != null)
            return next.resolveEntity (publicId, systemId);

        // null: let the parser resolve systemId itself
        return null;
    }
}
Traditionally, public identifiers have mainly been used as
keys to find local copies of entities.
In SGML, system identifiers were optional and system-specific,
so public identifiers were sometimes the only ones available.
(XML changed this: system identifiers are mandatory and
are URIs.)
In essence, public identifiers were used in SGML to serve the
role that URNs serve in web-oriented architectures.
An ISO standard for FPIs exists,
and now RFC 3151 (available at http://www.ietf.org/rfc/rfc3151.txt) defines a mapping from FPIs to URNs.
(The FPI is normalized and transformed, then gets
a urn:publicid: prefix.)
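The transcription is simple enough to sketch. The code below follows our reading of the RFC 3151 rules: normalize whitespace, then map "//" to ":", "::" to ";", a space to "+", and percent-escape "+", ":", "/", ";", "'", "?", "#", and "%". PublicIdUrn and toUrn are hypothetical names, not part of SAX, so check the RFC itself before depending on the exact output.

```java
/** Hypothetical helper: RFC 3151 transcription of an FPI into a URN. */
public final class PublicIdUrn {
    private PublicIdUrn() {}

    public static String toUrn(String publicId) {
        // Normalize: trim, and collapse each whitespace run to one space.
        String id = publicId.trim().replaceAll("\\s+", " ");
        StringBuilder out = new StringBuilder("urn:publicid:");
        for (int i = 0; i < id.length(); i++) {
            // Two-character sequences take precedence over single characters.
            if (id.startsWith("//", i)) { out.append(':'); i++; continue; }
            if (id.startsWith("::", i)) { out.append(';'); i++; continue; }
            char c = id.charAt(i);
            switch (c) {
                case ' ':  out.append('+');   break;
                case '+':  out.append("%2B"); break;
                case ':':  out.append("%3A"); break;
                case '/':  out.append("%2F"); break;
                case ';':  out.append("%3B"); break;
                case '\'': out.append("%27"); break;
                case '?':  out.append("%3F"); break;
                case '#':  out.append("%23"); break;
                case '%':  out.append("%25"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For instance, "-//OASIS//DTD DocBook XML V4.1.2//EN" becomes urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN.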
When public identifiers are used with XML systems, it's largely
by adopting FPI policies to interoperate with such SGML systems;
however, XML public identifiers don't need to be FPIs.
You may prefer to use URN schemes in newer systems.
If so, be aware that some XML processing engines support only
URLs as system identifiers.
By letting applications interpret public IDs as URNs, SAX offers
more power than some other XML APIs do.
If you want richer catalog-style functionality than the
table mapping shown earlier, look for open source implementations
of the XML version of the OASIS SGML/Open Catalog (SOCAT).
At this writing, the specification for such a catalog is still a draft, though a stable one; see http://www.oasis.org/committees/entity/ for more information. This specification defines an XML text representation of mappings; the mappings can be significantly more complex than the tabular one shown earlier.
Copyright © 2002 O'Reilly & Associates. All rights reserved.