Storing documents in multiple files is convenient, especially for
really large documents. For example, suppose you have a big book to
write in XML and you want to store each chapter in its own file. You
can do so easily with external entities. Here's an
example:
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY intro-chapter SYSTEM "chapters/intro.xml">
  <!ENTITY pasta-chapter SYSTEM "chapters/pasta.xml">
  <!ENTITY stirfry-chapter SYSTEM "chapters/stirfry.xml">
  <!ENTITY soups-chapter SYSTEM "chapters/soups.xml">
]>
<book>
<title>The Bonehead Cookbook</title>
&intro-chapter;
&pasta-chapter;
&stirfry-chapter;
&soups-chapter;
</book>
The previous filter example would diligently resolve the external
entity references for you and output the entire book in one piece.
Your file separation scheme would be lost, and you'd have to edit the
resulting file to break it back into multiple files. Fortunately, we
can override the resolution of external entity references using a
handler called resolve_entity().
This handler receives a hash with four properties: Name, the
entity's name; SystemId and PublicId, identifiers that help you
locate the file containing the entity's text; and Base, which helps
resolve relative URLs, if any exist. Unlike the other handlers, this
one should return a value to tell the parser what to do. Returning
undef tells the parser to load the external entity as it normally
would. Otherwise, you need to return a hash describing an alternative
source from which the entity should be loaded. The hash is the same
type you would give to the object's parse() method, with keys like
SystemId to give it a filename or URL, or String to give it a string
of text. For example:
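sub resolve_entity {
    my( $self, $props ) = @_;

    # Load the entity ourselves only if it names a file we can open.
    if( exists( $props->{ SystemId }) and
        open( my $ent, '<', $props->{ SystemId })) {
        # Mark the start of the file with a processing instruction.
        my $entval = '<?start-file ' . $props->{ SystemId } . '?>';
        while( <$ent> ) { $entval .= $_; }
        close $ent;
        # Mark the end of the file the same way.
        $entval .= '<?end-file ' . $props->{ SystemId } . '?>';
        return { String => $entval };
    } else {
        # Let the parser resolve the entity as it normally would.
        return undef;
    }
}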
This routine opens the entity resource, if it's in a
file it can find, and gives it to the parser as a string. First, it
attaches a processing instruction before and after the entity text,
marking the boundary of the file. Later, you can write a routine to
look for the PIs and separate the files back out again.
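As a rough illustration of that later step, here is a minimal sketch of a routine that separates the files again. It is not part of the filter above: the name split_files and the $entity_for mapping (SystemId to entity name) are hypothetical, and the sketch assumes that the output stage reproduces the start-file and end-file PIs verbatim and that marked regions are never nested.

```perl
use strict;
use warnings;

# Hypothetical helper: pull every <?start-file X?>...<?end-file X?>
# region out of $text. Returns the remaining text and a reference to a
# hash mapping each filename to the content found between its markers.
sub split_files {
    my( $text, $entity_for ) = @_;
    $entity_for ||= {};    # optional map: SystemId => entity name
    my %files;

    $text =~ s{<\?start-file\s+(.+?)\s*\?>(.*?)<\?end-file\s+\1\s*\?>}{
        $files{ $1 } = $2;
        # Put the entity reference back if we know its name;
        # otherwise just drop the region from the main document.
        exists $entity_for->{ $1 } ? '&' . $entity_for->{ $1 } . ';' : '';
    }gse;

    return( $text, \%files );
}
```

You could call it as split_files( $output, { 'chapters/intro.xml' => 'intro-chapter' } ) and then write each entry of the returned hash back to its own file. A regular expression is used here only to keep the sketch short; handling the PIs in a second SAX filter would be more robust.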