

B.8. Character Information Items

Along with element and attribute information items, characters are one of the core types of information used by XML applications. SAX2 reports characters in groups, rather than one at a time.
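Because a single run of text may arrive as several characters() calls (for example, around an entity reference), applications usually buffer the chunks themselves. The minimal sketch below assumes the JDK's built-in SAX parser obtained through JAXP's SAXParserFactory; the names TextRuns, RunCollector, and collect are hypothetical. It copies only the valid slice of each array and flushes the buffer at every element boundary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: groups character chunks into contiguous text runs.
class TextRuns {
    static class RunCollector extends DefaultHandler {
        final List<String> runs = new ArrayList<>();
        private final StringBuilder buf = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Only ch[start .. start+length) is valid; the rest of the array may be stale.
            buf.append(ch, start, length);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            flush();
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            flush();
        }

        // Emits the buffered text as one run, however many callbacks delivered it.
        private void flush() {
            if (buf.length() > 0) {
                runs.add(buf.toString());
                buf.setLength(0);
            }
        }
    }

    static List<String> collect(String xml) throws Exception {
        RunCollector h = new RunCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        return h.runs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<r>fish &amp; chips<b/>peas</r>")); // [fish & chips, peas]
    }
}
```

Whatever chunking the parser chooses, the buffered runs come out the same.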




[character code]

ContentHandler.characters(), ContentHandler.ignorableWhitespace()

These calls provide one or more characters in the UTF-16 encoding. Normally each Java char holds one complete [character code], but characters from the "Astral Planes" (code points above U+FFFF, which don't fit into 16 bits) are encoded as surrogate pairs of two chars. (No whitespace characters need surrogate pairs.)
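To see the difference between Java chars and [character code] values, the following sketch parses text containing U+1D11E (MUSICAL SYMBOL G CLEF), which lies outside the 16-bit range. It assumes the JDK's built-in SAX parser; CodePointDemo, Counter, and counts are hypothetical names. So that the result does not depend on whether a parser might split a surrogate pair across two calls, it counts code points only after buffering all the text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compares UTF-16 units with Unicode code points.
class CodePointDemo {
    static class Counter extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Returns {number of Java chars, number of code points} in the document's text.
    static int[] counts(String xml) throws Exception {
        Counter h = new Counter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        String s = h.text.toString();
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) throws Exception {
        // U+1D11E is one character but two Java chars (a surrogate pair).
        int[] c = counts("<r>a\uD834\uDD1Eb</r>");
        System.out.println("chars=" + c[0] + ", code points=" + c[1]); // chars=4, code points=3
    }
}
```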

[element content whitespace]

When this Boolean property is known to be true, the parser reports the characters through the ignorableWhitespace() callback instead of characters(). Most SAX parsers report this property even when they aren't validating, though that isn't required. (If any external parameter entities are skipped, the parser cannot reliably provide this information.)
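A handler can keep the two callbacks apart to recover the [element content whitespace] property. In this sketch (WhitespaceSplit and Splitter are hypothetical names), the DOCTYPE declares element-only content for list. Because reporting this property is optional for non-validating parsers, the demo prints what it received rather than assuming the indentation reaches ignorableWhitespace().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: separates ordinary text from element content whitespace.
class WhitespaceSplit {
    static class Splitter extends DefaultHandler {
        final StringBuilder content = new StringBuilder();
        final StringBuilder elementWhitespace = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // [element content whitespace] is false, or unknown to the parser.
            content.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // [element content whitespace] is true for these characters.
            elementWhitespace.append(ch, start, length);
        }
    }

    public static void main(String[] args) throws Exception {
        // <list> may contain only <item> children, so whitespace between them
        // is element content whitespace.
        String xml = "<!DOCTYPE list [<!ELEMENT list (item*)><!ELEMENT item (#PCDATA)>]>\n"
            + "<list>\n  <item>one</item>\n  <item>two</item>\n</list>";
        Splitter h = new Splitter();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        // Which buffer holds the indentation depends on the parser in use.
        System.out.println("content=" + h.content.toString().replace("\n", "\\n"));
        System.out.println("element whitespace chars=" + h.elementWhitespace.length());
    }
}
```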


[parent]

Applications must keep track of the parent element information item if it is needed; SAX2 does not pass it to characters() or ignorableWhitespace().
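A common way to track the parent element is a stack of open elements, pushed in startElement() and popped in endElement(); the top of the stack is the parent of any characters reported in between. This sketch assumes the JDK's built-in, non-namespace-aware SAX parser, so qName carries the element name. ParentTracker, Tracker, and collect are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: attributes character data to its parent element.
class ParentTracker {
    static class Tracker extends DefaultHandler {
        final Map<String, StringBuilder> textByParent = new LinkedHashMap<>();
        private final Deque<String> open = new ArrayDeque<>();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            open.push(qName);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            open.pop();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The innermost open element is the [parent] of these characters.
            String parent = open.peek();
            textByParent.computeIfAbsent(parent, k -> new StringBuilder())
                        .append(ch, start, length);
        }
    }

    static Map<String, String> collect(String xml) throws Exception {
        Tracker h = new Tracker();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        Map<String, String> out = new LinkedHashMap<>();
        h.textByParent.forEach((k, v) -> out.put(k, v.toString()));
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a>x<b>y</b>z</a>")); // {a=xz, b=y}
    }
}
```

Keying by element name merges text from different elements that share a name; a real application would more likely attach the text to its own per-element object on the stack.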

SAX2 permits reporting of a character property that the XML Infoset doesn't address: whether the characters are in a CDATA section. (DOM requires this information.) Such section boundaries are reported through the startCDATA() and endCDATA() methods of the LexicalHandler interface.
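To tell CDATA text apart, a handler can set a flag between startCDATA() and endCDATA(). The sketch below assumes the JDK's built-in parser, which accepts a LexicalHandler through the standard http://xml.org/sax/properties/lexical-handler property. It extends org.xml.sax.ext.DefaultHandler2, which implements both ContentHandler and LexicalHandler; CdataTracker, Tracker, and parse are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: records which characters came from CDATA sections.
class CdataTracker {
    static class Tracker extends DefaultHandler2 {
        final StringBuilder plain = new StringBuilder();
        final StringBuilder cdata = new StringBuilder();
        private boolean inCdata;

        @Override
        public void startCDATA() { inCdata = true; }

        @Override
        public void endCDATA() { inCdata = false; }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Characters still arrive through characters(); the flag says where they came from.
            (inCdata ? cdata : plain).append(ch, start, length);
        }
    }

    static Tracker parse(String xml) throws Exception {
        Tracker h = new Tracker();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // A LexicalHandler is registered as a property, not through parse().
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", h);
        parser.parse(new InputSource(new StringReader(xml)), h);
        return h;
    }

    public static void main(String[] args) throws Exception {
        Tracker h = parse("<r>a<![CDATA[b<c]]>d</r>");
        System.out.println("plain=" + h.plain + " cdata=" + h.cdata); // plain=ad cdata=b<c
    }
}
```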


Copyright © 2002 O'Reilly & Associates. All rights reserved.