

XML in a Nutshell

5.2. The Encoding Declaration

Every XML document should have an encoding declaration as part of its XML declaration. The encoding declaration tells the parser in which character set the document is written. It's used only when other metadata from outside the file is not available. For example, this XML declaration says that the document uses the character encoding US-ASCII:

<?xml version="1.0" encoding="US-ASCII" standalone="yes"?>

This one states that the document uses the Latin-1 character set, though it uses the more official name ISO-8859-1:

<?xml version="1.0" encoding="ISO-8859-1"?>
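As a rough illustration of how a parser acts on this declaration, the following Python sketch uses the standard library's xml.etree.ElementTree module. The helper name parse_with_declared_encoding, the doc element, and the sample text are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET


def parse_with_declared_encoding(text: str, encoding: str) -> str:
    """Wrap `text` in a <doc> element, encode the whole document in
    `encoding`, and parse the resulting bytes back into a string."""
    declaration = f'<?xml version="1.0" encoding="{encoding}"?>'
    raw = (declaration + f"<doc>{text}</doc>").encode(encoding)
    # The parser is given raw bytes only; it has to read the encoding
    # declaration to learn how to decode them.
    return ET.fromstring(raw).text


# The c-cedilla is a single byte (0xE7) in ISO-8859-1.
result = parse_with_declared_encoding("fa\u00e7ade", "ISO-8859-1")
```

Here the parser receives nothing but bytes, so the declaration is the only information it has about the encoding.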

Even if metadata is not available, the encoding declaration can be omitted if the document is written in either the UTF-8 or UTF-16 encodings of Unicode. UTF-8 is a strict superset of ASCII, so ASCII files can be legal XML documents without an encoding declaration. Note, however, that this only applies to genuine, pure 7-bit ASCII files. It does not include the extended ASCII character sets that some editors produce with characters like ©, ç, or “.
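The distinction between pure 7-bit ASCII and extended character sets can be checked mechanically. The sketch below is one possible way to do so in Python with the standard library; the function names is_pure_ascii and parses_without_declaration and the sample documents are assumptions made for this example, not part of any XML API.

```python
import xml.etree.ElementTree as ET


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte fits in 7 bits (values 0 through 127)."""
    return all(b < 0x80 for b in data)


def parses_without_declaration(data: bytes) -> bool:
    """True if the parser accepts the bytes when no encoding is declared."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Three documents, none of which has an encoding declaration.
ascii_doc = b"<doc>plain text</doc>"
utf8_doc = "<doc>fa\u00e7ade</doc>".encode("UTF-8")
latin1_doc = "<doc>fa\u00e7ade</doc>".encode("ISO-8859-1")

ascii_ok = parses_without_declaration(ascii_doc)    # pure ASCII is valid UTF-8
utf8_ok = parses_without_declaration(utf8_doc)      # UTF-8 is the default
latin1_ok = parses_without_declaration(latin1_doc)  # byte 0xE7 is not valid UTF-8 here
```

In this sketch the Latin-1 document is rejected because, with no declaration, the parser falls back to UTF-8 and the lone byte 0xE7 is not a valid UTF-8 sequence.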

Even if character-set metadata is available, many parsers ignore it. Thus, we highly recommend including an encoding declaration in all your XML documents that are not written in UTF-8 or UTF-16. It certainly never hurts to do so.
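When documents are generated by a program, the declaration can be emitted at serialization time. The following is a minimal sketch using Python's xml.etree.ElementTree (the xml_declaration argument assumes Python 3.8 or later); the function name serialize_with_declaration and the sample element are hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize_with_declaration(root: ET.Element, encoding: str) -> bytes:
    """Serialize `root` to bytes in `encoding`, always writing an
    XML declaration that names that encoding."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


doc = ET.Element("doc")
doc.text = "fa\u00e7ade"

latin1_bytes = serialize_with_declaration(doc, "ISO-8859-1")

# Round trip: the declaration lets the parser decode the bytes again.
round_trip_text = ET.fromstring(latin1_bytes).text
```

Because the declaration travels inside the file, the bytes can be parsed correctly even when no outside metadata accompanies them.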




Copyright © 2002 O'Reilly & Associates. All rights reserved.