<maxim>
σοφός
έαυτόν
γιγνώσκει
</maxim>
To the XML processor, a document that uses character references for
Unicode characters missing from the current encoding is equivalent to
a Unicode document in which every character reference is replaced by
the actual character to which it refers. In other words, this XML
document is the same as the
previous one:
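<maxim>
&#x3C3;&#x3BF;&#x3C6;&#x3CC;&#x3C2;
&#x3AD;&#x3B1;&#x3C5;&#x3C4;&#x3CC;&#x3BD;
&#x3B3;&#x3B9;&#x3B3;&#x3BD;&#x3CE;&#x3C3;&#x3BA;&#x3B5;&#x3B9;
</maxim>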
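The equivalence can be checked with any conforming parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative only. It builds an ASCII-only copy of the maxim from hexadecimal character references and compares what the parser reports for each form.

```python
import xml.etree.ElementTree as ET

MAXIM = "σοφός έαυτόν γιγνώσκει"

# The document written with the Greek characters themselves.
native = f"<maxim>{MAXIM}</maxim>"

# The same document written with hexadecimal character references,
# leaving spaces as they are. The result is pure ASCII.
escaped = "<maxim>" + "".join(
    ch if ch == " " else f"&#x{ord(ch):X};" for ch in MAXIM
) + "</maxim>"

native_text = ET.fromstring(native).text
escaped_text = ET.fromstring(escaped).text

# The parser hands the application the same characters either way.
print(native_text == escaped_text)
```

The application never sees which spelling the author used; by the time text reaches it, every reference has already been replaced by the character it names.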
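Character references may be used in element content and attribute
values. They may also appear in comments, though a parser treats
comment text literally and does not expand them there. They may not
be used in element and attribute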
names, processing instruction targets, or XML keywords, such as
DOCTYPE or ELEMENT. They may be
used in the DTD in attribute default values and entity replacement
text. Tag and attribute names may be written in languages such as
Greek, Russian, Arabic, or Chinese, but you must use a character set
that allows you to include the appropriate characters natively. You
can't insert these characters with character
references. For instance, this is well-formed:
<λογος>
σοφός
</λογος>
This is not well-formed:
<&#x3BB;&#x3BF;&#x3B3;&#x3BF;&#x3C2;>
σοφός
</&#x3BB;&#x3BF;&#x3B3;&#x3BF;&#x3C2;>
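The rule about names is easy to confirm with a parser. The sketch below again assumes Python's xml.etree.ElementTree; the helper is_well_formed is a hypothetical name introduced only for this illustration. It tries a native Greek element name, a character reference in element content, and a character reference in an element name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it reports an error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Greek written natively in the element name: allowed.
native_name = "<λογος>σοφός</λογος>"

# Character references in element content: allowed.
escaped_content = "<λογος>&#x3C3;&#x3BF;&#x3C6;&#x3CC;&#x3C2;</λογος>"

# Character references in the element name: not allowed.
escaped_name = (
    "<&#x3BB;&#x3BF;&#x3B3;&#x3BF;&#x3C2;>"
    "σοφός"
    "</&#x3BB;&#x3BF;&#x3B3;&#x3BF;&#x3C2;>"
)

for doc in (native_name, escaped_content, escaped_name):
    print(is_well_formed(doc))
```

Only the last document is rejected, because an ampersand cannot begin a name.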
Since these are fairly standard DTDs, they have both Public IDs and
URLs. Other groups and individuals have written entity sets you can
use similarly, though no canonical collection of entity sets that
covers all of Unicode exists. SGML included almost 20 separate entity
sets covering Greek, Cyrillic, extended Latin, mathematical symbols,
diacritical marks, box-drawing characters, and publishing marks.
These aren't a standard part of XML, but several
applications including DocBook (http://www.docbook.org/) and MathML
(http://www.w3.org/TR/MathML2/chapter6.html#chars_entity-tables)
have ported them to XML.
MathML also has several useful entity sets
containing more mathematical symbols.
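An entity set is essentially a list of entity declarations whose replacement text is a character reference. As a rough illustration, the sketch below declares two such entities in an internal DTD subset and parses the document with Python's xml.etree.ElementTree. The two declarations are hand-written stand-ins, not copied from any published entity set, and a real document would more likely pull a whole set in from an external file.

```python
import xml.etree.ElementTree as ET

# Two illustrative declarations in the style of an entity set.
# Each entity's replacement text is a single character reference.
doc = """<!DOCTYPE maxim [
  <!ENTITY sigma "&#x3C3;">
  <!ENTITY omicron "&#x3BF;">
]>
<maxim>&sigma;&omicron;</maxim>"""

# The parser expands each entity reference to the character it stands for.
text = ET.fromstring(doc).text
print(text == "\u03c3\u03bf")
```

Named entities like these are easier to read and type than numeric references, which is the main reason such sets were ported from SGML.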