26.2. HTML4 Entity Sets
HTML 4.0 predefines about 250 named entities for use in your documents, many of which are quite useful. For instance, the nonbreaking space is &nbsp;. XML, however, defines only five named entities:
- &amp;: The ampersand (&)
- &lt;: The less-than sign (<)
- &gt;: The greater-than sign (>)
- &quot;: The straight double quote (")
- &apos;: The apostrophe (')
Other needed characters can be inserted with character references in decimal or hexadecimal format. For instance, the nonbreaking space is Unicode character 160 (decimal). Therefore, you can insert it in your document as either &#160; or &#xA0;. If you really want to type it as &nbsp;, you can define this entity reference in your DTD. Doing so requires you to use a character reference:
<!ENTITY nbsp "&#160;">
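As a rough illustration in Python (not part of the original text), the sketch below derives both forms of character reference from a character and checks that an entity declared in the internal DTD subset expands to the same character. The helper name `char_refs` and the sample document are assumptions made for this example; it relies on the standard library's `xml.etree.ElementTree`, whose expat-based parser expands entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET


def char_refs(ch):
    """Return the (decimal, hexadecimal) character references for one character."""
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"


# Hypothetical sample document: nbsp is declared in the internal DTD subset,
# as in the declaration above, and then used alongside both numeric forms.
SAMPLE = """<!DOCTYPE p [
  <!ENTITY nbsp "&#160;">
]>
<p>XML&nbsp;in&#xA0;a&#160;Nutshell</p>"""

decimal_ref, hex_ref = char_refs("\u00a0")

# All three spellings should parse to the same character, U+00A0.
text = ET.fromstring(SAMPLE).text
```

Here all three spellings of the nonbreaking space end up as the single character U+00A0 in the parsed text, which is why the choice between them is purely a matter of readability.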
The XHTML 1.0 specification includes three DTD fragments that define the familiar HTML character entity references:
- Latin-1 characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent): The non-ASCII graphic characters included in ISO-8859-1 from code points 160 through 255, shown in Figure 26-3
- Special characters (http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent): A few useful letters and punctuation marks not included in Latin-1
- Symbols (http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent): The Greek alphabet, plus various arrows, mathematical operators, and other symbols used in mathematics
Feel free to borrow these entity sets for your own use. To include them in your document's DTD, declare each one as an external parameter entity with its PUBLIC identifier and then reference it:
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
<!ENTITY % HTMLspecial PUBLIC
"-//W3C//ENTITIES Special for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent">
%HTMLspecial;
<!ENTITY % HTMLsymbol PUBLIC
"-//W3C//ENTITIES Symbols for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent">
%HTMLsymbol;
However, we do recommend saving local copies and changing the system identifier to match the new location, rather than downloading them from http://www.w3.org every time you need to parse a file. You may import just one, two, or all three of them, depending on what you need. There are no interdependencies.
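The following Python sketch shows one way the declarations might be regenerated once the files are saved locally. The public identifiers and file names come from the declarations above; the function name `local_declarations` and the directory name `entities` are hypothetical, chosen only for this example.

```python
# Public identifiers and file names taken from the declarations above.
ENTITY_SETS = {
    "HTMLlat1": ("-//W3C//ENTITIES Latin 1 for XHTML//EN", "xhtml-lat1.ent"),
    "HTMLspecial": ("-//W3C//ENTITIES Special for XHTML//EN", "xhtml-special.ent"),
    "HTMLsymbol": ("-//W3C//ENTITIES Symbols for XHTML//EN", "xhtml-symbol.ent"),
}


def local_declarations(local_dir, names=tuple(ENTITY_SETS)):
    """Build parameter entity declarations whose system identifiers
    point at local copies stored under local_dir (a hypothetical path)."""
    blocks = []
    for name in names:
        public_id, filename = ENTITY_SETS[name]
        blocks.append(
            f'<!ENTITY % {name} PUBLIC\n'
            f'  "{public_id}"\n'
            f'  "{local_dir}/{filename}">\n'
            f'%{name};'
        )
    return "\n".join(blocks)


# The sets are independent, so any subset can be requested.
dtd_fragment = local_declarations("entities", names=("HTMLlat1", "HTMLsymbol"))
```

Because the three sets have no interdependencies, leaving one out of `names` simply omits its declaration and reference.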
Alternatively, just use the character references given in Table 26-4 through Table 26-6.
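If a document already uses the HTML names, the substitution can be automated. The sketch below is one possible approach in Python, not something the original text prescribes: it uses the standard library's `html.entities.name2codepoint` mapping of HTML 4 entity names to code points, and the function name `to_numeric_refs` and the sample string are made up for illustration. The five entities that XML predefines are left untouched, as are names the mapping does not know.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser predefines; these are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_refs(text):
    """Replace HTML named entity references with decimal character references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown names untouched
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(replace, text)


converted = to_numeric_refs("caf&eacute;&nbsp;&amp;&nbsp;cr&egrave;me &copy; 2002")
```

The result can then be parsed as XML without any DTD, since it contains only numeric character references and predefined entities.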
Table 26-4. The HTML Latin-1 entity set
| Character | Meaning | XHTML entity reference | Hexadecimal character reference | Decimal character reference |
|---|---|---|---|---|
|  | Nonbreaking space | &nbsp; | &#xA0; | &#160; |
| ¡ | Inverted exclamation mark | &iexcl; | &#xA1; | &#161; |
| ¢ | Cent sign | &cent; | &#xA2; | &#162; |
| £ | Pound sign | &pound; | &#xA3; | &#163; |
| ¤ | Currency sign | &curren; | &#xA4; | &#164; |
| ¥ | Yen sign, Yuan sign | &yen; | &#xA5; | &#165; |
| ¦ | Broken vertical bar | &brvbar; | &#xA6; | &#166; |
| § | Section sign | &sect; | &#xA7; | &#167; |
| ¨ | Dieresis, spacing dieresis | &uml; | &#xA8; | &#168; |
| © | Copyright sign | &copy; | &#xA9; | &#169; |
| ª | Feminine ordinal indicator | &ordf; | &#xAA; | &#170; |
| « | Left-pointing double angle quotation mark, left-pointing guillemet | &laquo; | &#xAB; | &#171; |
| ¬ | Not sign | &not; | &#xAC; | &#172; |
| - | Soft hyphen, discretionary hyphen | &shy; | &#xAD; | &#173; |
| ® | Registered trademark sign | &reg; | &#xAE; | &#174; |
| ¯ | Macron, overline, APL overbar | &macr; | &#xAF; | &#175; |
| ° | Degree sign | &deg; | &#xB0; | &#176; |
| ± | Plus-or-minus sign | &plusmn; | &#xB1; | &#177; |
| ² | Superscript digit two, squared | &sup2; | &#xB2; | &#178; |
| ³ | Superscript digit three, cubed | &sup3; | &#xB3; | &#179; |
| ´ | Acute accent, spacing acute | &acute; | &#xB4; | &#180; |
| µ | Micro sign | &micro; | &#xB5; | &#181; |
| ¶ | Pilcrow sign, paragraph sign | &para; | &#xB6; | &#182; |
| · | Middle dot, Georgian comma, Greek middle dot | &middot; | &#xB7; | &#183; |
| ¸ | Cedilla, spacing cedilla | &cedil; | &#xB8; | &#184; |
| ¹ | Superscript digit one | &sup1; | &#xB9; | &#185; |
| º | Masculine ordinal indicator | &ordm; | &#xBA; | &#186; |
| » | Right-pointing double angle quotation mark, right-pointing guillemet | &raquo; | &#xBB; | &#187; |
| ¼ | Vulgar fraction one-quarter | &frac14; | &#xBC; | &#188; |
| ½ | Vulgar fraction one-half | &frac12; | &#xBD; | &#189; |
| ¾ | Vulgar fraction three-quarters | &frac34; | &#xBE; | &#190; |
| ¿ | Inverted question mark | &iquest; | &#xBF; | &#191; |
| À | Latin capital letter A with grave | &Agrave; | &#xC0; | &#192; |
| Á | Latin capital letter A with acute | &Aacute; | &#xC1; | &#193; |
| Â | Latin capital letter A with circumflex | &Acirc; | &#xC2; | &#194; |
| Ã | Latin capital letter A with tilde | &Atilde; | &#xC3; | &#195; |
| Ä | Latin capital letter A with dieresis | &Auml; | &#xC4; | &#196; |
| Å | Latin capital letter A with ring above, Latin capital letter A ring | &Aring; | &#xC5; | &#197; |
| Æ | Latin capital letter AE, Latin capital ligature AE | &AElig; | &#xC6; | &#198; |
| Ç | Latin capital letter C with cedilla | &Ccedil; | &#xC7; | &#199; |
| È | Latin capital letter E with grave | &Egrave; | &#xC8; | &#200; |
| É | Latin capital letter E with acute | &Eacute; | &#xC9; | &#201; |
| Ê | Latin capital letter E with circumflex | &Ecirc; | &#xCA; | &#202; |
| Ë | Latin capital letter E with dieresis | &Euml; | &#xCB; | &#203; |
| Ì | Latin capital letter I with grave | &Igrave; | &#xCC; | &#204; |
| Í | Latin capital letter I with acute | &Iacute; | &#xCD; | &#205; |
| Î | Latin capital letter I with circumflex | &Icirc; | &#xCE; | &#206; |
| Ï | Latin capital letter I with dieresis | &Iuml; | &#xCF; | &#207; |
| Ð | Latin capital letter eth | &ETH; | &#xD0; | &#208; |
| Ñ | Latin capital letter N with tilde | &Ntilde; | &#xD1; | &#209; |
| Ò | Latin capital letter O with grave | &Ograve; | &#xD2; | &#210; |
| Ó | Latin capital letter O with acute | &Oacute; | &#xD3; | &#211; |
| Ô | Latin capital letter O with circumflex | &Ocirc; | &#xD4; | &#212; |
| Õ | Latin capital letter O with tilde | &Otilde; | &#xD5; | &#213; |
| Ö | Latin capital letter O with dieresis | &Ouml; | &#xD6; | &#214; |
| × | Multiplication sign | &times; | &#xD7; | &#215; |
| Ø | Latin capital letter O with stroke | &Oslash; | &#xD8; | &#216; |
| Ù | Latin capital letter U with grave | &Ugrave; | &#xD9; | &#217; |
| Ú | Latin capital letter U with acute | &Uacute; | &#xDA; | &#218; |
| Û | Latin capital letter U with circumflex | &Ucirc; | &#xDB; | &#219; |
| Ü | Latin capital letter U with dieresis | &Uuml; | &#xDC; | &#220; |
| Ý | Latin capital letter Y with acute | &Yacute; | &#xDD; | &#221; |
| Þ | Latin capital letter thorn | &THORN; | &#xDE; | &#222; |
| ß | Latin small letter sharp s, ess-zett | &szlig; | &#xDF; | &#223; |
| à | Latin small letter a with grave | &agrave; | &#xE0; | &#224; |
| á | Latin small letter a with acute | &aacute; | &#xE1; | &#225; |
| â | Latin small letter a with circumflex | &acirc; | &#xE2; | &#226; |
| ã | Latin small letter a with tilde | &atilde; | &#xE3; | &#227; |
| ä | Latin small letter a with dieresis | &auml; | &#xE4; | &#228; |
| å | Latin small letter a with ring above | &aring; | &#xE5; | &#229; |
| æ | Latin small letter ae, Latin small ligature ae | &aelig; | &#xE6; | &#230; |
| ç | Latin small letter c with cedilla | &ccedil; | &#xE7; | &#231; |
| è | Latin small letter e with grave | &egrave; | &#xE8; | &#232; |
| é | Latin small letter e with acute | &eacute; | &#xE9; | &#233; |
| ê | Latin small letter e with circumflex | &ecirc; | &#xEA; | &#234; |
| ë | Latin small letter e with dieresis | &euml; | &#xEB; | &#235; |
| ì | Latin small letter i with grave | &igrave; | &#xEC; | &#236; |
| í | Latin small letter i with acute | &iacute; | &#xED; | &#237; |
| î | Latin small letter i with circumflex | &icirc; | &#xEE; | &#238; |
| ï | Latin small letter i with dieresis | &iuml; | &#xEF; | &#239; |
| ð | Latin small letter eth | &eth; | &#xF0; | &#240; |
| ñ | Latin small letter n with tilde | &ntilde; | &#xF1; | &#241; |
| ò | Latin small letter o with grave | &ograve; | &#xF2; | &#242; |
| ó | Latin small letter o with acute | &oacute; | &#xF3; | &#243; |
| ô | Latin small letter o with circumflex | &ocirc; | &#xF4; | &#244; |
| õ | Latin small letter o with tilde | &otilde; | &#xF5; | &#245; |
| ö | Latin small letter o with dieresis | &ouml; | &#xF6; | &#246; |
| ÷ | Division sign | &divide; | &#xF7; | &#247; |
| ø | Latin small letter o with stroke | &oslash; | &#xF8; | &#248; |
| ù | Latin small letter u with grave | &ugrave; | &#xF9; | &#249; |
| ú | Latin small letter u with acute | &uacute; | &#xFA; | &#250; |
| û | Latin small letter u with circumflex | &ucirc; | &#xFB; | &#251; |
| ü | Latin small letter u with dieresis | &uuml; | &#xFC; | &#252; |
| ý | Latin small letter y with acute | &yacute; | &#xFD; | &#253; |
| þ | Latin small letter thorn | &thorn; | &#xFE; | &#254; |
| ÿ | Latin small letter y with dieresis | &yuml; | &#xFF; | &#255; |
Table 26-5. The HTML special characters entity set
| Character | Meaning | XHTML entity reference | Hexadecimal character reference | Decimal character reference |
|---|---|---|---|---|
| " | Quotation mark, APL quote | &quot; | &#x22; | &#34; |
| & | Ampersand | &amp; | &#x26; | &#38; |
| ' | Apostrophe mark | &apos; | &#x27; | &#39; |
| < | Less-than sign | &lt; | &#x3C; | &#60; |
| > | Greater-than sign | &gt; | &#x3E; | &#62; |
| Œ | Latin capital ligature OE | &OElig; | &#x152; | &#338; |
| œ | Latin small ligature oe | &oelig; | &#x153; | &#339; |
| Š | Latin capital letter S with caron | &Scaron; | &#x160; | &#352; |
| š | Latin small letter s with caron | &scaron; | &#x161; | &#353; |
| Ÿ | Latin capital letter Y with dieresis | &Yuml; | &#x178; | &#376; |
| ˆ | Modifier letter circumflex accent | &circ; | &#x2C6; | &#710; |
| ˜ | Small tilde | &tilde; | &#x2DC; | &#732; |
|  | En space | &ensp; | &#x2002; | &#8194; |
|  | Em space | &emsp; | &#x2003; | &#8195; |
|  | Thin space | &thinsp; | &#x2009; | &#8201; |
| Nonprinting character | Zero width nonjoiner | &zwnj; | &#x200C; | &#8204; |
| Nonprinting character | Zero width joiner | &zwj; | &#x200D; | &#8205; |
| Nonprinting character | Left-to-right mark | &lrm; | &#x200E; | &#8206; |
| Nonprinting character | Right-to-left mark | &rlm; | &#x200F; | &#8207; |
| – | En dash | &ndash; | &#x2013; | &#8211; |
| — | Em dash | &mdash; | &#x2014; | &#8212; |
| ‘ | Left single quotation mark | &lsquo; | &#x2018; | &#8216; |
| ’ | Right single quotation mark | &rsquo; | &#x2019; | &#8217; |
| ‚ | Single low-9 quotation mark | &sbquo; | &#x201A; | &#8218; |
| “ | Left double quotation mark | &ldquo; | &#x201C; | &#8220; |
| ” | Right double quotation mark | &rdquo; | &#x201D; | &#8221; |
| „ | Double low-9 quotation mark | &bdquo; | &#x201E; | &#8222; |
| † | Dagger | &dagger; | &#x2020; | &#8224; |
| ‡ | Double dagger | &Dagger; | &#x2021; | &#8225; |
| ‰ | Per mille sign | &permil; | &#x2030; | &#8240; |
| ‹ | Single left-pointing angle quotation mark | &lsaquo; | &#x2039; | &#8249; |
| › | Single right-pointing angle quotation mark | &rsaquo; | &#x203A; | &#8250; |
| € | Euro sign | &euro; | &#x20AC; | &#8364; |
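Table 26-6. The HTML symbol entity set
| Character | Meaning | XHTML entity reference | Hexadecimal character reference | Decimal character reference |
|---|---|---|---|---|
| ƒ | Latin small f with hook, function, florin | &fnof; | &#x192; | &#402; |
| Α | Greek capital letter alpha | &Alpha; | &#x391; | &#913; |
| Β | Greek capital letter beta | &Beta; | &#x392; | &#914; |
| Γ | Greek capital letter gamma | &Gamma; | &#x393; | &#915; |
| Δ | Greek capital letter delta | &Delta; | &#x394; | &#916; |
| Ε | Greek capital letter epsilon | &Epsilon; | &#x395; | &#917; |
| Ζ | Greek capital letter zeta | &Zeta; | &#x396; | &#918; |
| Η | Greek capital letter eta | &Eta; | &#x397; | &#919; |
| Θ | Greek capital letter theta | &Theta; | &#x398; | &#920; |
| Ι | Greek capital letter iota | &Iota; | &#x399; | &#921; |
| Κ | Greek capital letter kappa | &Kappa; | &#x39A; | &#922; |
| Λ | Greek capital letter lambda | &Lambda; | &#x39B; | &#923; |
| Μ | Greek capital letter mu | &Mu; | &#x39C; | &#924; |
| Ν | Greek capital letter nu | &Nu; | &#x39D; | &#925; |
| Ξ | Greek capital letter xi | &Xi; | &#x39E; | &#926; |
| Ο | Greek capital letter omicron | &Omicron; | &#x39F; | &#927; |
| Π | Greek capital letter pi | &Pi; | &#x3A0; | &#928; |
| Ρ | Greek capital letter rho | &Rho; | &#x3A1; | &#929; |
| Σ | Greek capital letter sigma | &Sigma; | &#x3A3; | &#931; |
| Τ | Greek capital letter tau | &Tau; | &#x3A4; | &#932; |
| Υ | Greek capital letter upsilon | &Upsilon; | &#x3A5; | &#933; |
| Φ | Greek capital letter phi | &Phi; | &#x3A6; | &#934; |
| Χ | Greek capital letter chi | &Chi; | &#x3A7; | &#935; |
| Ψ | Greek capital letter psi | &Psi; | &#x3A8; | &#936; |
| Ω | Greek capital letter omega | &Omega; | &#x3A9; | &#937; |
| α | Greek small letter alpha | &alpha; | &#x3B1; | &#945; |
| β | Greek small letter beta | &beta; | &#x3B2; | &#946; |
| γ | Greek small letter gamma | &gamma; | &#x3B3; | &#947; |
| δ | Greek small letter delta | &delta; | &#x3B4; | &#948; |
| ε | Greek small letter epsilon | &epsilon; | &#x3B5; | &#949; |
| ζ | Greek small letter zeta | &zeta; | &#x3B6; | &#950; |
| η | Greek small letter eta | &eta; | &#x3B7; | &#951; |
| θ | Greek small letter theta | &theta; | &#x3B8; | &#952; |
| ι | Greek small letter iota | &iota; | &#x3B9; | &#953; |
| κ | Greek small letter kappa | &kappa; | &#x3BA; | &#954; |
| λ | Greek small letter lambda | &lambda; | &#x3BB; | &#955; |
| μ | Greek small letter mu | &mu; | &#x3BC; | &#956; |
| ν | Greek small letter nu | &nu; | &#x3BD; | &#957; |
| ξ | Greek small letter xi | &xi; | &#x3BE; | &#958; |
| ο | Greek small letter omicron | &omicron; | &#x3BF; | &#959; |
| π | Greek small letter pi | &pi; | &#x3C0; | &#960; |
| ρ | Greek small letter rho | &rho; | &#x3C1; | &#961; |
| ς | Greek small letter final sigma | &sigmaf; | &#x3C2; | &#962; |
| σ | Greek small letter sigma | &sigma; | &#x3C3; | &#963; |
| τ | Greek small letter tau | &tau; | &#x3C4; | &#964; |
| υ | Greek small letter upsilon | &upsilon; | &#x3C5; | &#965; |
| φ | Greek small letter phi | &phi; | &#x3C6; | &#966; |
| χ | Greek small letter chi | &chi; | &#x3C7; | &#967; |
| ψ | Greek small letter psi | &psi; | &#x3C8; | &#968; |
| ω | Greek small letter omega | &omega; | &#x3C9; | &#969; |
| ϑ | Greek small letter theta symbol | &thetasym; | &#x3D1; | &#977; |
| ϒ | Greek upsilon with hook symbol | &upsih; | &#x3D2; | &#978; |
| ϖ | Greek pi symbol | &piv; | &#x3D6; | &#982; |
| • | Bullet, black small circle | &bull; | &#x2022; | &#8226; |
| … | Horizontal ellipsis, three-dot leader | &hellip; | &#x2026; | &#8230; |
| ′ | Prime, minutes, feet | &prime; | &#x2032; | &#8242; |
| ″ | Double prime, seconds, inches | &Prime; | &#x2033; | &#8243; |
| ‾ | Overline, spacing overscore | &oline; | &#x203E; | &#8254; |
| ⁄ | Fraction slash | &frasl; | &#x2044; | &#8260; |
| ℑ | Black letter capital I, imaginary part | &image; | &#x2111; | &#8465; |
| ℘ | Script capital P, power set, Weierstrass p | &weierp; | &#x2118; | &#8472; |
| ℜ | Black letter capital R, real part symbol | &real; | &#x211C; | &#8476; |
| ™ | Trademark sign | &trade; | &#x2122; | &#8482; |
| ℵ | Aleph symbol, first transfinite cardinal | &alefsym; | &#x2135; | &#8501; |
| ← | Leftward arrow | &larr; | &#x2190; | &#8592; |
| ↑ | Upward arrow | &uarr; | &#x2191; | &#8593; |
| → | Rightward arrow | &rarr; | &#x2192; | &#8594; |
| ↓ | Downward arrow | &darr; | &#x2193; | &#8595; |
| ↔ | Left-right arrow | &harr; | &#x2194; | &#8596; |
| ↵ | Downward arrow with corner leftward, carriage return | &crarr; | &#x21B5; | &#8629; |
| ⇐ | Leftward double arrow | &lArr; | &#x21D0; | &#8656; |
| ⇑ | Upward double arrow | &uArr; | &#x21D1; | &#8657; |
| ⇒ | Rightward double arrow | &rArr; | &#x21D2; | &#8658; |
| ⇓ | Downward double arrow | &dArr; | &#x21D3; | &#8659; |
| ⇔ | Left-right double arrow | &hArr; | &#x21D4; | &#8660; |
| ∀ | For all | &forall; | &#x2200; | &#8704; |
| ∂ | Partial differential | &part; | &#x2202; | &#8706; |
| ∃ | There exists | &exist; | &#x2203; | &#8707; |
| ∅ | Empty set, null set, diameter | &empty; | &#x2205; | &#8709; |
| ∇ | Nabla, backward difference | &nabla; | &#x2207; | &#8711; |
| ∈ | Element of | &isin; | &#x2208; | &#8712; |
| ∉ | Not an element of | &notin; | &#x2209; | &#8713; |
| ∋ | Contains as member | &ni; | &#x220B; | &#8715; |
| ∏ | N-ary product, product sign | &prod; | &#x220F; | &#8719; |
| ∑ | N-ary summation | &sum; | &#x2211; | &#8721; |
| − | Minus sign | &minus; | &#x2212; | &#8722; |
| ∗ | Asterisk operator | &lowast; | &#x2217; | &#8727; |
| √ | Square root, radical sign | &radic; | &#x221A; | &#8730; |
| ∝ | Proportional to | &prop; | &#x221D; | &#8733; |
| ∞ | Infinity | &infin; | &#x221E; | &#8734; |
| ∠ | Angle | &ang; | &#x2220; | &#8736; |
| ∧ | Logical and, wedge | &and; | &#x2227; | &#8743; |
| ∨ | Logical or, vee | &or; | &#x2228; | &#8744; |
| ∩ | Intersection, cap | &cap; | &#x2229; | &#8745; |
| ∪ | Union, cup | &cup; | &#x222A; | &#8746; |
| ∫ | Integral | &int; | &#x222B; | &#8747; |
| ∴ | Therefore | &there4; | &#x2234; | &#8756; |
| ∼ | Tilde operator, varies with, similar to | &sim; | &#x223C; | &#8764; |
| ≅ | Approximately equal to | &cong; | &#x2245; | &#8773; |
| ≈ | Almost equal to, asymptotic to | &asymp; | &#x2248; | &#8776; |
| ≠ | Not equal to | &ne; | &#x2260; | &#8800; |
| ≡ | Identical to | &equiv; | &#x2261; | &#8801; |
| ≤ | Less than or equal to | &le; | &#x2264; | &#8804; |
| ≥ | Greater than or equal to | &ge; | &#x2265; | &#8805; |
| ⊂ | Subset of | &sub; | &#x2282; | &#8834; |
| ⊃ | Superset of | &sup; | &#x2283; | &#8835; |
| ⊄ | Not a subset of | &nsub; | &#x2284; | &#8836; |
| ⊆ | Subset of or equal to | &sube; | &#x2286; | &#8838; |
| ⊇ | Superset of or equal to | &supe; | &#x2287; | &#8839; |
| ⊕ | Circled plus, direct sum | &oplus; | &#x2295; | &#8853; |
| ⊗ | Circled times, vector product | &otimes; | &#x2297; | &#8855; |
| ⊥ | Up tack, orthogonal to, perpendicular | &perp; | &#x22A5; | &#8869; |
| ⋅ | Dot operator | &sdot; | &#x22C5; | &#8901; |
| ⌈ | Left ceiling, APL upstile | &lceil; | &#x2308; | &#8968; |
| ⌉ | Right ceiling | &rceil; | &#x2309; | &#8969; |
| ⌊ | Left floor, APL downstile | &lfloor; | &#x230A; | &#8970; |
| ⌋ | Right floor | &rfloor; | &#x230B; | &#8971; |
| 〈 | Left-pointing angle bracket, bra | &lang; | &#x2329; | &#9001; |
| 〉 | Right-pointing angle bracket, ket | &rang; | &#x232A; | &#9002; |
| ◊ | Lozenge | &loz; | &#x25CA; | &#9674; |
| ♠ | Black spade suit | &spades; | &#x2660; | &#9824; |
| ♣ | Black club suit, shamrock | &clubs; | &#x2663; | &#9827; |
| ♥ | Black heart suit, valentine | &hearts; | &#x2665; | &#9829; |
| ♦ | Black diamond suit | &diams; | &#x2666; | &#9830; |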
Copyright © 2002 O'Reilly & Associates. All rights reserved.