Appendix E. Charsets
Table E-1 lists the suggested charset(s) for a
number of languages. Charsets are used by servlets that generate
multilingual output; they determine which character encoding a
servlet's PrintWriter uses. By default,
the PrintWriter uses the ISO-8859-1 (Latin-1)
charset, appropriate for most Western European languages. To specify
an alternate charset, pass the charset value to the
setContentType() method before the servlet
retrieves its PrintWriter; a charset set after getWriter() has been
called has no effect on the writer's encoding. For example:
res.setContentType("text/html; charset=Shift_JIS"); // A Japanese charset
PrintWriter out = res.getWriter(); // Writes Shift_JIS Japanese
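To make the effect of the charset concrete, here is a minimal standalone sketch that does not use the servlet API. The class name CharsetBytes and the encode() helper are hypothetical; a ByteArrayOutputStream stands in for the response stream so that the bytes a charset-bound PrintWriter produces can be inspected.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class CharsetBytes {
    // Hypothetical helper: encodes text the way a PrintWriter bound to the
    // named charset would, and returns the raw bytes that were written.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(sink, Charset.forName(charsetName)));
        out.print(text);
        out.flush(); // push buffered characters through the encoder
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(encode(text, "ISO-8859-1").length); // 4
        System.out.println(encode(text, "UTF-8").length);      // 5
        // A character the charset cannot represent is replaced with '?'.
        System.out.println((char) encode("\u65e5", "ISO-8859-1")[0]); // ?
    }
}
```

The same text yields different byte sequences under different charsets, which is why the charset declared in the content type has to match the encoding the writer actually uses.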
Note that not all web browsers support all charsets or have the fonts
available to represent all characters, although at minimum all
clients support ISO-8859-1. Also, the UTF-8 charset can represent all
Unicode characters, so it can be treated as a viable alternative for any
language.
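One way to act on this is to check whether a preferred charset can encode the text at hand and fall back to UTF-8 when it cannot. The following is only a sketch under that assumption; the class CharsetFallback and its choose() method are hypothetical names, and the check uses the standard java.nio.charset classes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetFallback {
    // Hypothetical helper: returns the preferred charset name if this JVM
    // supports it and it can encode every character of the text;
    // otherwise returns "UTF-8".
    static String choose(String text, String preferred) {
        if (Charset.isSupported(preferred)) {
            Charset cs = Charset.forName(preferred);
            if (cs.canEncode() && cs.newEncoder().canEncode(text)) {
                return preferred;
            }
        }
        return StandardCharsets.UTF_8.name();
    }

    public static void main(String[] args) {
        System.out.println(choose("hello", "ISO-8859-1"));        // ISO-8859-1
        System.out.println(choose("\u65e5\u672c", "ISO-8859-1")); // UTF-8
    }
}
```

The returned name could then be placed in the charset parameter passed to setContentType(). Whether a given browser can display the result still depends on the client, as noted above.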
Table E-1. Suggested Charsets
Language | Language Code | Suggested Charsets
Albanian | sq | ISO-8859-2
Arabic | ar | ISO-8859-6
Bulgarian | bg | ISO-8859-5
Byelorussian | be | ISO-8859-5
Catalan (Spanish) | ca | ISO-8859-1
Chinese (Simplified/Mainland) | zh | GB2312
Chinese (Traditional/Taiwan) | zh (country TW) | Big5
Croatian | hr | ISO-8859-2
Czech | cs | ISO-8859-2
Danish | da | ISO-8859-1
Dutch | nl | ISO-8859-1
English | en | ISO-8859-1
Estonian | et | ISO-8859-1
Finnish | fi | ISO-8859-1
French | fr | ISO-8859-1
German | de | ISO-8859-1
Greek | el | ISO-8859-7
Hebrew | he (formerly iw) | ISO-8859-8
Hungarian | hu | ISO-8859-2
Icelandic | is | ISO-8859-1
Italian | it | ISO-8859-1
Japanese | ja | Shift_JIS, ISO-2022-JP, EUC-JP[1]
Korean | ko | EUC-KR[2]
Latvian, Lettish | lv | ISO-8859-2
Lithuanian | lt | ISO-8859-2
Macedonian | mk | ISO-8859-5
Norwegian | no | ISO-8859-1
Polish | pl | ISO-8859-2
Portuguese | pt | ISO-8859-1
Romanian | ro | ISO-8859-2
Russian | ru | ISO-8859-5, KOI8-R
Serbian | sr | ISO-8859-5, KOI8-R
Serbo-Croatian | sh | ISO-8859-5, ISO-8859-2, KOI8-R
Slovak | sk | ISO-8859-2
Slovenian | sl | ISO-8859-2
Spanish | es | ISO-8859-1
Swedish | sv | ISO-8859-1
Turkish | tr | ISO-8859-9
Ukrainian | uk | ISO-8859-5, KOI8-R
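A table like this is often turned into a lookup from locale to charset. The sketch below assumes a hypothetical class SuggestedCharsets that holds only a small subset of the rows, takes the first charset listed when a row suggests several, and falls back to UTF-8 for languages it does not know. None of these choices come from the servlet API itself.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class SuggestedCharsets {
    // Subset of Table E-1, keyed by language code (first suggestion only).
    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        BY_LANGUAGE.put("en", "ISO-8859-1");
        BY_LANGUAGE.put("fr", "ISO-8859-1");
        BY_LANGUAGE.put("cs", "ISO-8859-2");
        BY_LANGUAGE.put("ru", "ISO-8859-5");
        BY_LANGUAGE.put("el", "ISO-8859-7");
        BY_LANGUAGE.put("tr", "ISO-8859-9");
        BY_LANGUAGE.put("ja", "Shift_JIS");
        BY_LANGUAGE.put("ko", "EUC-KR");
        BY_LANGUAGE.put("zh", "GB2312");
    }

    // Returns the suggested charset for a locale, or UTF-8 if none is listed.
    static String charsetFor(Locale locale) {
        // The table distinguishes Traditional Chinese by country, not language.
        if ("zh".equals(locale.getLanguage()) && "TW".equals(locale.getCountry())) {
            return "Big5";
        }
        String charset = BY_LANGUAGE.get(locale.getLanguage());
        return charset != null ? charset : "UTF-8";
    }

    // Builds a value suitable for passing to setContentType().
    static String contentType(Locale locale) {
        return "text/html; charset=" + charsetFor(locale);
    }

    public static void main(String[] args) {
        System.out.println(contentType(Locale.JAPANESE));             // text/html; charset=Shift_JIS
        System.out.println(contentType(Locale.TAIWAN));               // text/html; charset=Big5
        System.out.println(contentType(Locale.forLanguageTag("sw"))); // text/html; charset=UTF-8
    }
}
```

Hebrew is left out of the subset on purpose: as the table notes, its language code changed from iw to he, so a real lookup would probably need to accept both.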
Copyright © 2001 O'Reilly & Associates. All rights reserved.