Appendix E. Common Content Encodings

In an ideal world, the only character encoding (or, loosely, "character set") you'd ever see would be UTF-8 (utf-8), plus Latin-1 (iso-8859-1) for legacy documents. In practice, all of the encodings below exist and can be found on the Web. They are listed in order of their English names; the left-hand column is the value you'd get back from $response->content_charset. The complete list of registered character sets is at http://www.iana.org/assignments/character-sets.
Value        | Encoding
-------------|--------------------------------------------------------------
us-ascii     | ASCII plain (just characters 0x00-0x7F)
asmo-708     | Arabic ASMO-708
iso-8859-6   | Arabic ISO
dos-720      | Arabic MSDOS
windows-1256 | Arabic MSWindows
iso-8859-4   | Baltic ISO
windows-1257 | Baltic MSWindows
iso-8859-2   | Central European ISO
ibm852       | Central European MSDOS
windows-1250 | Central European MSWindows
hz-gb-2312   | Chinese Simplified (HZ)
gb2312       | Chinese Simplified (GB2312)
euc-cn       | Chinese Simplified EUC
big5         | Chinese Traditional (Big5)
cp866        | Cyrillic DOS
iso-8859-5   | Cyrillic ISO
koi8-r       | Cyrillic KOI8-R
koi8-u       | Cyrillic KOI8-U
windows-1251 | Cyrillic MSWindows
iso-8859-7   | Greek ISO
windows-1253 | Greek MSWindows
iso-8859-8-i | Hebrew ISO Logical
iso-8859-8   | Hebrew ISO Visual
dos-862      | Hebrew MSDOS
windows-1255 | Hebrew MSWindows
euc-jp       | Japanese EUC-JP
iso-2022-jp  | Japanese JIS
shift_jis    | Japanese Shift-JIS
iso-2022-kr  | Korean ISO
euc-kr       | Korean Standard
windows-874  | Thai MSWindows
iso-8859-9   | Turkish ISO
windows-1254 | Turkish MSWindows
utf-8        | Unicode expressed as UTF-8
utf-16       | Unicode expressed as UTF-16
windows-1258 | Vietnamese MSWindows
viscii       | Vietnamese VISCII
iso-8859-1   | Western European (Latin-1)
windows-1252 | Western European (Latin-1) with extra characters in 0x80-0x9F
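As a rough illustration of how these labels get used, here is a minimal Perl sketch that turns raw response bytes into characters with the core Encode module, given a charset label like those in the left-hand column (for instance, whatever $response->content_charset returned). The helper name decode_body is hypothetical, and falling back to iso-8859-1 when Encode doesn't recognize a label is an assumption made for this example, not LWP behavior; Encode may not know every label in the table.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: decode raw octets using a charset label such as
# the values in the table (e.g. from $response->content_charset).
# ASSUMPTION for this sketch: unknown labels fall back to iso-8859-1.
sub decode_body {
    my ($charset, $octets) = @_;
    my $enc = find_encoding($charset) || find_encoding('iso-8859-1');
    return $enc->decode($octets);
}

# Byte 0x80 is a C1 control code in Latin-1 but the euro sign in
# windows-1252, which is the difference between the last two table rows.
my $byte = "\x80";
for my $cs (qw(iso-8859-1 windows-1252)) {
    my $text = decode_body($cs, $byte);
    printf "%-12s 0x80 -> U+%04X\n", $cs, ord $text;
}
```

Run as-is, it should report U+0080 for iso-8859-1 and U+20AC (the euro sign) for windows-1252. Newer versions of HTTP::Message also provide $response->decoded_content, which performs this charset decoding step for you.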
Copyright © 2002 O'Reilly & Associates. All rights reserved.