8.21.3. Discussion
Suppose someone sends you a file in cp1252 format, Microsoft's
default in-house 8-bit character set. Files in this format can be
annoying to read—while they might claim to be Latin1, they are
not, and if you look at them with Latin1 fonts loaded, you'll get
garbage on your screen. A simple solution is as follows:
open(MSMESS, "< :crlf :encoding(cp1252)", $inputfile)
    || die "can't open $inputfile: $!";
Now data read from that handle will be automatically decoded into
Unicode as you read it. It will also be processed in CRLF mode, which
translates carriage-return/linefeed pairs into plain newlines; that's
needed on systems that don't already use that sequence to indicate
end of line.
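For instance, here's a minimal sketch of reading through those layers
(the filename report.txt and the Euro-sign check are just illustrative
assumptions): cp1252 stores the Euro sign as byte 0x80, but once it
comes through the :encoding layer it arrives as the single character
U+20AC, and length() counts characters rather than bytes.
# Hypothetical sketch: read a cp1252 file as Unicode characters.
my $inputfile = "report.txt";              # assumed sample filename
open(MSMESS, "< :crlf :encoding(cp1252)", $inputfile)
    || die "can't open $inputfile: $!";
while (my $line = <MSMESS>) {
    chomp $line;                           # :crlf already folded CRLF to "\n"
    # A cp1252 Euro sign (byte 0x80) now appears as code point U+20AC.
    printf "%d chars%s\n", length($line),
        ($line =~ /\x{20AC}/ ? ", contains a Euro sign" : "");
}
close(MSMESS);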
Getting Unicode displayed is another matter: you need a terminal with
Unicode fonts. Under the X Window System, you might launch an xterm
in Unicode mode like this:
xterm -n unicode -u8 -fn -misc-fixed-medium-r-normal--20-200-75-75-c-100-iso10646-1
But many open questions still exist, such as cutting and pasting of
Unicode data between windows.
The www.unicode.org site has help
for finding and installing suitable tools for a variety of platforms,
including both Unix and Microsoft systems.
You'll also need to tell Perl it's all right to emit Unicode. If you
don't, you'll get a "Wide character in print" warning every time
you try. Assuming you're running in an
xterm like the one shown previously (or its
equivalent for your system) that has Unicode fonts available, you
could just do this:
binmode(STDOUT, ":utf8");
But that commits the rest of your program's output to Unicode as
well, which might not be convenient for existing code. For new
programs designed with Unicode in mind from the start, though, it's
not much trouble.
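As a quick sketch (the smiling-face character is just a convenient
code point above U+00FF to demonstrate with):
# Without the binmode, the print below warns "Wide character in print"
# and emits mangled output; with it, the character leaves as UTF-8.
binmode(STDOUT, ":utf8");
print chr(0x263A), "\n";    # U+263A WHITE SMILING FACE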
As of v5.8.1, Perl offers a couple of other ways to get this effect:
the -C command-line switch and the PERL_UNICODE environment variable,
which takes the same values. The -C switch controls several Unicode
features of your runtime environment, so you can set them on a
per-command basis without having to edit the source code.
You may use letters or numbers; if you use numbers, you add up the
values of the individual options. For example, O (UTF-8 on STDOUT) is
2 and E (UTF-8 on STDERR) is 4, so -COE and -C6 are synonyms: both
enable UTF-8 on STDOUT and STDERR.
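For instance, these two one-liners should behave identically,
printing a wide character as UTF-8 without the warning (chr(0x100) is
just a convenient example, and a UTF-8-capable terminal is assumed):
perl -COE -le 'print chr(0x100)'
perl -C6  -le 'print chr(0x100)'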