15.3. Caution, Working
As of this writing (that is, with respect to version 5.6.0 of Perl),
there are still some caveats on the use of Unicode. (Check your online
docs for updates.)
- The existing regular expression compiler does not produce
  polymorphic opcodes. This means that whether a particular pattern
  will match Unicode characters is determined when the pattern is
  compiled (based on whether the pattern itself contains Unicode
  characters), not when the match is performed at run time. This needs
  to change so that matching adapts to Unicode whenever the string
  being matched is Unicode.
- There is currently no easy way to mark data read from a file or
  other external source as being utf8. This will be a major area of
  focus in the near future and may well be fixed by the time you read
  this.
- There is no method for automatically coercing input and output to
  some encoding other than UTF-8. This is planned for the near future,
  however, so check your online docs.
- Using locales with utf8 may lead to odd results. Perl currently
  makes some attempt to apply 8-bit locale information to characters
  in the range 0..255, but this is demonstrably incorrect for locales
  that use characters above that range (when mapped into Unicode).
  Combining the two also tends to run more slowly. You are strongly
  encouraged to avoid locales.
Unicode is fun--you just have to define fun correctly.
Copyright © 2002 O'Reilly & Associates. All rights reserved.