home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  


Perl CookbookPerl CookbookSearch this book

1.5. Using Named Unicode Characters

1.5.2. Solution

Place a use charnames at the top of your file, then freely insert "\N{CHARSPEC}" escapes into your string literals.

1.5.3. Discussion

The use charnames pragma lets you use symbolic names for Unicode characters. These are compile-time constants that you access with the \N{CHARSPEC} double-quoted string sequence. Several subpragmas are supported. The :full subpragma grants access to the full range of character names, but you have to write them out in full, exactly as they occur in the Unicode character database, including the loud, all-capitals notation. The :short subpragma gives convenient shortcuts. Any import without a colon tag is taken to be a script name, giving case-sensitive shortcuts for those scripts.

use charnames ':full';
print "\N{GREEK CAPITAL LETTER DELTA} is called delta.\n";

Δ is called delta.

use charnames ':short';
print "\N{greek:Delta} is an upper-case delta.\n";

Δ is an upper-case delta.

use charnames qw(cyrillic greek);
print "\N{Sigma} and \N{sigma} are Greek sigmas.\n";
print "\N{Be} and \N{be} are Cyrillic bes.\n";

Σ and σ are Greek sigmas.
Б and б are Cyrillic bes.

Two functions, charnames::viacode and charnames::vianame, can translate between numeric code points and the long names. The Unicode documents use the notation U+XXXX to indicate the Unicode character whose code point is XXXX, so we'll use that here in our output.

use charnames qw(:full);
for $code (0xC4, 0x394) { 
    printf "Character U+%04X (%s) is named %s\n",
        $code, chr($code), charnames::viacode($code);
}

Character U+00C4 (Ä) is named LATIN CAPITAL LETTER A WITH DIAERESIS
Character U+0394 (Δ) is named GREEK CAPITAL LETTER DELTA

use charnames qw(:full);
$name = "MUSIC SHARP SIGN";
$code = charnames::vianame($name);
printf "%s is character U+%04X (%s)\n",
    $name, $code, chr($code); 

MUSIC SHARP SIGN is character U+266F (#)

Here's how to find the path to Perl's copy of the Unicode character database:

% perl -MConfig -le 'print "$Config{privlib}/unicore/NamesList.txt"'
/usr/local/lib/perl5/5.8.1/unicore/NamesList.txt

Read this file to learn the character names available to you.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.