home | O'Reilly's CD bookshelfs | FreeBSD | Linux | Cisco | Cisco Exam  

Perl CookbookPerl CookbookSearch this book

1.4. Converting Between Characters and Values

1.4.3. Discussion

Unlike low-level, typeless languages such as assembler, Perl doesn't treat characters and numbers interchangeably; it treats strings and numbers interchangeably. That means you can't just assign characters and numbers back and forth. Perl provides Pascal's chr and ord to convert between a character and its corresponding ordinal value:

$value     = ord("e");    # now 101
$character = chr(101);    # now "e"

If you already have a character, it's really represented as a string of length one, so just print it out directly using print or the %s format in printf and sprintf. The %c format forces printf or sprintf to convert a number into a character; it's not used for printing a character that's already in character format (that is, a string).

printf("Number %d is character %c\n", 101, 101);

The pack, unpack, chr, and ord functions are all faster than sprintf. Here are pack and unpack in action:

@ascii_character_numbers = unpack("C*", "sample");
print "@ascii_character_numbers\n";
115 97 109 112 108 101

$word = pack("C*", @ascii_character_numbers);
$word = pack("C*", 115, 97, 109, 112, 108, 101);   # same
print "$word\n";

Here's how to convert from HAL to IBM:

$hal = "HAL";
@byte = unpack("C*", $hal);
foreach $val (@byte) {
    $val++;                 # add one to each byte value
$ibm = pack("C*", @byte);
print "$ibm\n";             # prints "IBM"

On single-byte character data, such as plain old ASCII or any of the various ISO 8859 charsets, the ord function returns numbers from 0 to 255. These correspond to C's unsigned char data type.

However, Perl understands more than that: it also has integrated support for Unicode, the universal character encoding. If you pass chr, sprintf "%c", or pack "U*" numeric values greater than 255, the return result will be a Unicode string.

Here are similar operations with Unicode:

@unicode_points = unpack("U*", "fac\x{0327}ade");
print "@unicode_points\n";
102 97 99 807 97 100 101

$word = pack("U*", @unicode_points);
print "$word\n";

If all you're doing is printing out the characters' values, you probably don't even need to use unpack. Perl's printf and sprintf functions understand a v modifier that works like this:

printf "%vd\n", "fac\x{0327}ade";

printf "%vx\n", "fac\x{0327}ade";

The numeric value of each character (that is, its "code point" in Unicode parlance) in the string is emitted with a dot separator.

Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.