1.4. Converting Between Characters and Values

1.4.1. Problem

You want to print the number represented by a given character, or you want to print a character given a number.

1.4.2. Solution

Use ord to convert a character to a number, or use chr to convert a number to its corresponding character:

    $num  = ord($char);
    $char = chr($num);

The %c format used in printf and sprintf also converts a number to a character:

    $char = sprintf("%c", $num);                # slower than chr($num)
    printf("Number %d is character %c\n", $num, $num);
    # prints "Number 101 is character e"

A C* template used with pack and unpack can quickly convert many 8-bit bytes; similarly, use U* for Unicode characters.

    @bytes  = unpack("C*", $string);
    $string = pack("C*", @bytes);

    $unistr   = pack("U4", 0x24b6, 0x24b7, 0x24b8, 0x24b9);
    @unichars = unpack("U*", $unistr);

1.4.3. Discussion

Unlike low-level, typeless languages such as assembler, Perl doesn't treat characters and numbers interchangeably; it treats strings and numbers interchangeably. That means you can't just assign characters and numbers back and forth. Perl provides Pascal's chr and ord to convert between a character and its corresponding ordinal value:

    $value     = ord("e");    # now 101
    $character = chr(101);    # now "e"

If you already have a character, it's really represented as a string of length one, so just print it out directly using print or the %s format in printf and sprintf. The %c format forces printf or sprintf to convert a number into a character; it's not used for printing a character that's already in character format (that is, a string).

    printf("Number %d is character %c\n", 101, 101);

The pack, unpack, chr, and ord functions are all faster than sprintf.
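
To make the distinction between %c and %s concrete, here is a minimal sketch of a round trip through ord and chr. It assumes a stock perl 5 interpreter; the variable names ($back, $via_c, $via_s) are illustrative and do not come from the recipe itself.

```perl
use strict;
use warnings;

# Round trip: character -> number -> character.
my $char = "e";
my $num  = ord($char);              # numeric value of "e"
my $back = chr($num);               # the character for that value again

# %c converts a number into a character;
# %s prints something that is already a string.
my $via_c = sprintf("%c", $num);
my $via_s = sprintf("%s", $char);

print "ord gives $num, chr gives $back\n";
print "via %c: $via_c, via %s: $via_s\n";
```

Both format routes end up with the same one-character string. The difference is only in what you start from: a number for %c, a string for %s.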
Here are pack and unpack in action:

    @ascii_character_numbers = unpack("C*", "sample");
    print "@ascii_character_numbers\n";
    # prints "115 97 109 112 108 101"

    $word = pack("C*", @ascii_character_numbers);
    $word = pack("C*", 115, 97, 109, 112, 108, 101);    # same
    print "$word\n";
    # prints "sample"

Here's how to convert from HAL to IBM:

    $hal  = "HAL";
    @byte = unpack("C*", $hal);
    foreach $val (@byte) {
        $val++;                 # add one to each byte value
    }
    $ibm = pack("C*", @byte);
    print "$ibm\n";             # prints "IBM"

On single-byte character data, such as plain old ASCII or any of the various ISO 8859 charsets, the ord function returns numbers from 0 to 255. These correspond to C's unsigned char data type.

However, Perl understands more than that: it also has integrated support for Unicode, the universal character encoding. If you pass chr, sprintf "%c", or pack "U*" numeric values greater than 255, the result will be a Unicode string.

Here are similar operations with Unicode:

    @unicode_points = unpack("U*", "fac\x{0327}ade");
    print "@unicode_points\n";
    # prints "102 97 99 807 97 100 101"

    $word = pack("U*", @unicode_points);
    print "$word\n";
    # prints "façade"

If all you're doing is printing out the characters' values, you probably don't even need to use unpack. Perl's printf and sprintf functions understand a v modifier that works like this:

    printf "%vd\n", "fac\x{0327}ade";
    # prints "102.97.99.807.97.100.101"

    printf "%vx\n", "fac\x{0327}ade";
    # prints "66.61.63.327.61.64.65"

The numeric value of each character in the string (that is, its "code point" in Unicode parlance) is emitted, with a dot separating one value from the next.

1.4.4. See Also

The chr, ord, printf, sprintf, pack, and unpack functions in perlfunc(1) and Chapter 29 of Programming Perl

Copyright © 2003 O'Reilly & Associates. All rights reserved.