1.5. Processing a String One Character at a Time
Problem
You want to process a string one character at a time.
Solution
Use split with a null pattern to break up the string into individual characters, or use unpack if you just want their ASCII values:
@array = split(//, $string);
@array = unpack("C*", $string);
Or extract each character in turn with a loop:
while (/(.)/g) { # . never matches a newline here, so newlines are skipped
    # do something with $1
}
Discussion
As we said before, Perl's fundamental unit is the string, not the character. Needing to process anything a character at a time is rare. Usually some kind of higher-level Perl operation, like pattern matching, solves the problem more easily. See, for example, Recipe 7.7, where a set of substitutions is used to find command-line arguments.
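The following minimal, self-contained sketch runs the three Solution techniques side by side. The sample string and variable names are our own. The string includes a newline so that the one difference in behavior is visible: split and unpack return every character, but the /(.)/g loop skips the newline because . does not match it without /s.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input (our own); the embedded newline shows where the approaches differ.
my $string = "ab\nc";

# split with an empty pattern: one element per character, newline included.
my @chars = split //, $string;

# unpack "C*": the numeric value of each character.
my @codes = unpack("C*", $string);

# A /g match in a loop: $1 holds one character per iteration.
# Without /s, "." does not match "\n", so the newline is skipped.
my @matched;
while ($string =~ /(.)/g) {
    push @matched, $1;
}

print scalar(@chars), " elements from split\n";        # 4
print "@codes\n";                                       # 97 98 10 99
print scalar(@matched), " characters from the loop\n"; # 3
```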
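Splitting on a pattern that matches the empty string returns a list of the individual characters in the string. This is a convenient feature when done intentionally, but it's easy to do unintentionally. For instance, /X*/ matches the empty string, so splitting a string that contains no X on /X*/ also breaks it into individual characters.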
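Here is a short sketch of the accidental case, using /X*/ as the example pattern; the input strings are our own. Where no X is present, the pattern matches the empty string between every pair of characters, so split breaks the string apart character by character. Where a run of X characters does occur, split consumes the whole run as a separator.

```perl
use strict;
use warnings;

# /X*/ can match zero characters, so with no "X" present split
# separates the string at every position between characters.
my @pieces = split /X*/, "hello";
print join(",", @pieces), "\n";    # h,e,l,l,o

# A run of X characters is consumed as one separator; elsewhere the
# empty match still splits between characters.
my @mixed = split /X*/, "abXXcd";
print join(",", @mixed), "\n";     # a,b,c,d
```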
Here's an example that prints the characters used in the string "an apple a day", sorted in ascending ASCII order:
%seen = ();
$string = "an apple a day";
foreach $byte (split //, $string) {
    $seen{$byte}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
# prints "unique chars are:  adelnpy" (the space character sorts first)
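In this loop, %seen increments a counter for each occurrence rather than only marking a character as present, so the same loop also gives character frequencies. The sketch below rewrites the example with strict and lexical variables, which is our choice and not the recipe's, and prints each count.

```perl
use strict;
use warnings;

my %seen;
my $string = "an apple a day";
$seen{$_}++ for split //, $string;

# The values in %seen are occurrence counts, not just presence flags.
for my $char (sort keys %seen) {
    printf "'%s' => %d\n", $char, $seen{$char};
}
```

For "an apple a day" this should report 3 spaces, 4 a's, 2 p's, and 1 each of d, e, l, n and y.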
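The split and unpack solutions give you an array of characters to work with. If you don't want an array, you can use a pattern match with the /g flag in a while loop, extracting one character at a time: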
%seen = ();
$string = "an apple a day";
while ($string =~ /(.)/g) {
    $seen{$1}++;
}
print "unique chars are: ", sort(keys %seen), "\n";
# prints "unique chars are:  adelnpy"
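The loop above drops newline characters, because . does not match "\n" unless the /s modifier is used. Here is a small sketch that compares the two forms on a sample string of our own:

```perl
use strict;
use warnings;

my $text = "a\nb";

# Without /s, "." skips the newline, so only two characters are seen.
my $without_s = 0;
$without_s++ while $text =~ /(.)/g;

# With /s, "." matches any character, including "\n".
my $with_s = 0;
$with_s++ while $text =~ /(.)/gs;

print "$without_s $with_s\n";    # 2 3
```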
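In general, if you find yourself doing character-by-character processing, there's probably a better way to go about it. Instead of looping over the characters that split or unpack return, it is often easier to use a pattern, or to let a built-in do the work. unpack, for instance, can compute a checksum far more efficiently than a hand-written loop.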
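To illustrate letting a higher-level operation do the work, here is a sketch that counts one character in three ways: with a character loop, with a pattern, and with tr///. The tr/// approach is our own choice of example and is not used in this recipe. All three counts should agree.

```perl
use strict;
use warnings;

my $string = "an apple a day";

# Character by character: look at every character in turn.
my $loop_count = 0;
for my $char (split //, $string) {
    $loop_count++ if $char eq 'a';
}

# Pattern: a list assignment in scalar context yields the number of matches.
my $re_count = () = $string =~ /a/g;

# tr/// with an empty replacement list counts matches and leaves the string unchanged.
my $tr_count = ($string =~ tr/a//);

print "$loop_count $re_count $tr_count\n";    # 4 4 4
```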
The following example calculates the checksum of $string with a foreach loop:
$sum = 0;
foreach $ascval (unpack("C*", $string)) {
    $sum += $ascval;
}
print "sum is $sum\n";
# prints "sum is 1248" if $string was "an apple a day"
This does the same thing, but much faster:
$sum = unpack("%32C*", $string);
This lets us emulate the SysV checksum program:
#!/usr/bin/perl
# sum - compute 16-bit checksum of all input files
$checksum = 0;
# sum each line with a 32-bit checksum so that long lines are not
# truncated to 16 bits before the final reduction
while (<>) { $checksum += unpack("%32C*", $_) }
$checksum %= (2 ** 16) - 1;
print "$checksum\n";
Here's an example of its use:
% perl sum /etc/termcap
If you have the GNU version of sum, you'll need to call it with the --sysv option to get the same answer on the same file.
% sum --sysv /etc/termcap
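The sketch below checks that the foreach loop and unpack("%32C*", ...) give the same sum for the sample string. It also adds a helper, fold16, which is a hypothetical name. fold16 reduces the sum to 16 bits with an end-around carry.

That folding is our assumption about how GNU sum --sysv reduces the total. It agrees with the remainder modulo 65535 except when the sum is a nonzero multiple of 65535, where folding yields 65535 instead of 0. GNU sum also prints a block count, which this sketch leaves out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "an apple a day";

# Byte-by-byte sum, as in the foreach loop.
my $loop_sum = 0;
$loop_sum += $_ for unpack("C*", $string);

# The same sum computed by unpack itself with a 32-bit checksum prefix.
my $unpack_sum = unpack("%32C*", $string);

# Hypothetical helper: fold a 32-bit byte sum down to 16 bits with an
# end-around carry. Assumption: this mirrors how GNU "sum --sysv" reduces it.
sub fold16 {
    my ($s) = @_;
    my $r = ($s & 0xffff) + ($s >> 16);
    return ($r & 0xffff) + ($r >> 16);
}

print "$loop_sum $unpack_sum ", fold16($unpack_sum), "\n";    # 1248 1248 1248

# Optional: checksum each file named on the command line. Each file is read
# whole and in raw mode, so there is no per-line truncation or newline translation.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "can't open $file: $!";
    my $data = do { local $/; <$fh> } // '';
    close $fh;
    printf "%d %s\n", fold16(unpack("%32C*", $data)), $file;
}
```

To checksum whole files, pass them as arguments, as in perl sumsketch FILE ... (the script name is hypothetical). With no arguments, the sketch only prints the result for the sample string.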
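Another tiny program that processes its input one character at a time is slowcat, shown in Example 1.1. The idea here is to pause after each character is printed so you can scroll text before an audience slowly enough that they can read it.
Example 1.1: slowcat
#!/usr/bin/perl
# slowcat - emulate a s l o w line printer
# usage: slowcat [-DELAY] [files ...]
# an optional leading -DELAY argument (e.g. -2 or -0.5) scales the pause; default is 1
$DELAY = ($ARGV[0] =~ /^-([.\d]+)/) ? (shift, $1) : 1;
$| = 1;                       # unbuffer STDOUT so each character appears at once
while (<>) {
    for (split(//)) {         # one character of the current line at a time
        print;
        # four-argument select with no filehandles just sleeps, here for 0.005 * $DELAY seconds
        select(undef,undef,undef, 0.005 * $DELAY);
    }
}
See Also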
Copyright © 2002 O'Reilly & Associates. All rights reserved.