1.1.3. Discussion
Strings are a basic data type; they aren't arrays of a basic data
type. Instead of using array subscripting to access individual
characters as you sometimes do in other programming languages, in
Perl you use functions like unpack or
substr to access individual characters or a
portion of the string.
The offset argument to substr indicates the start
of the substring you're interested in, counting from the front if
positive and from the end if negative. If the offset is 0, the
substring starts at the beginning. The count argument is the length
of the substring.
$string = "This is what you have";
#         +012345678901234567890  Indexing forwards  (left to right)
#          109876543210987654321- Indexing backwards (right to left)
#           note that 0 means 10 or 20, etc. above
$first = substr($string, 0, 1); # "T"
$start = substr($string, 5, 2); # "is"
$rest = substr($string, 13); # "you have"
$last = substr($string, -1); # "e"
$end = substr($string, -4); # "have"
$piece = substr($string, -8, 3); # "you"
# you can test substrings with =~
if (substr($string, -10) =~ /pattern/) {
    print "Pattern matches in last 10 characters\n";
}
# substitute "at" for "is", restricted to first five characters
substr($string, 0, 5) =~ s/is/at/g;
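To make the effect of that restricted substitution concrete, here is a minimal sketch that runs it on the same sample string and prints the result. Only the first five characters ("This ") are visible to the substitution, so the standalone word "is" that follows is left untouched.

```perl
use strict;
use warnings;

my $string = "This is what you have";

# substr() as an lvalue: s/// operates only on the first five characters
# ("This "), and the change is written back into $string.
substr($string, 0, 5) =~ s/is/at/g;

print "$string\n";    # That is what you have

# A match can be restricted the same way, here to the last 10 characters.
my $tail_has_you = substr($string, -10) =~ /you/ ? 1 : 0;
print "tail matches: $tail_has_you\n";    # tail matches: 1
```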
You can even swap values by using several substrs
on each side of an assignment:
# exchange the first and last letters in a string
$a = "make a hat";
(substr($a,0,1), substr($a,-1)) =
    (substr($a,-1), substr($a,0,1));
print $a;
take a ham
Although unpack is not lvaluable, it is
considerably faster than substr when you extract
numerous values all at once. Specify a format describing the layout
of the record to unpack. For positioning, use lowercase
"x" with a count to skip forward some number of
bytes, an uppercase "X" with a count to skip
backward some number of bytes, and an "@" to skip
to an absolute byte offset within the record. (If the data contains
Unicode strings, be careful with those three: they're strictly
byte-oriented, and moving around by bytes within multibyte data is
perilous at best.)
# extract column with unpack
$a = "To be or not to be";
$b = unpack("x6 A6", $a); # skip 6, grab 6
print $b;
or not
($b, $c) = unpack("x6 A2 X5 A2", $a); # forward 6, grab 2; backward 5, grab 2
print "$b\n$c\n";
or
be
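The examples above use "x" and "X" for relative movement. As a small sketch of the third positioning code, "@" can reach the same two fields by absolute byte offset. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $record = "To be or not to be";

# "@6" moves to absolute byte offset 6 and "A2" grabs two bytes there;
# "@3" then moves to absolute offset 3, regardless of the current position.
my ($first, $second) = unpack('@6 A2 @3 A2', $record);

print "$first\n$second\n";    # prints "or" then "be"
```

Because "@" counts from the start of the record rather than from the current position, the template does not need to account for how many bytes the previous field consumed.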
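When a record is laid out in fixed columns, it can be easier to list the
column where each field begins (counting from 1) and let a helper such as
cut2fmt build the unpack template:
sub cut2fmt {
    my(@positions) = @_;
    my $template  = '';
    my $lastpos   = 1;
    # each field's width is the distance from the previous cut to this one
    foreach my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    # the final field takes whatever remains of the record
    $template .= "A*";
    return $template;
}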
$fmt = cut2fmt(8, 14, 20, 26, 30);
print "$fmt\n";
A7 A6 A6 A6 A4 A*
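The generated template is meant to be handed straight to unpack. The following sketch repeats cut2fmt so that it runs on its own, then applies it to a made-up three-field record. The sample data and the variable names are assumptions for illustration, not part of the recipe.

```perl
use strict;
use warnings;

# Build an unpack template from the 1-based columns where fields begin.
sub cut2fmt {
    my (@positions) = @_;
    my $template = '';
    my $lastpos  = 1;
    foreach my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos = $place;
    }
    $template .= "A*";
    return $template;
}

# Hypothetical record: fields begin at columns 1, 8, and 14.
my $line = "Smith  John  Boston";
my $fmt  = cut2fmt(8, 14);             # "A7 A6 A*"

# The "A" format strips trailing spaces from each extracted field.
my @fields = unpack($fmt, $line);

print "$fmt\n";
print join("|", @fields), "\n";        # Smith|John|Boston
```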
The powerful unpack function goes far beyond mere
text processing. It's the gateway between text and binary data.
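As a brief illustration of the binary side, the sketch below packs two integers into a fixed binary layout and reads them back. The record layout (a 16-bit ID followed by a 32-bit length, both big-endian) is a hypothetical one chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical binary record: "n" is an unsigned 16-bit big-endian integer,
# "N" is an unsigned 32-bit big-endian integer.
my $packed = pack('n N', 513, 65536);

# The same template reverses the operation.
my ($id, $length) = unpack('n N', $packed);

printf "%d bytes: id=%d length=%d\n", length($packed), $id, $length;
# 6 bytes: id=513 length=65536
```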
In this recipe, we've assumed that all character data is 7- or 8-bit
data so that pack's byte operations work as
expected.