6.5. Finding the Nth Occurrence of a Match
Problem
You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish" in a string such as 'One fish two fish red fish blue fish'.
Solution
Use the /g modifier in a while loop, keeping count of matches:
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}
Or use a repetition count and repeated pattern like this:
/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
Discussion
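As a quick check that the two forms from the Solution agree, here is a minimal sketch run against a sample string. The `$pond` string and the names `$by_loop`, `$by_pattern`, and `$skip` are chosen for illustration only.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish';
my $WANT = 3;

# Approach 1: walk the matches with /g and keep a counter.
my ($count, $by_loop) = (0);
while ($pond =~ /(\w+)\s+fish\b/gi) {
    $by_loop = $1 if ++$count == $WANT;
}

# Approach 2: skip WANT-1 matches with a repetition count,
# then capture the word before the next "fish".
my $skip = $WANT - 1;
my ($by_pattern) = $pond =~ /(?:\w+\s+fish\s+){$skip}(\w+)\s+fish/i;

print "loop: $by_loop, pattern: $by_pattern\n";   # loop: red, pattern: red
```

Note that the repetition-count form only works when the matches sit back to back, as they do here. The counter loop has no such restriction.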
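As explained in the chapter introduction, using the /g modifier in scalar context makes each match pick up where the previous one left off, which is what makes it useful in while loops. A common application is counting how many times a pattern matches in a string:
# simple way with while loop
$count = 0;
while ($string =~ /PAT/g) {
    $count++;    # or whatever you'd like to do here
}
# same thing with trailing while
$count = 0;
$count++ while $string =~ /PAT/g;
# or with for loop
for ($count = 0; $string =~ /PAT/g; $count++) { }
# Similar, but this time count overlapping matches
$count = 0;
$count++ while $string =~ /(?=PAT)/g;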
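The difference between the plain and overlapping counts is easiest to see on a tiny input. The following sketch uses a made-up string, 'aaaa', and the pattern /aa/; both are illustrative assumptions rather than part of the recipe.

```perl
use strict;
use warnings;

my $string = 'aaaa';

# Non-overlapping: each match consumes two characters,
# so the matches are at offsets 0 and 2.
my $plain = 0;
$plain++ while $string =~ /aa/g;

# Overlapping: the lookahead consumes nothing, so the engine
# moves forward one character at a time (offsets 0, 1, and 2).
my $overlap = 0;
$overlap++ while $string =~ /(?=aa)/g;

print "plain=$plain overlap=$overlap\n";   # plain=2 overlap=3
```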
To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, (++$count % 3) == 0 would be true on every third match.
If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.
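The modulus idea can be sketched as follows. The longer `$pond` string and the `@every_nth` array are assumptions made up for this illustration, so that more than one multiple of N turns up.

```perl
use strict;
use warnings;

my $pond  = 'One fish two fish red fish blue fish old fish new fish';
my $N     = 3;
my $count = 0;
my @every_nth;

while ($pond =~ /(\w+)\s+fish\b/gi) {
    # Keep a match only when the running count is a multiple of N.
    push @every_nth, $1 if ++$count % $N == 0;
}

print "Every third fish: @every_nth\n";   # Every third fish: red new
```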
$pond = 'One fish two fish red fish blue fish';
# using a temporary
@colors = ($pond =~ /(\w+)\s+fish\b/gi); # get all matches
$color = $colors[2]; # then the one we want
# or without a temporary array
$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2]; # just grab element 3
print "The third fish in the pond is $color.\n";
Or finding all even-numbered fish:
$count = 0;
$_ = 'One fish two fish red fish blue fish';
@evens = grep { $count++ % 2 == 1 } /(\w+)\s+fish\b/gi;
print "Even numbered fish are @evens.\n";
For substitution, the replacement value should be a code expression that returns the proper string. Make sure to return the original as a replacement string for the cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:
$count = 0;
s{
    \b                  # makes next \w more efficient
    ( \w+ )             # this is what we'll be changing
    (
        \s+ fish \b
    )
}{
    if (++$count == 4) {
        "sushi" . $2;
    } else {
        $1 . $2;
    }
}gex;
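To see the effect on a concrete string, here is a sketch that wraps the same substitution in a small helper. The sub name `replace_nth_fish` and its parameters are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the word before the Nth "fish".
sub replace_nth_fish {
    my ($text, $n, $new) = @_;
    my $count = 0;
    $text =~ s{
        \b ( \w+ )          # the word we may replace
        ( \s+ fish \b )     # the fish that follows it
    }{
        # Return the new word for match N, the original otherwise.
        ++$count == $n ? $new . $2 : $1 . $2
    }gex;
    return $text;
}

print replace_nth_fish('One fish two fish red fish blue fish', 4, 'sushi'), "\n";
# One fish two fish red fish sushi fish
```

The ternary plays the same role as the if/else in the recipe: every match gets a replacement, and all but the Nth are replaced with themselves.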
Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After /.*\b(\w+)\s+fish\b/s, for example, the $1 variable holds the last fish.
Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:
$pond = 'One fish two fish red fish blue fish swim here.';
$color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
print "Last fish is $color.\n";
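For comparison, the greedy-skip approach mentioned earlier might look like this on the same string. It is a minimal sketch, and the variable name `$last` is an assumption made for the example.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish swim here.';

# The leading .* grabs as much as it can, then backtracks only
# far enough for the rest of the pattern to match, so the capture
# ends up holding the word before the final "fish".
my ($last) = $pond =~ /.*\b(\w+)\s+fish\b/s;

print "Last fish is $last.\n";   # Last fish is blue.
```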
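If you need to express this same notion of finding the last match in a single pattern without /g, you can do so with a negative lookahead assertion, (?!THING). To find the last match of an arbitrary pattern A, look for an A that is not followed by another A anywhere before the end of the string. The construct can be broken up for legibility:
m{
    A           # find some pattern A
    (?!         # mustn't be able to find
        .*      # something
        A       # and A
    )           # anywhere through the end of the string
}sx
That leaves us with this approach for selecting the last fish: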
$pond = 'One fish two fish red fish blue fish swim here.';
if ($pond =~ m{
                \b ( \w+ ) \s+ fish \b
                (?! .* \b fish \b )
            }six )
{
    print "Last fish is $1.\n";
} else {
    print "Failed!\n";
}
This approach has the advantage that it can fit in just one pattern, which makes it suitable for similar situations as shown in Recipe 6.17. It has its disadvantages, though. It's obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly, taking roughly twice as long on the data set tested.
See Also
The behavior of m//g in scalar context is described in the "Regexp Quote-Like Operators" section of perlop(1).
Copyright © 2001 O'Reilly & Associates. All rights reserved.