
## 6.5. Finding the Nth Occurrence of a Match

### Problem

You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of `"fish"`:

```
One fish two fish red fish blue fish
```

### Solution

Use the `/g` modifier in a `while` loop, keeping count of matches:

```
$_ = 'One fish two fish red fish blue fish';   # the pattern below matches against $_
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}
```

```
The third fish is a red one.
```

Or use a repetition count and repeated pattern like this:

`/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;`
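The repetition count need not be hard-coded. The following is a minimal sketch, assuming a hypothetical helper named `word_before_nth_fish` (not part of the recipe), that interpolates N - 1 into the quantifier. Like the pattern above, it only works when the occurrences follow one another back to back.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the recipe): return the word before the
# Nth "fish" in $text, or undef if there are fewer than N in a row.
sub word_before_nth_fish {
    my ($text, $n) = @_;
    my $skip = $n - 1;                      # occurrences to step over first
    my ($word) = $text =~ /(?:\w+\s+fish\s+){$skip}(\w+)\s+fish/i;
    return $word;
}

my $pond = 'One fish two fish red fish blue fish';
print word_before_nth_fish($pond, 3), "\n";   # prints "red"
```

Because the match is done in list context, the captured word is returned directly and `$1` never has to be consulted.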

### Discussion

As explained in the chapter introduction, using the `/g` modifier in scalar context creates something of a progressive match, useful in `while` loops. This is commonly used to count the number of times a pattern matches in a string:

```
# simple way with while loop
$count = 0;
while ($string =~ /PAT/g) {
    $count++;               # or whatever you'd like to do here
}

# same thing with trailing while
$count = 0;
$count++ while $string =~ /PAT/g;

# or with for loop
for ($count = 0; $string =~ /PAT/g; $count++) { }

# Similar, but this time count overlapping matches
$count++ while $string =~ /(?=PAT)/g;
```

To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, `(++$count % 3) == 0` would be every third match.
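As a rough sketch of the modulus idea, the hypothetical helper `every_nth_fish` below (the name and the longer sample string are illustrative assumptions) collects the word before every Nth "fish":

```perl
use strict;
use warnings;

# Hypothetical helper: collect the word before every $n-th "fish".
sub every_nth_fish {
    my ($text, $n) = @_;
    my $count = 0;
    my @picked;
    while ($text =~ /(\w+)\s+fish\b/gi) {
        # keep this match only when the running count is a multiple of $n
        push @picked, $1 if ++$count % $n == 0;
    }
    return @picked;
}

my @thirds = every_nth_fish('One fish two fish red fish blue fish old fish new fish', 3);
print "Every third fish: @thirds\n";   # prints "Every third fish: red new"
```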

If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.

```
$pond  = 'One fish two fish red fish blue fish';

# using a temporary
@colors = ($pond =~ /(\w+)\s+fish\b/gi);      # get all matches
$color  = $colors[2];                         # then the one we want

# or without a temporary array
$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab the third element (index 2)

print "The third fish in the pond is $color.\n";
```

```
The third fish in the pond is red.
```

Or finding all even-numbered fish:

```
$count = 0;
$_ = 'One fish two fish red fish blue fish';
@evens = grep { $count++ % 2 == 1 } /(\w+)\s+fish\b/gi;
print "Even numbered fish are @evens.\n";
```

```
Even numbered fish are two blue.
```

For substitution, the replacement value should be a code expression that returns the proper string. Make sure to return the original as a replacement string for the cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:

```
$count = 0;
s{
   \b               # makes next \w more efficient
   ( \w+ )          # this is what we'll be changing
   (
     \s+ fish \b
   )
}{
    if (++$count == 4) {
        "sushi" . $2;
    } else {
        $1      . $2;
    }
}gex;
```

```
One fish two fish red fish sushi fish
```
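The same substitution can be wrapped so that N and the replacement word are parameters. This is only a sketch; `replace_nth_fish` is a hypothetical name, and it works on a copy of the string rather than on `$_`:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the word before the $n-th "fish" with $new,
# putting every other match back unchanged.
sub replace_nth_fish {
    my ($text, $n, $new) = @_;
    my $count = 0;
    $text =~ s{\b(\w+)(\s+fish\b)}{ ++$count == $n ? "$new$2" : "$1$2" }gei;
    return $text;
}

print replace_nth_fish('One fish two fish red fish blue fish', 4, 'sushi'), "\n";
# prints "One fish two fish red fish sushi fish"
```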

Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After `/.*\b(\w+)\s+fish\b/`, for example, the `$1` variable would have the last fish.
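A small sketch of the greedy skip, using a hypothetical `last_fish` helper. Note that without `/s` the leading `.*` stops at a newline, so this version assumes a single-line string:

```perl
use strict;
use warnings;

# Hypothetical helper: the leading .* is greedy, so it swallows as much as
# it can and the capture lands on the last "word fish" pair.
sub last_fish {
    my ($text) = @_;
    my ($word) = $text =~ /.*\b(\w+)\s+fish\b/;
    return $word;
}

print last_fish('One fish two fish red fish blue fish swim here.'), "\n";   # prints "blue"
```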

Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:

```
$pond = 'One fish two fish red fish blue fish swim here.';
$color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
print "Last fish is $color.\n";
```

```
Last fish is blue.
```

If you need to express this same notion of finding the last match in a single pattern without `/g`, you can do so with the negative lookahead assertion `(?!THING)`. When you want the last match of arbitrary pattern A, you find A and require that no further A can be found anywhere between there and the end of the string. The general construct is `A(?!.*A)`, which can be broken up for legibility. Add `/s` if the string may contain newlines, so that `.*` can look past them:

```
m{
    A               # find some pattern A
    (?!             # mustn't be able to find
        .*          # something
        A           # and A
    )               # anywhere later in the string
}x
```

That leaves us with this approach for selecting the last fish:

```
$pond = 'One fish two fish red fish blue fish swim here.';
if ($pond =~ m{
                \b  (  \w+) \s+ fish \b
                (?! .* \b fish \b )
            }six )
{
    print "Last fish is $1.\n";
} else {
    print "Failed!\n";
}
```

```
Last fish is blue.
```

This approach has the advantage that it can fit in just one pattern, which makes it suitable for situations like those shown in Recipe 6.17. It has its disadvantages, though. It's obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly: around twice as slowly on the data set tested above.
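Timings depend on the data and the Perl version, so it is worth measuring on your own input. The following sketch uses the core `Benchmark` module; the sub names and the iteration count are arbitrary choices for illustration, not figures from the recipe:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $pond = 'One fish two fish red fish blue fish swim here.';

# Two ways of finding the last fish, wrapped in subs so they can be timed.
sub last_by_list {
    return ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
}
sub last_by_lookahead {
    return $pond =~ m{ \b (\w+) \s+ fish \b (?! .* \b fish \b ) }six ? $1 : undef;
}

# Run each sub a fixed number of times and print a comparison table.
cmpthese(200_000, {
    list_slice => \&last_by_list,
    lookahead  => \&last_by_lookahead,
});
```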

### See Also

The behavior of `m//g` in scalar context is given in the "Regexp Quote-like Operators" section of perlop(1), and in the "Pattern Matching Operators" section of Chapter 2 of Programming Perl; zero-width positive lookahead assertions are shown in the "Regular Expressions" section of perlre(1), and in the "rules of regular expression matching" section of Chapter 2 of Programming Perl