41.7. Perl Boot Camp, Part 4: Pattern Matching

Perl is excellent at finding patterns in text. It does this with regular expressions, similar to the ones used by grep and awk. Any scalar can be matched against a regular expression with the matching binding operator, =~. For example:
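A minimal sketch of the binding operator follows. The variable $greeting and its contents are made up for illustration; the negated form, !~, is shown alongside for comparison:

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen only for illustration.
my $greeting = "Hello, world";

# =~ applies the pattern on the right to the scalar on the left.
my $found = $greeting =~ /world/ ? 1 : 0;
print "Found 'world' in the greeting\n" if $found;

# !~ is the negated form: true when the pattern does not match.
my $missing = $greeting !~ /goodbye/ ? 1 : 0;
print "No 'goodbye' in the greeting\n" if $missing;
```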
Without the matching binding operator, regular expressions match against the current value of $_. For example:
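A sketch of what such a loop might look like follows. To stay self-contained it reads from an in-memory filehandle holding made-up input; with real input, while (<STDIN>) or while (<>) would take the place of while (<$in>):

```perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for STDIN.
my $input = "hello\nplease QUIT now\nnever reached\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";

my $lines_read = 0;
while (<$in>) {            # each line is read into $_
    $lines_read++;
    if (/quit/i) {         # no =~ here, so the match is against $_
        print "Quit requested on line $lines_read\n";
        last;
    }
}
close $in;
```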
In this code, each line of input is examined for the character sequence quit. The /i modifier at the end of the regular expression makes the matching case-insensitive (i.e., Quit matches as well as qUIT).

As with regular expressions in other utilities, Perl finds the leftmost match for your pattern in a given string. Its quantifiers are greedy by default: each takes as much text as it can while still allowing the rest of the pattern to match. Patterns are made up of characters (which normally match themselves) and special metacharacters, including those found in Table 41-8.

Table 41-8. Common Perl regular expression metacharacters
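Some widely used metacharacters can be tried out with the sketch below. The selection and the sample strings are chosen here for illustration and may not correspond to the rows of the table:

```perl
use strict;
use warnings;

# Each entry: [ description, pattern, made-up sample string ].
my @examples = (
    [ '. matches any character except newline', qr/c.t/,          'cat'          ],
    [ '* means zero or more of the previous',   qr/ab*c/,         'ac'           ],
    [ '+ means one or more of the previous',    qr/ab+c/,         'abbc'         ],
    [ '? means zero or one of the previous',    qr/colou?r/,      'color'        ],
    [ '^ and $ anchor to start and end',        qr/^start.*end$/, 'start to end' ],
    [ '[...] matches one character from a set', qr/gr[ae]y/,      'grey'         ],
    [ '\d matches a digit',                     qr/\d\d/,         'room 42'      ],
    [ '\s matches whitespace',                  qr/a\sb/,         'a b'          ],
    [ '\w matches a word character',            qr/^\w+$/,        'perl_5'       ],
    [ '| separates alternatives',               qr/cat|dog/,      'hot dog'      ],
);

my $matched = 0;
for my $example (@examples) {
    my ($description, $pattern, $sample) = @$example;
    if ($sample =~ $pattern) {
        $matched++;
        print "$description: '$sample' matches\n";
    }
}
```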
A very common use of regular expressions is extracting specific information from a line of text. Suppose you wanted to get the first dotted quad that appears in the output of this ifconfig command:
The output of a command can be captured into an array using the backtick operator. Each line of the command's output will be an element of the array. One way to extract the IP address from that line is with the following code:
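A sketch of such code follows. Because the path to ifconfig, the interface name, and the exact output format vary between systems, the backtick call is shown only in a comment, and the loop runs over hypothetical output lines instead:

```perl
use strict;
use warnings;

# On a real system the lines would come from backticks, for example:
#     my @ifconfig = `/sbin/ifconfig eth0`;
# The path and interface name are assumptions, so this sketch uses
# hypothetical output instead.
my @ifconfig = (
    "eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55\n",
    "          inet addr:192.168.1.50  Bcast:192.168.1.255  Mask:255.255.255.0\n",
    "          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1\n",
);

my $first_quad;
for (@ifconfig) {                      # each line is aliased to $_
    if (/(\d+\.\d+\.\d+\.\d+)/) {      # parentheses capture into $1
        $first_quad = $1;
        print "First dotted quad: $first_quad\n";
        last;                          # stop after the first match
    }
}
```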
This regular expression looks for one or more digits (\d+) followed by a literal dot (\. rather than the bare . metacharacter, which matches any character), followed by two more digit/dot pairs, followed by one or more digits. If this pattern is found in the current line, the part that was matched is captured (thanks to the parentheses) into the special variable $1. You can capture more parts of a match by adding more pairs of parentheses. Each captured substring is stored in the next higher-numbered variable (i.e., the text captured by the second pair of parentheses appears in $2).

Sometimes, you need to find all the matches for your pattern in a given string. This can be done with the /g regular expression modifier. If you wanted to find all the dotted quads in the ifconfig output, you could use the following code:
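A sketch of the /g version follows, again run over hypothetical ifconfig output rather than a real backtick call:

```perl
use strict;
use warnings;

# Hypothetical output lines; on a real system these would come from
# something like `/sbin/ifconfig eth0` (path and interface assumed).
my @ifconfig = (
    "eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55\n",
    "          inet addr:192.168.1.50  Bcast:192.168.1.255  Mask:255.255.255.0\n",
);

my @quads;
for (@ifconfig) {
    # In scalar context, each evaluation of a /g match finds the
    # next occurrence in $_ and sets $1 to the captured text.
    while (/(\d+\.\d+\.\d+\.\d+)/g) {
        push @quads, $1;
        print "Quad: $1\n";
    }
}
```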
Here, the if block is replaced with a while loop. This is important for /g to work as expected. If the current line has something that looks like a dotted quad, that value is captured in $1, just as before. However, with the /g modifier, Perl remembers where in the string the last match ended and looks after that point for another one. The loop ends when no further match is found.

Perl's regular expression support has set the standard for other languages. The subject is too large to cover comprehensively here; for more, see O'Reilly's Mastering Regular Expressions or the perlre manpage.
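The position tracking behind /g can be observed with the built-in pos function, as in the sketch below. The sample string is made up, and the second half shows the list-context form of /g, which returns every match at once instead of one per loop iteration:

```perl
use strict;
use warnings;

my $line = "addr:10.0.0.1 mask:255.0.0.0";   # hypothetical sample

# Scalar context: one match per evaluation; pos() reports the offset
# just past the end of the most recent match.
my @end_offsets;
while ($line =~ /\d+\.\d+\.\d+\.\d+/g) {
    push @end_offsets, pos($line);
}

# List context: /g returns all captured matches in a single list.
my @all = $line =~ /(\d+\.\d+\.\d+\.\d+)/g;
print "Matches: @all\n";
```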
Copyright © 2003 O'Reilly & Associates. All rights reserved.