41.7. Perl Boot Camp, Part 4: Pattern Matching
Perl is
excellent at finding patterns in text.
It does this with regular expressions, similar to the ones used by
grep and awk. Any scalar can be
matched against a regular expression with the matching
binding operator, =~. For example:
if( $user =~ /jjohn/ ){
    print "I know you";
}
Without the matching binding operator, regular expressions match
against the current value of $_. For example:
while (<>) {
    if (/quit/i) {
        print "Looks like you want out.\n";
        last;
    }
}
In this code, each line of input is examined for the character
sequence quit. The /i modifier
at the end of the regular expression makes the matching
case-insensitive (i.e., Quit matches as well as
qUIT).
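The same implicit match against $_ works outside an input loop. As a minimal sketch, the helper wants_out below (a hypothetical name, not from the original text) lets a for loop alias each list element to $_ and applies the same /quit/i test:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the strings that contain "quit" in any
# mix of upper and lower case.
sub wants_out {
    my @hits;
    for (@_) {                        # each element is aliased to $_ in turn
        push @hits, $_ if /quit/i;    # no =~, so the match is against $_
    }
    return @hits;
}

my @found = wants_out("Quit now", "keep going", "I qUIT", "QUITE right");
print "$_\n" for @found;
```

Note that "QUITE right" is reported too: the pattern is not anchored, so it matches the sequence quit anywhere in the string.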
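As with regular expressions in other utilities, Perl reports the
leftmost match for your pattern in a given string. Quantifiers such
as * and + are greedy: they consume as much text as they can while
still letting the rest of the pattern match. With alternation,
however, Perl takes the first alternative that succeeds, not the
longest one. Patterns are made up of characters (which normally match
themselves) and special metacharacters, including those found in
Table 41-8.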
Table 41-8. Common Perl regular expression metacharacters

Operator     Description
^            Pattern must match at the beginning of the string (or of each line, with the /m modifier).
$            Pattern must match at the end of the string, or just before a trailing newline.
.            Match any character (except the newline).
pat1|pat2    Alternation: match the pattern on either the left or the right.
(pattern)    Group this pattern together as one (good for quantifiers and capturing).
[symbols]    Define a new character class: any of the symbols given can match one character of input (e.g., /[aeiou]/ matches a string with at least one regular vowel).
\w           Match a word character: a letter, digit, or underscore.
\d           Match a digit.
\s           Match a whitespace character, such as space, tab, \n, or \r.
pattern*     Match 0 or more consecutive occurrences of pattern.
pattern+     Match 1 or more consecutive occurrences of pattern.
pattern?     Optionally match pattern (0 or 1 occurrence).
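To make the table above concrete, here is a small sketch that pairs several of the metacharacters with a string each one should match. The sample strings and the helper count_matches are made up for illustration, and the qr// operator is used only as a convenient way to store patterns in a list.

```perl
use strict;
use warnings;

# Each pair is [pattern, string]; every pattern should match its string.
my @cases = (
    [ qr/^eth/,       "eth0 Link encap:Ethernet" ],  # ^ anchors at the start
    [ qr/frame:0$/,   "overruns:0 frame:0"       ],  # $ anchors at the end
    [ qr/RX|TX/,      "TX packets:426050"        ],  # alternation
    [ qr/[aeiou]/,    "rhythm and blues"         ],  # character class
    [ qr/\w+:\d+/,    "MTU:1500"                 ],  # \w, \d, and +
    [ qr/colou?r/,    "color"                    ],  # ? makes the u optional
    [ qr/(ab)*c/,     "ababc"                    ],  # grouping with *
    [ qr/addr:\s*\d/, "inet addr: 192"           ],  # \s* allows optional whitespace
);

# Hypothetical helper: counts how many [pattern, string] pairs match.
sub count_matches {
    my $n = 0;
    for my $case (@_) {
        my ($re, $text) = @$case;
        $n++ if $text =~ $re;
    }
    return $n;
}

printf "%d of %d patterns matched\n", count_matches(@cases), scalar @cases;
```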
A very common task for which regular expressions are used is
extracting specific information from a line of text. Suppose you
wanted to get the first dotted quad that appears in the output of
this ifconfig command:
$ ifconfig eth0
eth0      Link encap:Ethernet HWaddr 00:A0:76:C0:1A:E1
          inet addr:192.168.1.50 Bcast:192.168.1.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:365079 errors:0 dropped:0 overruns:0 frame:0
          TX packets:426050 errors:0 dropped:0 overruns:0 carrier:0
          collisions:3844 txqueuelen:100
          Interrupt:9 Base address:0x300
The output of a
command can be captured into an array using the backtick operator.
In list context, each line of the command's output becomes one
element of the array. One way to extract the IP address from that
output is with the following code:
my @ifconfig = `/sbin/ifconfig eth0`;
for (@ifconfig) {
    if ( /(\d+\.\d+\.\d+\.\d+)/ ) {
        print "Quad: $1\n";
        last;
    }
}
This
regular expression looks for one or more digits
(\d+) followed by a literal dot (\., escaped so that it is not
treated as the regular expression metacharacter), followed by two
more digit/dot pairs, followed by one or more digits. If this pattern
is found in the current line, the part that was matched is captured
(thanks to the parentheses) into the special variable $1. You
can capture more parts of a match by adding more pairs of
parentheses. Each captured substring is stored in the next
higher-numbered variable (i.e., the second pair of parentheses
fills $2).
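As a sketch of a pattern with two capture groups, the code below pulls both the address and the netmask out of the inet line from the ifconfig output above. The helper name addr_and_mask is hypothetical, and the sample line is hard-coded so the example runs without calling ifconfig.

```perl
use strict;
use warnings;

# Sample line copied from the ifconfig output above.
my $line = "inet addr:192.168.1.50 Bcast:192.168.1.255 Mask:255.255.255.0";

# Hypothetical helper: returns (address, netmask), or an empty list
# when the text does not look like an inet line.
sub addr_and_mask {
    my ($text) = @_;
    if ( $text =~ /addr:(\d+\.\d+\.\d+\.\d+).*Mask:(\d+\.\d+\.\d+\.\d+)/ ) {
        return ($1, $2);    # first and second parenthesized groups
    }
    return;
}

my ($addr, $mask) = addr_and_mask($line);
print "Address: $addr\nNetmask: $mask\n";
```

Checking the result of the match before reading $1 and $2 matters: after a failed match, those variables still hold whatever the last successful match left in them.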
Sometimes, you need to find all the matches for your pattern in a
given string. This can be done with the
/g regular expression modifier. If you
wanted to find all the dotted quads in the
ifconfig output, you could use the following code:
my @ifconfig = `/sbin/ifconfig eth0`;
for (@ifconfig) {
    while( /(\d+\.\d+\.\d+\.\d+)/g ){
        print "Quad: $1\n";
    }
}
Here, the if block is replaced with a
while loop. This is important for
/g to work as expected. If the current line has
something that looks like a dotted quad, that value is captured in
$1, just as before. However, with the /g modifier, Perl remembers
where in the string the last match ended and looks after that point
for the next one. When no further match is found, the while loop
ends.
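The loop can be tried on a single string without running ifconfig. In the sketch below, the helper all_quads is a hypothetical name and the sample line is hard-coded. The last two statements show a related form that the text above does not cover: in list context, a /g match returns all of its captures at once.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every dotted quad found in one string.
sub all_quads {
    my ($text) = @_;
    my @quads;
    while ( $text =~ /(\d+\.\d+\.\d+\.\d+)/g ) {
        push @quads, $1;    # each pass resumes after the previous match
    }
    return @quads;
}

my $line = "inet addr:192.168.1.50 Bcast:192.168.1.255 Mask:255.255.255.0";
print "Quad: $_\n" for all_quads($line);

# In list context, a /g match returns all the captures at once.
my @same = $line =~ /(\d+\.\d+\.\d+\.\d+)/g;
print scalar(@same), " quads found\n";
```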
Perl's regular expression support has set the
standard for other languages. It is impossible to give a
comprehensive guide to Perl regular expressions here; for more, see
O'Reilly's Mastering
Regular Expressions or the perlre
manpage.
-- JJ
Copyright © 2003 O'Reilly & Associates. All rights reserved.