If we were looking for all lines of a file that contain the string abc, we might use the grep command:
grep abc somefile >results
In this case, abc is the regular expression that the grep command tests against each input line. Lines that match are sent to standard output, here ending up in the file results because of the command-line redirection.
In Perl, we can speak of the string abc as a regular expression by enclosing the string in slashes:
if (/abc/) {
    print $_;
}
But what is being tested against the regular expression abc in this case? Why, it's our old friend, the $_ variable! When a regular expression is enclosed in slashes (as above), the $_ variable is tested against the regular expression. If the regular expression matches, the match operator returns true. Otherwise, it returns false.
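As a small sketch of this true/false behavior, the fragment below sets $_ by hand and tests it against abc. The helper name contains_abc and the sample strings are invented for illustration and are not part of the chapter's own examples.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether its argument contains "abc".
sub contains_abc {
    local $_ = shift;        # the match below tests $_ implicitly
    return /abc/ ? 1 : 0;    # normalize the true/false result to 1 or 0
}

# Sample strings are made up for this sketch.
for my $line ("xxabcxx", "a b c") {
    print "$line: ", (contains_abc($line) ? "match" : "no match"), "\n";
}
```

The first string contains the three characters in sequence and so matches; the second has spaces between them and does not.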
For this example, the $_ variable is presumed to contain some text line and is printed if the line contains the characters abc in sequence anywhere within the line, similar to the grep command above. Unlike the grep command, which operates on all of the lines of a file, this Perl fragment looks at just one line. To work on all lines, add a loop, as in:
while (<>) {
    if (/abc/) {
        print $_;
    }
}
What if we didn't know the number of b's between the a and the c? That is, what if we want to print the line if it contains an a followed by zero or more b's, followed by a c? With grep, we'd say:
grep "ab*c" somefile >results
(The argument containing the asterisk is in quotes because we don't want the shell expanding that argument as if it were a filename wildcard. It has to be passed as-is to grep to be effective.) In Perl, we can say exactly the same thing:
while (<>) {
    if (/ab*c/) {
        print $_;
    }
}
Just like grep, this means an a followed by zero or more b's followed by a c.
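To make "zero or more b's" concrete, here is a minimal sketch that tries the same pattern against a handful of strings. The helper name matches_ab_star_c and the sample strings are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether its argument matches /ab*c/.
sub matches_ab_star_c {
    local $_ = shift;         # the match operator tests $_
    return /ab*c/ ? 1 : 0;
}

# Made-up samples: zero b's, one b, several b's, a different
# middle letter, and a match buried inside a longer string.
for my $s ("ac", "abc", "abbbbc", "adc", "xxabbcxx") {
    printf "%-10s %s\n", $s,
        matches_ab_star_c($s) ? "matches" : "does not match";
}
```

Every sample except adc matches: ac has zero b's between the a and the c, and the pattern may match anywhere within the string, as in xxabbcxx.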
We'll visit more uses of pattern matching in Section 7.4, "More on the Matching Operator," later in the chapter, after we talk about all kinds of regular expressions.
Another simple regular expression operator is the substitute operator, which replaces the part of a string that matches the regular expression with another string. The substitute operator looks like the s command in the UNIX sed utility, consisting of the letter s, a slash, a regular expression, a slash, a replacement string, and a final slash, looking something like:
s/ab*c/def/;
The variable (in this case, $_) is matched against the regular expression (ab*c). If the match is successful, the part of the string that matched is discarded and replaced by the replacement string (def). If the match is unsuccessful, nothing happens.
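The following sketch shows the substitution applied to a few strings. The helper name replace_first and the sample strings are hypothetical, chosen only to show a successful and an unsuccessful match.

```perl
use strict;
use warnings;

# Hypothetical helper: applies s/ab*c/def/ to a copy of its argument
# and returns the resulting string.
sub replace_first {
    local $_ = shift;
    s/ab*c/def/;      # without further options, only the leftmost match is replaced
    return $_;
}

# Made-up samples: one match, two possible matches, and no match.
for my $s ("xxabbbcyy", "ac and abc", "no match here") {
    print "$s -> ", replace_first($s), "\n";
}
```

In the second sample only the leading ac is replaced, because this plain form of the operator substitutes a single match. The third sample contains nothing matching ab*c, so it comes back unchanged.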
As with the match operator, we'll revisit the myriad options on the substitute operator later, in Section 7.5, "Substitutions."