If we were looking for all lines of a file that contain the string abc, we might use the Windows NT findstr command:
>findstr abc somefile > results
In this case, abc is the regular expression that the findstr command tests against each input line. Lines that match are sent to standard output and end up in the file results because of the command-line redirection.
In Perl, we can speak of the string abc as a regular expression by enclosing the string in slashes:
if (/abc/) {
    print $_;
}
But what is being tested against the regular expression abc in this case? Why, it's our old friend, the $_ variable! When a regular expression is enclosed in slashes (as above), the $_ variable is tested against the regular expression. If the regular expression matches, the match operator returns true. Otherwise, it returns false.
For this example, the $_ variable is presumed to contain some text line and is printed if the line contains the characters abc in sequence anywhere within the line, similar to the findstr command above. Unlike the findstr command, which operates on all of the lines of a file, this Perl fragment looks at just one line. To work on all lines, add a loop, as in:
while (<>) {
    if (/abc/) {
        print $_;
    }
}
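To see the loop's effect without preparing an input file, here is a minimal sketch that runs the same test over an in-memory list. The sample lines and the names @lines and @matched are hypothetical and chosen only for illustration. A foreach loop with no named variable sets $_ to each element in turn, so /abc/ is tested against the current line just as it is inside while (<>).

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the contents of a file.
my @lines = ("xyz abc 123\n", "no match here\n", "aabcc\n", "a b c\n");

# Collect every line that contains the characters abc in sequence.
my @matched;
foreach (@lines) {
    if (/abc/) {          # tested against $_, the current line
        push @matched, $_;
    }
}

# Print the matching lines, as the while (<>) loop would.
print @matched;
```

Only the first and third sample lines should be printed, because the fourth has spaces between the letters and so never contains abc in sequence.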
What if we didn't know the number of b's between the a and the c? That is, what if we want to print the line if it contains an a followed by zero or more b's, followed by a c? With findstr, we'd say:
>findstr ab*c somefile >results
In Perl, we can say exactly the same thing:
while (<>) {
    if (/ab*c/) {
        print $_;
    }
}
Just like findstr, this loop looks for an a followed by zero or more b's followed by a c.
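As a rough illustration of what "zero or more b's" means in practice, the following sketch tries ab*c against a handful of strings. The candidate strings and the %result hash are assumptions made up for this example; they do not come from any particular input file.

```perl
use strict;
use warnings;

# Hypothetical test strings; each comment says why it should or should not match.
my @candidates = (
    "ac",        # zero b's between the a and the c
    "abc",       # one b
    "xabbbcx",   # three b's, with other text on both sides
    "adc",       # the d gets in the way, so no a is followed by b's and then c
    "ab",        # there is no c at all
);

# Record 1 for a match and 0 otherwise; foreach sets $_ to each string in turn.
my %result;
foreach (@candidates) {
    $result{$_} = /ab*c/ ? 1 : 0;
    printf "%-8s %s\n", $_, ($result{$_} ? "match" : "no match");
}
```

The first three strings should be reported as matches and the last two as non-matches.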
We'll visit more uses of pattern matching in the section "More on the Matching Operator," later in the chapter, after we talk about all kinds of regular expressions.
Another simple regular expression operator is the substitute operator, which replaces the part of a string that matches the regular expression with another string. The substitute operator consists of the letter s, a slash, a regular expression, a slash, a replacement string, and a final slash, looking something like:
s/ab*c/def/;
The variable (in this case, $_) is matched against the regular expression (ab*c). If the match is successful, the part of the string that matched is discarded and replaced by the replacement string (def). If the match is unsuccessful, nothing happens.
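The following sketch applies that substitution to a few strings to show both outcomes. The input strings and the array names @inputs and @outputs are hypothetical. The loop relies on foreach setting $_ to each element of @outputs, so the substitution changes the array elements in place.

```perl
use strict;
use warnings;

# Hypothetical inputs; @outputs starts as a copy so the originals are kept.
my @inputs  = ("xabbbcy", "ac and abc", "no match");
my @outputs = @inputs;

# With no explicit target, s/// operates on $_, which foreach sets to each
# element of @outputs in turn.
foreach (@outputs) {
    s/ab*c/def/;
}

# Show each original string next to its result.
for my $i (0 .. $#inputs) {
    print "$inputs[$i] becomes $outputs[$i]\n";
}
```

In the second string only the leftmost match (ac) is replaced, since the operator as written here performs a single substitution. The third string contains no match and is left unchanged.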
As with the match operator, we'll revisit the myriad options on the substitute operator later, in the section "Substitutions."