Why are there two different ways to refer to that same string?
They're not really referring to the same string at the same
time; $4 means the fourth memory of an
already completed pattern
match, while \4 is a backreference referring back
to the fourth memory of the currently
matching regular expression. Besides,
backreferences work inside regular expressions only; once we're
back in the world of Perl, we'll use $4.
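As a small illustration of that difference, here is a sketch that uses the first memory rather than the fourth; the sample text and the variable names $text and $doubled are made up for this example. The backreference \1 appears inside the pattern, while $1 is read only after the match has succeeded:

```perl
use strict;
use warnings;

# Hypothetical sample text containing a doubled word.
my $text = "It was the the best of times";

# \1 is used INSIDE the pattern: it must match the same text
# that the first pair of parentheses matched a moment ago.
my $doubled;
if ($text =~ /\b(\w+) \1\b/) {
    # $1 is used AFTER the match, back in ordinary Perl code.
    $doubled = $1;
    print "doubled word: $doubled\n";   # doubled word: the
}
```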
These match variables are a big part of the power of regular
expressions, because they let us pull out the parts of a string:
$_ = "Hello there, neighbor";
if (/\s(\w+),/) {               # memorize the word between space and comma
    print "the word was $1\n";  # the word was there
}
Or you could use more than one memory at once:
$_ = "Hello there, neighbor";
if (/(\S+) (\S+), (\S+)/) {
    print "words were $1 $2 $3\n";
}
That tells us that the words were Hello there neighbor. Notice that
there's no comma in the output: the comma is outside of the memory
parentheses, so it is left out of memory two. Using this technique,
we can choose exactly what we want in the memories, as well as what
we want to leave out.
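To see how the placement of the parentheses decides what lands in a memory, this sketch runs the same string through two patterns that differ only in where the comma sits. The variable names $greeting, $without_comma, and $with_comma are chosen just for this illustration:

```perl
use strict;
use warnings;

my $greeting = "Hello there, neighbor";

# Comma outside the parentheses: memory two holds just the word.
my $without_comma;
if ($greeting =~ /(\S+) (\S+), (\S+)/) {
    $without_comma = $2;
}

# Comma inside the parentheses: memory two now includes it.
my $with_comma;
if ($greeting =~ /(\S+) (\S+,) (\S+)/) {
    $with_comma = $2;
}

print "outside: '$without_comma'\n";   # outside: 'there'
print "inside:  '$with_comma'\n";      # inside:  'there,'
```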
9.5.2. The Automatic Match Variables
There are three more match variables that you get for free,[201] whether
the pattern has memory parentheses or not. That's the good
news; the bad news is that these variables have weird names.
Now, Larry probably would have been happy enough to call these by
slightly-less-weird names, like perhaps $gazoo or
$ozmodiar. But those are names that you just might
want to use in your own code. To keep ordinary Perl programmers from
having to memorize the names of all of
Perl's special variables before choosing their first variable
names in their first programs,[202] Larry has given strange names to many of Perl's
built-in variables, names that
"break the rules." In this case, the names are
punctuation marks: $&, $`,
and $'. They're strange, ugly, and weird,
but those are their names.[203]
The part of the string that actually matched the pattern is
automatically stored in
$&:
if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That actually matched '$&'.\n";
}
That tells us that the part that matched was " there," (with a
space, a word, and a comma). Memory one, in $1, has just the
five-letter word there, but $& has the entire matched section.
Whatever came before the matched section is in
$`, and whatever was after it is in
$'. Another way to say that is that
$` holds whatever the regular expression engine
had to skip over before it found the match, and $'
has the remainder of the string that the pattern never got to. If you
glue these three strings together in order, you'll always get
back the original string:
if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That was ($`)($&)($').\n";
}
The message shows the string as (Hello)( there,)( neighbor),
showing the three automatic match variables in
action. This may seem familiar, and for good reason: These automatic
memory variables are what the pattern test program (from Chapter 7, "Concepts of Regular Expressions") was using in its line of
"mystery" code, to show what part of the string was being
matched by the pattern:
print "Matched: |$`<$&>$'|\n"; # The three automatic match variables
Any or all of these three automatic match variables may be empty, of
course, just like the numbered match variables. And they have the
same scope as the numbered match variables. Generally, that means
that they'll stay around until the next successful pattern
match.
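Both claims can be checked with a short sketch: gluing the three variables together gives back the original string, and one of them may well be empty. The names $string, $rebuilt, and $before are made up for this illustration:

```perl
use strict;
use warnings;

my $string = "Hello there, neighbor";

my $rebuilt;
if ($string =~ /\s(\w+),/) {
    # What came before, what matched, and what came after, in order.
    $rebuilt = $` . $& . $';
    print "same string\n" if $rebuilt eq $string;
}

# When the match starts at the very beginning of the string,
# there is nothing to skip over, so $` is empty.
my $before;
if ($string =~ /Hello/) {
    $before = $`;
    print "nothing before the match\n" if $before eq "";
}
```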
Now, we said earlier that these three are "free." Well,
freedom has its price. In this case, the price is that once you use
any one of these automatic match variables anywhere in your entire
program, other regular expressions will run a little more slowly (at
least in versions of Perl before 5.20, which removed most of this
penalty).
Now, this isn't a giant slowdown, but it's enough of a
worry that many Perl programmers will simply never use these
automatic match
variables.[204] Instead, they'll use
a workaround. For example, if the only one you need is
$&, just put parentheses around the whole
pattern and use $1 instead (you may need to
renumber the pattern's memories, of course).
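Here is a sketch of that workaround applied to the pattern used earlier in this section. The variable names $string, $whole, and $word are made up for this illustration; notice that the word, which used to be in $1, moves to $2 once the outer parentheses are added:

```perl
use strict;
use warnings;

my $string = "Hello there, neighbor";

# Original pattern: /\s(\w+),/ with the word in $1.
# Wrapping the whole pattern in parentheses makes the entire
# match memory one, so the word moves to memory two.
my ($whole, $word);
if ($string =~ /(\s(\w+),)/) {
    $whole = $1;   # what $& would have held
    $word  = $2;   # was $1 before renumbering
    print "whole match: '$whole', word: '$word'\n";
}
```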
Match variables (both the automatic ones and the numbered ones) are
most often used in substitutions, which are the topic of the next
section.