8.4.1. Backreferences
A backreference refers back to a memory that was saved earlier in the
current pattern's processing. Backreferences are made with a
backslash, which is easy to remember. For example, \1 contains the
first regular expression memory (that is, the part of the string
matched by the first pair of parentheses).
Backreferences are used to go back and match the exact same[182] string that was matched earlier in the pattern. So,
/(.)\1/ means to match any one character, remember
it as memory one, then match memory one again. In other words, match
any character, followed by the same character.
So, this pattern will match strings with doubled letters, as in
bamm-bamm and betty. Of course,
the dot will match characters other than letters, so if a string has
two spaces in a row, two tabs in a row, or two asterisks in a row, it
will match.
That's not the same as the pattern /../,
which will match any character followed by any character -- those
two could be the same, or they could be different.
/(.)\1/ means to match any character followed by
the same character.
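To see the difference between the two patterns, here is a small sketch. The sample strings and the subroutine names (has_doubled_char and has_two_chars) are made up for illustration; only the two patterns come from the text.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen only for illustration.
my @samples = ("bamm-bamm", "betty", "fred", "two  spaces", "a**b");

# True if some character is immediately followed by the same character.
sub has_doubled_char {
    my ($text) = @_;
    return $text =~ /(.)\1/ ? 1 : 0;
}

# True if the string has any two characters in a row (other than newline).
sub has_two_chars {
    my ($text) = @_;
    return $text =~ /../ ? 1 : 0;
}

for my $s (@samples) {
    printf "%-14s doubled=%d any-two=%d\n",
        "'$s'", has_doubled_char($s), has_two_chars($s);
}
```

Every sample here is at least two characters long, so /../ matches all of them; only /(.)\1/ tells fred apart from the rest.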
A typical usage of these memories might be if you have some
HTML-like[183] text to process. For
example, maybe you want to match a tag like these two, which may use
either single quotes or double quotes:
<image source='fred.png'>
<image source="fred's-birthday.png">
Both kinds of quote mark are allowed because the quoted data may
include the other kind of mark (as with the apostrophe in the second
example tag). So the pattern might look like this:
/<image source=(['"]).*\1>/. That says that the opening quote mark
may be of either type, but there must be a matching mark at the end
of the quote.[184]
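As a rough sketch, the quote-matching pattern could be wrapped in a subroutine like this one. The name image_quote is hypothetical, and the third tag (with mismatched quotes) is an invented example added to show a failing case.

```perl
use strict;
use warnings;

# Return the quote mark that an image tag uses, or undef if the tag
# doesn't match. The pattern itself is the one from the text.
sub image_quote {
    my ($tag) = @_;
    return $tag =~ /<image source=(['"]).*\1>/ ? $1 : undef;
}

# The two tags from the text, plus a made-up tag with mismatched quotes.
my @tags = (
    q{<image source='fred.png'>},
    q{<image source="fred's-birthday.png">},
    q{<image source='fred.png">},
);

for my $tag (@tags) {
    my $quote = image_quote($tag);
    print "$tag: ", (defined $quote ? "quoted with $quote" : "no match"), "\n";
}
```

In the second tag, the apostrophe inside the filename doesn't end the match, because \1 must be the same double quote mark that opened the value.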
If you have more sets of parentheses, you can have more
backreferences. As you might guess, \17 is the
contents of the seventeenth regular expression memory, if you have at
least that many sets of parentheses.[185]
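You'll rarely need that many memories, but here is a contrived sketch of \17 at work. The pattern is built from seventeen one-character memories; the variable and subroutine names are assumptions made for this example.

```perl
use strict;
use warnings;

# Seventeen copies of (.) followed by a backreference to the last one.
# Single quotes keep the backslash literal until the pattern is compiled.
my $seventeen = ('(.)' x 17) . '\17';
my $re = qr/$seventeen/;

# True if the string has seventeen characters followed by a repeat
# of the seventeenth one.
sub repeats_seventeenth {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

print repeats_seventeenth("abcdefghijklmnopqq") ? "match\n" : "no match\n";
print repeats_seventeenth("abcdefghijklmnopqr") ? "match\n" : "no match\n";
```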
In numbering backreferences, you can just count the left (opening)
parentheses. The pattern /((fred|wilma) (flintstone)) \1/
says to match strings like fred flintstone fred flintstone,
since the first opening parenthesis and its
corresponding closing parenthesis hold a pattern that matches
fred flintstone.[186]
If we wrote /((fred|wilma) (flintstone)) \2/
instead, we would match strings like fred flintstone
fred; memory two is the choice of fred
or wilma. (Notice that it wouldn't match
fred flintstone wilma, since the backreference can
match only the same name that was matched earlier: either
fred or wilma. But it could
match wilma flintstone wilma, since that one uses
the same name.) And the pattern /((fred|wilma) (flintstone))
\3/ would match strings like fred flintstone
flintstone. It's uncommon to have a literal string
like flintstone in memory parentheses, though; we
did that one just to have a third example.
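Here is one way to check the numbering for yourself. The three patterns are the ones from the text; the hash %pattern_for and the subroutine matches_with_backref are names invented for this sketch.

```perl
use strict;
use warnings;

# The three patterns from the text; only the backreference differs.
my %pattern_for = (
    1 => qr/((fred|wilma) (flintstone)) \1/,
    2 => qr/((fred|wilma) (flintstone)) \2/,
    3 => qr/((fred|wilma) (flintstone)) \3/,
);

# True if $text matches the pattern that ends with backreference $n.
sub matches_with_backref {
    my ($n, $text) = @_;
    return $text =~ $pattern_for{$n} ? 1 : 0;
}

# Strings taken from the discussion above.
my @strings = (
    "fred flintstone fred flintstone",
    "fred flintstone fred",
    "fred flintstone wilma",
    "wilma flintstone wilma",
    "fred flintstone flintstone",
);

for my $text (@strings) {
    my @hits = grep { matches_with_backref($_, $text) } 1 .. 3;
    print "'$text' matches with: ", (@hits ? "@hits" : "none"), "\n";
}
```

Counting the opening parentheses from the left gives memory one for the whole name, memory two for the first name, and memory three for flintstone.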