8.4. Memory Parentheses

You remember that parentheses ("( )") may be used for grouping together parts of a pattern. They also have a second function: they tell the regular expression engine to remember what was in the substring matched by the pattern in the parentheses. That is to say, it doesn't remember what was in the pattern itself; it remembers what was in the corresponding part of the string. Whenever you use parentheses for grouping, they automatically work as memory parentheses as well. So, if you use /./, you'll match any single character (except newline); if you use /(.)/, you'll still match any single character, but now it will be kept in a regular expression memory. For each pair of parentheses in the pattern, you'll have one regular expression memory.

8.4.1. Backreferences

A backreference refers back to a memory that was saved earlier in the current pattern's processing. Backreferences are made with a backslash, which is easy to remember. For example, \1 refers to the first regular expression memory (that is, the part of the string matched by the first pair of parentheses). Backreferences are used to go back and match the exact same string that was matched earlier in the pattern. So, /(.)\1/ means to match any one character, remember it as memory one, then match memory one again. In other words, match any character, followed by the same character. So, this pattern will match strings with doubled letters, as in bamm-bamm and betty. Of course, the dot will match characters other than letters, so if a string has two spaces in a row, two tabs in a row, or two asterisks in a row, it will match.
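To see the doubled-character pattern in action, here is a minimal sketch. The helper name has_doubled_char and the sample strings other than bamm-bamm and betty are made up for illustration, and the =~ binding operator it relies on is only introduced in the next chapter.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any character in the string is
# immediately followed by the same character.
sub has_doubled_char {
    my ($text) = @_;
    return $text =~ /(.)\1/ ? 1 : 0;
}

# 'a  b' contains two spaces in a row, so it should match as well.
for my $word ('bamm-bamm', 'betty', 'fred', 'a  b') {
    printf "%-10s %s\n", "'$word'",
        has_doubled_char($word) ? 'matches' : 'does not match';
}
```

The parentheses around the dot matter here: without them there would be no memory one for \1 to refer to.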
The pattern /(.)\1/ is not the same as the pattern /../, which will match any character followed by any character; those two could be the same, or they could be different. /(.)\1/ requires the second character to be the same as the first.

A typical use of these memories is processing HTML-like text. For example, maybe you want to match a tag like these two, which may use either single quotes or double quotes:
<image source='fred.png'>
<image source="fred's-birthday.png">

The tag may have either single quotes or double quotes, since the quoted data may include the other kind of mark (as with the apostrophe in the second example tag). So the pattern might look like this: /<image source=(['"]).*\1>/. That says that the opening quote mark may be of either type, but there must be a matching mark at the end of the quote.
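As a rough check of that pattern, the sketch below tries it on the two example tags plus a third tag with mismatched quote marks. The helper name has_image_tag and the mismatched sample are assumptions added for illustration, not part of the original text.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the text contains an image tag whose
# source value opens and closes with the same kind of quote mark.
sub has_image_tag {
    my ($text) = @_;
    return $text =~ /<image source=(['"]).*\1>/ ? 1 : 0;
}

my @samples = (
    q{<image source='fred.png'>},
    q{<image source="fred's-birthday.png">},
    q{<image source='fred.png">},    # opens with ' but closes with "
);

for my $tag (@samples) {
    print has_image_tag($tag) ? "match:    " : "no match: ", $tag, "\n";
}
```

The third sample fails because \1 must be the very same mark that the first pair of parentheses remembered, not merely any quote mark.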
If you have more sets of parentheses, you can have more backreferences. As you might guess, \17 is the contents of the seventeenth regular expression memory, if you have at least that many sets of parentheses.
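A pattern with seventeen memories is tedious to type, so this sketch builds one from the letters a through q. The variable names, the helper matches_seventeenth, and the test strings are all hypothetical; it is meant only to show \17 referring back to the seventeenth pair of parentheses.

```perl
use strict;
use warnings;

# Build seventeen one-letter groups: (a)(b)(c) ... (q)
my $groups = join '', map { "($_)" } 'a' .. 'q';

# After the seventeen groups, \17 must repeat whatever the
# seventeenth group matched, which here is the letter q.
my $pattern = qr/^${groups}\17$/;

# Hypothetical helper wrapping the match for easy checking.
sub matches_seventeenth {
    my ($text) = @_;
    return $text =~ $pattern ? 1 : 0;
}

for my $text ('abcdefghijklmnopqq', 'abcdefghijklmnopqa') {
    print "$text: ", matches_seventeenth($text) ? "match" : "no match", "\n";
}
```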
In numbering backreferences, you can just count the left (opening) parentheses. The pattern /((fred|wilma) (flintstone)) \1/ says to match strings like fred flintstone fred flintstone, since the first opening parenthesis and its corresponding closing parenthesis hold a pattern that matches fred flintstone.
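The counting rule can be checked with a short sketch that reports what each memory held after a match. It uses the memory variables $1, $2, and $3 that section 8.4.2 mentions; the helper name memories and the non-matching sample string are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the contents of memories one, two, and
# three if the pattern matches, or an empty list if it does not.
sub memories {
    my ($text) = @_;
    return $text =~ /((fred|wilma) (flintstone)) \1/ ? ($1, $2, $3) : ();
}

for my $text ('fred flintstone fred flintstone',
              'fred flintstone wilma flintstone') {
    my @found = memories($text);
    if (@found) {
        print "'$text' matches\n";
        # Memory numbers follow the order of the opening parentheses.
        print "  memory $_: $found[$_ - 1]\n" for 1 .. 3;
    } else {
        print "'$text' does not match\n";
    }
}
```

Memory one is the outermost pair because its opening parenthesis comes first, even though its closing parenthesis comes last.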
If we wrote /((fred|wilma) (flintstone)) \2/ instead, we would match strings like fred flintstone fred; memory two is the choice of fred or wilma. (Notice that it wouldn't match fred flintstone wilma, since the backreference can match only the same name that was matched earlier: either fred or wilma. But it could match wilma flintstone wilma, since that one uses the same name.) And the pattern /((fred|wilma) (flintstone)) \3/ would match strings like fred flintstone flintstone. It's uncommon to have a literal string like flintstone in memory parentheses, though; we did that one just to have a third example.

8.4.2. Memory Variables

When we get to the next chapter and back into the world of Perl, we'll see that the contents of these regular expression memories are available to us in special variables like $1 after the pattern match is done. We mention this here just so you'll know that the memories aren't merely used for backreferences; if you see what seem to be unnecessary parentheses in a pattern, they may actually be setting up those memories.

Copyright © 2002 O'Reilly & Associates. All rights reserved.