6.17. Matching Nested Patterns

6.17.1. Problem

You want to match a nested set of enclosing delimiters, such as the arguments to a function call.

6.17.2. Solution

Use match-time pattern interpolation, recursively:

    my $np;
    $np = qr{
               \(
               (?:
                  (?> [^()]+ )    # Non-capture group w/o backtracking
                |
                  (??{ $np })     # Group with matching parens
               )*
               \)
            }x;

Or use the Text::Balanced module's extract_bracketed function.

6.17.3. Discussion

The (??{ CODE }) construct runs the code and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:

    if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) {
        print "$word is a palindrome.\n";
    }

Consider a word like "reviver", which this pattern correctly reports as a palindrome. The $1 variable contains "rev" partway through the match. The optional word character that follows catches the "i". Then the code reverse $1 runs and produces "ver", and that result is interpolated into the pattern.

For matching something balanced, you need to recurse, which is a bit trickier. A compiled pattern that uses (??{ CODE }) can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of $np in that pattern, you could use it like this to match a function call:

    $text = "myfunfun(1,(2*(3+4)),5)";
    $funpat = qr/\w+$np/;      # $np as above
    $text =~ /^$funpat$/;      # Matches!

You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the trickier strings.
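To see the recursion at work, the following minimal sketch runs the Solution's pattern against a few balanced and unbalanced strings. The helper name is_call and the sample strings are illustrative assumptions, not part of the recipe. The character class is written [^()] here so that spaces between the parentheses are accepted, because /x does not ignore whitespace inside a character class.

```perl
use strict;
use warnings;

# Recursive pattern from the Solution: an opening paren, then any mix of
# non-paren runs and nested balanced groups, then a closing paren.
my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # run of non-paren characters, no backtracking
      |
        (??{ $np })     # a nested group, matched by the same pattern
    )*
    \)
}x;

# Hypothetical helper: returns 1 if $text is a word followed by one
# balanced set of parentheses and nothing else, 0 otherwise.
sub is_call {
    my ($text) = @_;
    return $text =~ /^\w+$np$/ ? 1 : 0;
}

for my $text ("f()", "f(1, g(2, h(3)))", "f(1,(2", "f(1))") {
    printf "%-20s %s\n", $text, is_call($text) ? "balanced" : "not balanced";
}
```

The first two strings are reported as balanced and the last two as not balanced: one is missing closing parentheses, the other has one too many. The Regexp::Common patterns mentioned above package this kind of matching so that you do not have to write it yourself.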
For example:

    use Regexp::Common;
    $text = "myfunfun(1,(2*(3+4)),5)";
    if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'( )'})/o) {
        print "Got function call: $1\n";
    }

Other patterns provided by that module match numbers in various notations and quote-delimited strings:

    $RE{num}{int}
    $RE{num}{real}
    $RE{num}{real}{'-base=2'}{'-sep=,'}{'-group=3'}
    $RE{quoted}
    $RE{delimited}{-delim=>'/'}

The standard (as of v5.8) Text::Balanced module provides a general solution to this problem.

    use Text::Balanced qw/extract_bracketed/;
    $text = "myfunfun(1,(2*(3+4)),5)";
    # In list context the return order is: bracketed text, remainder, skipped prefix.
    # The third argument is a prefix pattern, used here to skip the function name.
    ($found, $after, $before) = extract_bracketed($text, "(", '\w+');
    if (defined $found) {
        print "answer is $found\n";
    } else {
        print "FAILED\n";
    }

6.17.4. See Also

The section on "Match-time pattern interpolation" in Chapter 5 of Programming Perl; the documentation for the Regexp::Common CPAN module and the standard Text::Balanced module

Copyright © 2003 O'Reilly & Associates. All rights reserved.