6.17. Expressing AND, OR, and NOT in a Single PatternProblemYou have an existing program that accepts a pattern as an argument or input. It doesn't allow you to add extra logic, like case insensitive options, ANDs, or NOTs. So you need to write a single pattern that matches either of two different patterns (the "or" case), both of two patterns (the "and" case), or that reverses the sense of the match ("not"). This situation arises often in a configuration files, web forms, or command-line arguments. Imagine there's a program that does this: chomp($pattern = <CONFIG_FH>); if ( $data =~ /$pattern/ ) { ..... } As the one maintaining the contents of CONFIG_FH, you need to convey Booleans through to the matching program through that one, measly pattern without explicit connectives. Solution
True if either /ALPHA|BETA/
True if both /^(?=.*ALPHA)(?=.*BETA)/s
True if both /ALPHA.*BETA|BETA.*ALPHA/s
True if pattern /^(?:(?!PAT).)*$/s
True if pattern /(?=^(?:(?!BAD).)*$)GOOD/s DiscussionWhen you're writing a regular program and want to know if something doesn't match, say one of: if (!($string =~ /pattern/)) { something() } # ugly if ( $string !~ /pattern/) { something() } # preferred If you want to see if two patterns both match, use:
if ($string =~ /pat1/ && $string =~ /pat2/ ) {
If you want to see if either of two patterns matches:
if ($string =~ /pat1/ || $string =~ /pat2/ ) {
In short, use Perl's normal Boolean connectives to combine regular expressions, rather than doing it all within a single pattern. However, imagine a minigrep program, one that reads its single pattern as an argument, as shown in Example 6.12 . Example 6.12: minigrep#!/usr/bin/perl # minigrep - trivial grep $pat = shift; while (<>) { print if /$pat/o; } If you want to tell minigrep that some pattern must not match, or that it has to match both of two subpatterns in any order, then you're at an impasse. The program isn't built to accept those constructs. How can you do it using one pattern? That is, you'd like to execute the minigrep PAT program where PAT can't match or has more than one connected patterns in it. This need comes up often in program reading patterns from configuration files.
The OR case is pretty easy, since the
For AND, you have to distinguish between overlapping and non-overlapping cases. You want to see if a string matches both "labelled" =~ /^(?=.*bell)(?=.*lab)/s Remember: in a normal program, you don't have to go through these contortions. You can simply say: $string =~ /bell/ && $string =~ /lab/
To unravel this, we'll spell it out using if ($murray_hill =~ m{ ^ # start of string (?= # zero-width lookahead .* # any amount of intervening stuff bell # the desired bell string ) # rewind, since we were only looking (?= # and do the same thing .* # any amount of intervening stuff lab # and the lab part ) }sx ) # /s means . can match newline { print "Looks like Bell Labs might be in Murray Hill!\n"; }
We didn't use To handle the non-overlapping case, you need two parts separated by an OR. The first branch is THIS followed by THAT; the second is the other way around. "labelled" =~ /(?:^.*bell.*lab)|(?:^.*lab.*bell)/ or in long form: $brand = "labelled"; if ($brand =~ m{ (?: # non-capturing grouper ^ .*? # any amount of stuff at the front bell # look for a bell .*? # followed by any amount of anything lab # look for a lab ) # end grouper | # otherwise, try the other direction (?: # non-capturing grouper ^ .*? # any amount of stuff at the front lab # look for a lab .*? # followed by any amount of anything bell # followed by a bell ) # end grouper }sx ) # /s means . can match newline { print "Our brand has bell and lab separate.\n"; }
These patterns aren't necessarily faster. If you followed those two, then the NOT case should be a breeze. The general form looks like this: $map =~ /^(?:(?!waldo).)*$/s Spelled out in long form, this yields: if ($map =~ m{ ^ # start of string (?: # non-capturing grouper (?! # look ahead negation waldo # is he ahead of us now? ) # is so, the negation failed . # any character (cuzza /s) ) * # repeat that grouping 0 or more $ # through the end of the string }sx ) # /s means . can match newline { print "There's no waldo here!\n"; } How would you combine AND, OR, and NOT? It's not a pretty picture, and in a regular program, you'd almost never do this, but from a config file or command line where you only get to specify one pattern, you have no choice. You just have to combine what we've learned so far. Carefully.
Let's say you wanted to run the Unix
w
program and find out whether user Here's sample input from w on my Linux system:
Here's how to do that using the minigrep program outlined previously or with the tcgrep program at the end of this chapter: % w | minigrep '^(?!.*ttyp).*tchrist' Decoding that pattern: m{ ^ # anchored to the start (?! # zero-width look-ahead assertion .* # any amount of anything (faster than .*?) ttyp # the string you don't want to find ) # end look-ahead negation; rewind to start .* # any amount of anything (faster than .*?) tchrist # now try to find Tom }x Never mind that any sane person would just call grep twice, once with a -v option to select only non-matches. % w | grep tchrist | grep -v ttyp The point is that Boolean conjunctions and negations can be coded up in one single pattern. You should comment this kind of thing, though, having pity on those who come after you - before they do.
How would you embed that % grep -i 'pattern' files % minigrep '(?i)pattern' files See AlsoLookahead assertions are shown in the "Regular Expressions" section of perlre (1), and in the "rules of regular expression matching" section of Chapter 2 of Programming Perl ; your system's grep (1) and w (1) manpages; we talk about configuration files in Recipe 8.16 |
|