13.10. Compound Searches

You may recall that you can search for lines containing "this" or "that" using the egrep (Section 13.4) | metacharacter:

    egrep 'this|that' files

But how do you grep for "this" and "that"? Conventional regular expressions don't support an "and" operator because it breaks the rule of patterns matching one consecutive string of text. Well, agrep (Section 13.6) is one version of grep that breaks all the rules. If you're lucky enough to have it installed, just use this:

    agrep 'cat;dog;bird' files

If you don't have agrep, a common technique is to filter the text through several greps so that only lines containing all the keywords make it through the pipeline intact:

    grep cat files | grep dog | grep bird

But can it be done in one command? The closest you can come with grep is this idea:

    grep 'cat.*dog.*bird' files

which has two limitations: the words must appear in the given order, and they cannot overlap. (The first limitation can be overcome using egrep 'cat.*dog|dog.*cat', but this trick is not really scalable to more than two terms.)

As usual, the problem can also be solved by moving beyond the grep family to the more powerful tools. Here is how to do a line-by-line "and" search using sed, awk, or perl:[44]

    sed '/cat/!d; /dog/!d; /bird/!d' files
    awk '/cat/ && /dog/ && /bird/' files
    perl -ne 'print if /cat/ && /dog/ && /bird/' files

Okay, but what if you want to find where all the words occur in the same paragraph? Just turn on paragraph mode by setting RS="" in awk or by giving the -00 option to perl:

    awk '/cat/ && /dog/ && /bird/ {print $0 ORS}' RS= files
    perl -n00e 'print "$_\n" if /cat/ && /dog/ && /bird/' files

And if you just want a list of the files that contain all the words anywhere in them? Well, perl can easily slurp in entire files if you have the memory and you use the -0 option to set the record separator to something that won't occur in the file (like NUL):

    perl -ln0e 'print $ARGV if /cat/ && /dog/ && /bird/' files

(Notice that as the problem gets harder, the less powerful commands drop out.)

The grep filter technique shown earlier also works on this problem. Just add a -l option and the xargs command (Section 27.17) to make it pass filenames, rather than text lines, through the pipeline:

    grep -l cat files | xargs grep -l dog | xargs grep -l bird

(xargs is basically the glue used when one program produces output needed by another program as command-line arguments.)

-- GU

Copyright © 2003 O'Reilly & Associates. All rights reserved.
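
To try the line-by-line techniques from this section side by side, here is a minimal sketch. It is not part of the original article: the sample text and the helper names (sample, and_pipeline, and_awk, and_sed) are made up for illustration. Each helper wraps one of the commands shown in this section and reads standard input, so the results can be compared directly.

```shell
#!/bin/sh
# Hypothetical sample input: only the first and last lines contain all three words.
sample() {
cat <<'EOF'
the bird chased the cat past the dog
a cat and a dog
bird only
dog, cat, bird
EOF
}

# Each helper (names are illustrative) reads standard input and prints
# only the lines that contain cat, dog, and bird, in any order.
and_pipeline() { grep cat | grep dog | grep bird; }
and_awk()      { awk '/cat/ && /dog/ && /bird/'; }
and_sed()      { sed '/cat/!d; /dog/!d; /bird/!d'; }

# Show what the grep pipeline lets through.
sample | and_pipeline

# The three approaches should select exactly the same lines.
if [ "$(sample | and_pipeline)" = "$(sample | and_awk)" ] &&
   [ "$(sample | and_awk)" = "$(sample | and_sed)" ]; then
    echo "all three agree"
fi
```

Because the helpers read standard input, any one of them can be dropped onto the end of a longer pipeline. To search named files instead, use the commands as written in the text.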