13.6. Approximate grep: agrep
agrep is one of the nicer additions to the grep family. It's not only
one of the faster greps around; it also has the unique feature of
looking for approximate matches. It's also record oriented rather than
line oriented.
The three most significant features of agrep that
are not supported by the grep family are as
follows:
- The ability to search for approximate patterns, with a user-definable
level of accuracy. For example:
% agrep -2 homogenos foo
will find "homogeneous," as well as
any other word that can be obtained from
"homogenos" with at most two
substitutions, insertions, or deletions.
The -B option looks for the best match, that is, the one with the
fewest errors. For example:
% agrep -B homogenos foo
will generate a message of the form:
best match has 2 errors, there are 5 matches, output them? (y/n)
- agrep is record oriented rather than just line oriented; a record is
by default a line, but it can be user-defined with the -d option, which
specifies a pattern to be used as a record delimiter. For example:
% agrep -d '^From ' 'pizza' mbox
outputs all mail messages (Section 1.21) in the file mbox that contain
the keyword pizza; each message is delimited by a line beginning with
From and a space. Another example:
% agrep -d '$$' pattern foo
will output all paragraphs (separated by an empty line) that contain
pattern.
- agrep allows multiple patterns with AND (or OR) logic queries. For
example:
% agrep -d '^From ' 'burger,pizza' mbox
outputs all mail messages containing at least one of the two keywords
(a comma stands for OR).
% agrep -d '^From ' 'good;pizza' mbox
outputs all mail messages containing both keywords (a semicolon stands
for AND).
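agrep itself relies on fast bit-parallel algorithms, but what "at most k substitutions, insertions, or deletions" means is easier to see in the simpler dynamic-programming method usually credited to Sellers. The Python sketch below uses that swapped-in method, so it is an illustration, not agrep's implementation. The function names `min_errors`, `approx_grep`, and `best_match` are hypothetical, and `best_match` only mimics the idea behind -B (the fewest errors, and how many lines have that many), not its prompt.

```python
def min_errors(pattern: str, text: str) -> int:
    """Return the fewest substitutions, insertions, or deletions needed
    for `pattern` to match some substring of `text`."""
    # col[i] holds the cost of matching pattern[:i] against the best
    # substring of text that ends at the current character. Row 0 stays 0
    # because a match may start anywhere in the text.
    col = list(range(len(pattern) + 1))
    best = col[-1]
    for ch in text:
        new = [0]
        for i, p in enumerate(pattern, start=1):
            new.append(min(
                col[i - 1] + (p != ch),  # match (0) or substitution (1)
                col[i] + 1,              # extra character in the text (insertion)
                new[i - 1] + 1,          # pattern character missing (deletion)
            ))
        col = new
        best = min(best, col[-1])
    return best


def approx_grep(pattern: str, lines: list[str], k: int) -> list[str]:
    """Hypothetical helper: lines containing `pattern` with at most k errors."""
    return [line for line in lines if min_errors(pattern, line) <= k]


def best_match(pattern: str, lines: list[str]) -> tuple[int, int]:
    """Hypothetical helper: fewest errors over all lines, and how many lines have it."""
    errors = [min_errors(pattern, line) for line in lines]
    fewest = min(errors)
    return fewest, errors.count(fewest)


lines = ["a homogeneous mixture", "homogenous milk", "pizza tonight"]
print(approx_grep("homogenos", lines, 2))  # ['a homogeneous mixture', 'homogenous milk']
print(best_match("homogenos", lines))      # (1, 1)
```

Here "homogenous milk" needs only one insertion (the u), so it is the best match, while "a homogeneous mixture" needs two and is still found with k = 2.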
Putting these options together, one can write queries such as the
following:
% agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib
which outputs all paragraphs referencing articles in CACM between
1985 and 1989 by TheAuthor dealing with
Curriculum. Two errors are allowed, but they cannot be in either CACM
or the year. (The < > brackets forbid errors in the pattern
between them.)
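To make the record and Boolean ideas concrete, here is a rough Python model of how such a query could be evaluated. It is not how agrep parses queries, and every function name is hypothetical. Assumptions of this sketch: a record starts wherever a Python regular expression delimiter matches; a query uses either `;` (AND) or `,` (OR) but not both; each unbracketed term is a literal string allowed up to k errors on its own; and a term in `< >` is treated as an exact Python regular expression, so that `<198[5-9]>` works.

```python
import re


def min_errors(pattern: str, text: str) -> int:
    """Fewest single-character edits for `pattern` to match a substring of `text`."""
    col = list(range(len(pattern) + 1))
    best = col[-1]
    for ch in text:
        new = [0]  # a match may start at any text position
        for i, p in enumerate(pattern, start=1):
            new.append(min(col[i - 1] + (p != ch), col[i] + 1, new[i - 1] + 1))
        col = new
        best = min(best, col[-1])
    return best


def split_records(text: str, start: str) -> list[str]:
    """Split text into records that begin wherever regex `start` matches."""
    # A zero-width lookahead keeps the delimiter line inside its record
    # (splitting on empty matches needs Python 3.7 or later).
    parts = re.split(f"(?m)(?={start})", text)
    return [part for part in parts if part.strip()]


def term_matches(term: str, record: str, k: int) -> bool:
    """Check one query term against one record."""
    if term.startswith("<") and term.endswith(">"):
        # Sketch convention: <...> is an exact (error-free) regex.
        return re.search(term[1:-1], record) is not None
    # Unbracketed terms are literal strings allowed up to k errors each.
    return min_errors(term, record) <= k


def query_matches(query: str, record: str, k: int = 0) -> bool:
    """';' means every term must match (AND); ',' means any term may (OR)."""
    if ";" in query:
        return all(term_matches(t, record, k) for t in query.split(";"))
    return any(term_matches(t, record, k) for t in query.split(","))


mbox = """From alice
Let's order pizza tonight.
From bob
burgers are better.
From carol
A good pizza place opened.
"""
records = split_records(mbox, "^From ")
print([r.splitlines()[0] for r in records if query_matches("burger,pizza", r)])
print([r.splitlines()[0] for r in records if query_matches("good;pizza", r)])
```

With k=2, a record containing "CACM", "Curiculum" (one letter short), and "1987" satisfies `<CACM>;Curriculum;<198[5-9]>`, while one containing "CACN" instead does not, because the bracketed term tolerates no errors.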
Other agrep features include searching for regular expressions (with or
without errors); using unlimited wildcards; limiting the errors to only
insertions, only substitutions, or any combination; weighting errors so
that, for example, each deletion counts as two substitutions or three
insertions; restricting parts of the query to be exact and parts to be
approximate; and many more.
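The weighting idea can be sketched by giving each kind of edit its own cost in the same dynamic program. This is a hypothetical Python function, not agrep's interface; agrep exposes weighting through its own command-line options, which the keyword parameters here do not imitate. In this sketch an insertion is an extra character in the text and a deletion is a pattern character missing from the text.

```python
def weighted_errors(pattern: str, text: str, *, sub_cost: int = 1,
                    ins_cost: int = 1, del_cost: int = 1) -> int:
    """Cheapest way for `pattern` to match a substring of `text` when each
    kind of edit has its own cost (hypothetical helper)."""
    # Matching pattern[:i] against an empty text prefix deletes i characters.
    col = [i * del_cost for i in range(len(pattern) + 1)]
    best = col[-1]
    for ch in text:
        new = [0]  # a match may start at any text position
        for i, p in enumerate(pattern, start=1):
            new.append(min(
                col[i - 1] + (0 if p == ch else sub_cost),  # match or substitution
                col[i] + ins_cost,                          # extra text character
                new[i - 1] + del_cost,                      # missing pattern character
            ))
        col = new
        best = min(best, col[-1])
    return best


print(weighted_errors("pizza", "piza"))              # 1: one deletion
print(weighted_errors("pizza", "piza", del_cost=2))  # 2: the deletion now costs 2
```

Setting one cost very high relative to the error budget approximates limiting the errors to the other kinds of edit.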
--JP, SW, and UM
Copyright © 2003 O'Reilly & Associates. All rights reserved.