34.17 Searching for Patterns Split Across Lines

[Article 27.11 introduced a script called cgrep.sed, a general-purpose, grep-like program built with sed. It allows you to look for one or more words that appear on one line or across several lines. This article explains the sed tricks that are necessary to do this kind of thing. It gets into territory that is essential for any advanced applications of this obscure yet wonderful editor. (Articles 34.13 through 34.16 have background information.) -JP]

Let's review the two examples from article 27.11. The first command below finds all lines containing the word system in the file main.c, and shows 10 additional lines of context above and below each match. The second command finds all occurrences of the word "awk" where it is followed by the word "perl" somewhere within the next 3 lines:

    cgrep -10 system main.c
    cgrep -3 "awk.*perl"

Now the script, followed by an explanation of how it works:
The sed script is embedded in a bare-bones shell wrapper (44.14) to parse out the initial arguments because, unlike awk and perl, sed cannot directly access command-line parameters.
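As a minimal sketch of why the wrapper is needed (this is not the cgrep script itself; the function name grep_word and the variable word are made up for illustration), the shell can expand a variable inside a double-quoted sed script before sed ever sees it:

```sh
# grep_word: hypothetical helper, not part of cgrep.
# Usage: grep_word word [file...]
grep_word() {
    word=$1
    shift
    # The script is in double quotes, so the shell replaces $word
    # with the first argument; sed then receives e.g. /system/p.
    sed -n "/$word/p" "$@"
}
```

For example, `grep_word system main.c` would run `sed -n "/system/p" main.c`. A word containing a slash would break this simple version, because the slash would end the regular expression early.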
If the first argument looks like a -context option, the variable n is reset to one more than the number of lines specified, using a little trick: the argument is treated as a negative number and subtracted from 1 (so -10 gives 1 - (-10) = 11).

The sed
script itself looks rather unstructured (it was actually
designed using a flowchart), but the basic algorithm is easy enough
to understand.
We keep a "window" of n
lines in the pattern space
and scroll this window through the input stream.
If an occurrence of the pattern comes into the window, the entire
window is printed (providing n
lines of previous context), and
each subsequent line is printed until the pattern scrolls out of view
again (providing n
lines of following context).
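The option arithmetic and the scrolling window can be tried out in isolation. The following is only a sketch under stated assumptions, not the cgrep script: the function name show_window and the default of 4 are made up for illustration, and instead of searching for a pattern it simply prints whatever is left in the window when the input ends.

```sh
# show_window: hypothetical helper, not part of cgrep.
# Usage: show_window [-context] [file...]
show_window() {
    n=4                        # assumed default: a 3-line window
    case $1 in
    -[1-9]*)
        # Treat the option as a negative number: 1 - (-10) = 11
        n=`expr 1 - "$1"`
        shift ;;
    esac
    # N appends the next input line to the pattern space. From
    # input line $n onward, D deletes the oldest line and restarts
    # the cycle, so the window never holds more than n-1 lines
    # between cycles. On the last line, q prints the window.
    sed -e :a -e '$q;N;'"$n"',$D;ba' "$@"
}
```

Running `printf '1\n2\n3\n4\n5\n' | show_window -2` prints the lines 4 and 5: the window scrolled through the input and held the final two lines when sed quit.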
The core of the script is basically an if-then-else construct
that decides if we are currently "in context."
(The regular expression here is delimited by tilde (~) characters.)
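The general shape of an if-then-else in sed, with a tilde-delimited pattern, can be sketched as follows. This is an illustration only, not the cgrep logic: the function mark_lines, the variable re, and the "+ " and "- " prefixes are assumptions invented for the example.

```sh
# mark_lines: hypothetical helper, not part of cgrep.
# Usage: mark_lines pattern [file...]
mark_lines() {
    re=$1
    shift
    # \~...~ makes tilde the regular expression delimiter.
    # "then" branch: the line matches, so tag it and jump past
    # the "else" branch. "else" branch: tag it differently.
    # A pattern that itself contains a tilde would break this.
    sed "
        \~$re~{
            s/^/+ /
            b endif
        }
        s/^/- /
        :endif
    " "$@"
}
```

For example, `printf '/usr/bin\nhome\n' | mark_lines /usr` prints "+ /usr/bin" and "- home". One practical effect of a non-slash delimiter is visible here: the slash in the pattern needs no escaping.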