An essential element of this program is that, like
grep, it prints out only the lines that match the
pattern. You might think we'd use the -n option to
suppress the default output of lines. Instead, this sed script does
something unusual: it creates an input/output loop that controls
when a line is output.
The logic of this script is to first look for the pattern on one line
and print the line if it matches. If no match is found, we read
another line into the pattern space (as in previous multiline
scripts). Then we copy the two-line pattern space to the hold space
for safekeeping. The line just read into the pattern space could
match the search pattern on its own, so the next match we attempt is
on the second line only. Once we've determined that the pattern is
not found on either the first or the second line, we replace the
newline between the two lines with a space and look for the pattern
spanning those lines.
The script is designed to accept arguments from the command line. The
first argument is the search pattern. All other command-line
arguments will be interpreted as filenames. Let's look at the entire
script before analyzing it:
#! /bin/sh
# phrase -- search for words across lines
# $1 = search string; remaining args = filenames
search=$1
shift
for file
do
sed '
/'"$search"'/b
N
h
s/.*\n//
/'"$search"'/b
g
s/ *\n/ /
/'"$search"'/{
g
b
}
g
D' "$file"
done
The sed script tries to match the search string at three different
points, each marked by the address that looks for the search pattern.
The first line of the script looks for the search pattern on a line by
itself:
/'"$search"'/b
If the search pattern matches the line, the branch command, without a
label, transfers control to the bottom of the script where the line is
printed. This makes use of sed's normal control flow, so that the next
input line is read into the pattern space and control then returns to
the top of the script. The branch command is used in the same way
each time we try to match the pattern.
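As a small illustration of a branch without a label, here is a sketch that is not part of the phrase script. The input text and the variable name branch_demo are made up for the example. Matching lines go straight to the end of the script and are printed unchanged, while other lines fall through to a substitution:

```shell
# Lines matching /keep/ branch past the substitution and print as they are;
# every other line falls through and gets a marker appended.
branch_demo=$(printf 'keep me\nchange me\n' | sed '
/keep/b
s/$/ (fell through)/')

printf '%s\n' "$branch_demo"
# keep me
# change me (fell through)
```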
If a single input line does not match the pattern, we begin the next
procedure, which creates a multiline pattern space. It is possible that
the new line, by itself, will match the search string, so we test it
separately first. It may not be apparent why this step is necessary: why not just immediately
look for the pattern anywhere across two lines? The reason is that if
the pattern is actually matched on the second line, we'd still output
the pair of lines. In other words, the user would see the line
preceding the matched line and might be confused by it. This way we
output the second line by itself if that is what matches the pattern.
N
h
s/.*\n//
/'"$search"'/b
The Next command appends the next input line to the pattern space.
The hold command places a copy of the two-line pattern space into the
hold space. The next action will change the pattern space and we want
to preserve the original intact. Before looking for the pattern, we
use the substitute command to remove the previous line, up to and
including the embedded newline. There are several reasons for doing
it this way and not another way, so let's consider some of the
alternatives. You could write a pattern that matches the search
pattern only if it occurs after the embedded newline:
/\n.*'"$search"'/b
However, if a match is found, we don't want to print the entire
pattern space, just the second portion of it. Using the above
construct would print both lines when only the second line matches.
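The difference can be seen in a short sketch. It assumes a two-line input in which only the second line contains the word "target". The variable names and the trailing d command (which only discards non-matching pairs in this demonstration) are not part of the phrase script:

```shell
input='first line
second line has the target'

# Alternative address: the match after the newline succeeds, but both
# lines are still in the pattern space, so both are printed.
both=$(printf '%s\n' "$input" | sed '
N
/\n.*target/b
d')

# Approach used by the script: strip the first line, then test what is left.
second_only=$(printf '%s\n' "$input" | sed '
N
s/.*\n//
/target/b
d')

printf 'alternative address:\n%s\n\n' "$both"
printf 'after s/.*\\n//:\n%s\n' "$second_only"
```

The first command prints the pair of lines, the second prints only the line that actually contains the match.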
You might want to use the Delete command to remove the first line in
the pattern space before trying to match the pattern. However, a side
effect of the Delete command is a change in flow control: execution
resumes at the top of the script. (The Delete command could
conceivably be used but not without changing the logic of this
script.)
So, we try to match the pattern on the second line, and if that is
unsuccessful, then we try to match it across two lines:
g
s/ *\n/ /
/'"$search"'/{
g
b
}
The get command retrieves a copy of the original two-line pair from the hold
space, overwriting the line we had worked with in the pattern space.
The substitute command replaces the embedded newline and any spaces
preceding it with a single space. Then we attempt to match the
pattern. If the match is made, we don't want to print the contents of
the pattern space, but rather get the duplicate from the hold space
(which preserves the newline) and print it. Thus, before branching to
the end of the script, the get command retrieves the copy from the
hold space.
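A minimal sketch of this step follows, using two made-up input lines where the first ends in trailing spaces. The variable names joined and restored, and the final d command, exist only for the demonstration:

```shell
# The substitution alone: trailing spaces and the newline become one space.
joined=$(printf 'then the procedure   \nis followed for all lines\n' | sed '
N
s/ *\n/ /')

# With the hold space: match on the joined text, but print the saved copy,
# which still contains the trailing spaces and the newline.
restored=$(printf 'then the procedure   \nis followed for all lines\n' | sed '
N
h
s/ *\n/ /
/the procedure is followed/{
g
b
}
d')

printf '%s\n' "$joined"
# then the procedure is followed for all lines
printf '%s\n' "$restored"
```

The second command prints the two lines as they appeared in the input, even though the match was made against the joined text.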
The last part of the script is executed only if the pattern has not
been matched.
g
D
The get command retrieves the duplicate, which preserves the newline,
from the hold space. The Delete command removes the first line in the
pattern space and passes control back to the top of the script. We
delete only the first part of the pattern space, instead of clearing
it, because the line that remains, joined with the next input line,
might still match the pattern across the two lines.
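The sliding two-line window that N and D produce can be made visible with a sketch like the following. It is an illustration only: the l command lists the pattern space at each pass, and $!N is used instead of a plain N so that the demonstration behaves the same way at the end of input regardless of the sed implementation. The variable name window is an assumption of the example.

```shell
# -n suppresses the default output; l shows the pattern space unambiguously,
# with an embedded newline written as \n and the end marked by $.
window=$(printf 'one\ntwo\nthree\n' | sed -n '
$!N
l
D')

printf '%s\n' "$window"
# one\ntwo$
# two\nthree$
# three$
```

Each line appears first as the second half of a pair and then as the first half of the next pair, which is what allows a phrase to be found across any two adjacent lines.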
Here's the result when the program is run on a sample file:
$ phrase "the procedure is followed" sect3
If a pattern is followed by a \f(CW!\fP, then the procedure
is followed for all lines that do not match the pattern.
so that the procedure is followed only if there is no match.
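To try the script without a file such as sect3 at hand, something like the following sketch may help. It wraps the same sed commands in a shell function named phrase_demo (a hypothetical name) and runs it on a small temporary file whose contents are adapted from the output above. The third line of the sample is made up to show that non-matching lines are dropped.

```shell
# phrase_demo: the sed script from this section wrapped in a function
# (hypothetical name; $1 = search string, remaining args = filenames).
phrase_demo() {
    search=$1
    shift
    for file
    do
        sed '
/'"$search"'/b
N
h
s/.*\n//
/'"$search"'/b
g
s/ *\n/ /
/'"$search"'/{
g
b
}
g
D' "$file"
    done
}

# Build a throwaway sample file for the demonstration.
sample=$(mktemp)
printf '%s\n' \
    'If a pattern is followed by a !, then the procedure' \
    'is followed for all lines that do not match the pattern.' \
    'An unrelated line.' \
    'so that the procedure is followed only if there is no match.' > "$sample"

result=$(phrase_demo "the procedure is followed" "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

This should print the first, second, and fourth lines of the sample. The sample deliberately ends on a matching line: when N finds no further input, sed implementations differ on whether the pending pattern space is printed, so a trailing non-matching line may or may not appear.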