34.18 Multiline Delete

The sed delete command, d, deletes the contents of the pattern space (34.13) and causes a new line of input to be read, with editing resuming at the top of the script. The Delete command, D, works slightly differently: it deletes a portion of the pattern space, up to the first embedded newline. It does not cause a new line of input to be read; instead, it returns to the top of the script, applying these instructions to what remains in the pattern space.

We can see the difference by writing a script that looks for a series of blank lines and outputs a single blank line. The version below uses the delete command:

    # reduce multiple blank lines to one; version using d command
    /^$/{
    N
    /^\n$/d
    }

When a blank line is encountered, the next line is appended to the pattern space. Then we try to match the embedded newline. Note that the positional metacharacters, ^ and $, match the beginning and the end of the pattern space, respectively. Here's a test file:

    This line is followed by 1 blank line.

    This line is followed by 2 blank lines.


    This line is followed by 3 blank lines.



    This line is followed by 4 blank lines.




    This is the end.

Running the script on the test file produces the following result:

    This line is followed by 1 blank line.

    This line is followed by 2 blank lines.
    This line is followed by 3 blank lines.

    This line is followed by 4 blank lines.
    This is the end.

Where there was an even number of blank lines, all the blank lines were removed. Only when there was an odd number was a single blank line preserved. That is because the delete command clears the entire pattern space. Once the first blank line is encountered, the next line is read in, and both are deleted. If a third blank line is encountered, and the next line is not blank, the delete command is not applied, and thus a blank line is output.

If we use the multiline Delete command, we get a different result, and the one that we wanted:

    /^\n$/D

The reason the multiline Delete command gets the job done is that when we encounter two blank lines, the Delete command removes only the first of the two. The next time through the script, the blank line will cause another line to be read into the pattern space. If that line is not blank, then both lines are output, thus ensuring that a single blank line will be output. In other words, when there are two blank lines in the pattern space, only the first is deleted. When a blank line is followed by text, the pattern space is output normally.

- from O'Reilly & Associates' sed & awk, Chapter 6
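
To try the comparison described in this section, the following is a minimal sketch that runs both versions of the script on the same test input and counts the blank lines each one keeps. The helper `make_test_input` and the variable names are hypothetical, introduced only for this illustration; the two sed scripts are the ones discussed above, with D substituted for d in the second.

```shell
# Hypothetical helper (not from the original text): emit the section's
# test input, in which each line is followed by the number of blank
# lines it announces.
make_test_input() {
    printf 'This line is followed by 1 blank line.\n\n'
    printf 'This line is followed by 2 blank lines.\n\n\n'
    printf 'This line is followed by 3 blank lines.\n\n\n\n'
    printf 'This line is followed by 4 blank lines.\n\n\n\n\n'
    printf 'This is the end.\n'
}

# Version using d: when two blank lines are in the pattern space,
# both are deleted and a new line of input is read.
out_d=$(make_test_input | sed '/^$/{
N
/^\n$/d
}')

# Version using D: only the first of the two blank lines is deleted,
# and the script restarts on what remains in the pattern space.
out_D=$(make_test_input | sed '/^$/{
N
/^\n$/D
}')

# Count the blank lines that survive in each result.
blank_d=$(printf '%s\n' "$out_d" | grep -c '^$')
blank_D=$(printf '%s\n' "$out_D" | grep -c '^$')

printf 'blank lines kept with d: %s\n' "$blank_d"
printf 'blank lines kept with D: %s\n' "$blank_D"
printf '%s\n' "$out_D"
```

With this input, the d version should keep a blank line only after the lines followed by an odd number of blank lines, while the D version should keep exactly one blank line between each pair of text lines. The test input deliberately ends with a text line; when the input ends in blank lines, N finds no next line to append, and sed implementations differ in how they handle that case.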