Hold That Line (sed & awk, Second Edition)

Command     Abbreviation    Function
Hold        h or H          Copy or append contents of pattern space to hold space.
Get         g or G          Copy or append contents of hold space to pattern space.
Exchange    x               Swap contents of hold space and pattern space.
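As a quick illustration of the difference between copying and appending, here is a minimal sketch. The function name copy_then_append is hypothetical and exists only for this example; it saves the first line with h and appends it to the second line with G.

```shell
# Hypothetical helper: h copies line 1 into the hold space (overwriting it);
# G appends a newline plus the hold space to line 2, which is then printed.
copy_then_append() {
    sed -n '
1h
2{
    G
    p
}'
}

printf 'first\nsecond\n' | copy_then_append
# prints:
# second
# first
```

The -n option suppresses the default output, so only the explicit p command prints anything.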

#! /bin/sh
# index.edit -- compile list of index entries for editing
# new version that matches metacharacters
grep "^\.XX" $* | sort -u |
sed '
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//'

6.3.3. Building Blocks of Text

The hold space can be used to collect a block of lines before outputting them. Some troff requests and macros are block-oriented, in that commands must surround a block of text. Usually a code at the beginning enables the format and one at the end disables it. HTML-coded documents also contain many block-oriented constructs. For instance, "<p>" begins a paragraph and "</p>" ends it.

In the next example, we'll look at placing HTML-style paragraph tags in a plain text file. The input is a file containing variable-length lines that form paragraphs; each paragraph is separated from the next one by a blank line. Therefore, the script must collect all lines in the hold space until a blank line is encountered. At that point, the contents of the hold space are retrieved and surrounded with the paragraph tags.

Here's the script:

/^$/!{
    H
    d
}
/^$/{
    x
    s/^\n/<p>/
    s/$/<\/p>/
    G
}

Running the script on a sample file produces:

<p>My wife won't let me buy a power saw.  She is afraid of an
accident if I use one.
So I rely on a hand saw for a variety of weekend projects like
building shelves.
However, if I made my living as a carpenter, I would
have to use a power
saw.  The speed and efficiency provided by power tools
would be essential to being productive.</p>

<p>For people who create and modify text files,
sed and awk are power tools for editing.</p>

<p>Most of the things that you can do with these programs
can be done interactively with a text editor.  However,
using these programs can save many hours of repetitive
work in achieving the same result.</p>
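To try the script without a separate script file, it can be wrapped in a shell function, as in the following sketch. The function name tag_paragraphs and the two-paragraph input are made up for this illustration; the sed commands are the ones shown above.

```shell
# Hypothetical wrapper: read plain text on standard input and wrap each
# blank-line-terminated paragraph in <p>...</p>.
tag_paragraphs() {
    sed '
/^$/!{
    H
    d
}
/^$/{
    x
    s/^\n/<p>/
    s/$/<\/p>/
    G
}'
}

# Two short paragraphs, each followed by a blank line.
printf 'first line\nsecond line\n\nthird line\n\n' | tag_paragraphs
# prints:
# <p>first line
# second line</p>
#
# <p>third line</p>
#
```

Note that the sample input ends with a blank line; without it, the final paragraph would stay in the hold space and never be printed.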

The script has two parts, one for each address: we do one thing if the input line is not blank and a different thing if it is. If the input line is not blank, it is appended to the hold space (with H) and then deleted from the pattern space. The delete command prevents the line from being output and clears the pattern space. Control passes back to the top of the script and a new line is read. The general idea is that we don't output any line of text as it is read; it is collected in the hold space instead.

If the input line is blank, we process the contents of the hold space. To illustrate what the second procedure does, let's use the second paragraph in the previous sample file and show what happens. After a blank line has been read, the pattern space and the hold space have the following contents:

`Pattern Space:`	`^$`
`Hold Space:`	`\nFor people who create and modify text files, \nsed and awk are power tools for editing.`

A blank line in the pattern space is represented as "^$", the regular expression that matches it. The embedded newlines are represented in the hold space by "\n". Note that the Hold command puts a newline in the hold space and then appends the current line to the hold space. Even when the hold space is empty, the Hold command places a newline before the contents of the pattern space.
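The leading newline is easy to see with the l command, which prints the pattern space with a newline shown as "\n" and the end marked by "$". The following is a small sketch; the helper name hold_with is hypothetical, and its argument is the hold command to try (H or h).

```shell
# Hypothetical helper: apply the given hold command (H or h), copy the hold
# space back into the pattern space with g, and display it with l.
hold_with() {
    sed -n "$1;g;l"
}

printf 'one\n' | hold_with H    # prints: \none$   (newline added before the line)
printf 'one\n' | hold_with h    # prints: one$     (plain copy, no newline)
```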

The exchange command (x) swaps the contents of the hold space and the pattern space. The blank line is saved in the hold space so we can retrieve it at the end of the procedure. (We could insert a newline in other ways, also.)

`Pattern Space:`	`\nFor people who create and modify text files, \nsed and awk are power tools for editing.`
`Hold Space:`	`^$`

Now we make two substitutions: placing "<p>" at the beginning of the pattern space and "</p>" at the end. The first substitute command matches "^\n" because a newline is at the beginning of the pattern space as a consequence of the Hold command. The second substitute command matches the end of the pattern space. ("$" matches only at the very end of the pattern space, not at any embedded newline.)

`Pattern Space:`	`<p>For people who create and modify text files, \nsed and awk are power tools for editing.</p>`
`Hold Space:`	`^$`

Note that the embedded newline is preserved in the pattern space. The last command, G, appends the blank line in the hold space to the pattern space. Upon reaching the bottom of the script, sed outputs the paragraph we had collected in the hold space and coded in the pattern space.
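The same walk-through can be reproduced by adding l commands to the blank-line procedure, so that sed displays the pattern space after each step. This is only a tracing sketch: the function name trace_paragraph and the sample input are invented for the illustration, and -n is used so that nothing but the trace is printed.

```shell
# Hypothetical tracing version of the script: same commands, plus l after
# each stage of the blank-line procedure. -n suppresses the normal output.
trace_paragraph() {
    sed -n '
/^$/!{
    H
    d
}
x
l
s/^\n/<p>/
s/$/<\/p>/
l
G
l
'
}

printf 'alpha\nbeta\n\n' | trace_paragraph
# prints:
# \nalpha\nbeta$
# <p>alpha\nbeta</p>$
# <p>alpha\nbeta</p>\n$
```

The three lines of output correspond to the pattern space after the exchange, after the two substitutions, and after G has appended the blank line.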

This script illustrates the mechanics of collecting input and holding on to it until another pattern is matched. It's important to pay attention to flow control in the script. The first procedure does not reach the bottom of the script because we don't want any output yet. The second procedure does reach the bottom, so sed outputs the pattern space and then replaces it with the next input line. The hold space is already empty at that point, because the exchange command left only the blank line there, so we start with a clean slate when collecting lines for the next paragraph.

This script also illustrates how to use addressing to set up exclusive addresses, in which a line must match one address or the other. You can also set up addresses to handle exceptions in the input and thereby improve the reliability of a script. For instance, in the previous script, what happens if the last line in the input file is not blank? None of the lines collected since the last blank line will be output, because the blank-line procedure never runs again. There are several ways to handle this, but a rather clever one is to manufacture a blank line that the blank-line procedure will match later in the script. In other words, if the last line contains text, we append it to the hold space and then empty the pattern space with the substitute command. This makes the current line blank so that it matches the procedure that outputs what has been collected in the hold space. Here's the procedure:

${
    /^$/!{
        H
        s/.*//
    }
}

This procedure must be placed in the script before the two procedures shown earlier. The addressing symbol "$" matches only the last line in the file. Inside this procedure, we test for lines that are not blank. If the line is blank, we don't have to do anything with it. If the current line is not blank, then we append it to the hold space. This is what we do in the other procedure that matches a non-blank line. Then we use the substitute command to create a blank line in the pattern space.

Upon exiting this procedure, there is a blank line in the pattern space. It matches the subsequent procedure for blank lines that adds the HTML paragraph codes and outputs the paragraph.
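Putting the three procedures together in the order described gives the following sketch. The function name tag_paragraphs_safe is hypothetical, and the input deliberately ends with a line of text rather than a blank line.

```shell
# Hypothetical wrapper combining the last-line procedure with the two
# procedures shown earlier. The last-line procedure must come first.
tag_paragraphs_safe() {
    sed '
${
    /^$/!{
        H
        s/.*//
    }
}
/^$/!{
    H
    d
}
/^$/{
    x
    s/^\n/<p>/
    s/$/<\/p>/
    G
}'
}

# The input has no trailing blank line, yet the paragraph is still output.
printf 'one\ntwo' | tag_paragraphs_safe
# prints:
# <p>one
# two</p>
# (followed by a blank line)
```

On the last line, "two" is appended to the hold space and the pattern space is emptied, so the first of the original procedures no longer matches and the blank-line procedure takes over.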
