Writing sed Scripts (sed & awk, Second Edition)

4.1.1. The Pattern Space

Sed maintains a pattern space, a workspace or temporary buffer where a single line of input is held while the editing commands are applied.[22] The transformation of the pattern space by a two-line script is shown in Figure 4.1. It changes "The Unix System" to "The UNIX Operating System."

[22] One advantage of the one-line-at-a-time design is that sed can read very large files without any problems. Screen editors that have to read the entire file into memory, or some large portion of it, can run out of memory or be extremely slow to use in dealing with large files.

Initially, the pattern space contains a copy of a single input line. In Figure 4.1, that line is "The Unix System." The normal flow through the script is to execute each command on that line until the end of the script is reached. The first command in the script is applied to that line, changing "Unix" to "UNIX." Then the second command is applied, changing "UNIX System" to "UNIX Operating System."[23] Note that the pattern for the second substitute command does not match the original input line; it matches the current line as it has changed in the pattern space.

[23] Yes, we could have changed "Unix System" to "UNIX Operating System" in one step. However, the input file might have instances of "UNIX System" as well as "Unix System." So by changing "Unix" to "UNIX" we make both instances consistent before changing them to "UNIX Operating System."

When all the instructions have been applied, the current line is output and the next line of input is read into the pattern space. Then all the commands in the script are applied to that line.
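The flow described above can be tried from the shell. This is a minimal sketch: the two substitute commands are reconstructed from the prose (the figure itself is not reproduced here), and the second input line, "The UNIX System," is an assumed example of the mixed-case input mentioned in the footnote.

```shell
# Each input line is copied into the pattern space, both commands are applied
# in order, and the result is printed before the next line is read.
out=$(printf '%s\n' 'The Unix System' 'The UNIX System' |
  sed -e 's/Unix/UNIX/' \
      -e 's/UNIX System/UNIX Operating System/')

# The second command matches what the first one left in the pattern space,
# not the original input, so both lines end up identical.
printf '%s\n' "$out"
```

On the first line, the second pattern matches only because the first command has already rewritten "Unix" to "UNIX" in the pattern space.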

Figure 4.1. The commands in the script change the contents of the pattern space.

As a consequence, any sed command might change the contents of the pattern space for the next command. The contents of the pattern space are dynamic and do not always match the original input line. That was the problem with the sample script at the beginning of this chapter. The first command would change "pig" to "cow" as expected. However, when the second command changed "cow" to "horse" on the same line, it also changed the "cow" that had been a "pig." So, where the input file contained pigs and cows, the output file has only horses!

This mistake is simply a problem of the order of the commands in the script. Reversing the order, so that "cow" becomes "horse" before "pig" becomes "cow," does the trick:

s/cow/horse/g
s/pig/cow/g
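To see the difference the order makes, both versions can be run against the same line. This is a sketch under stated assumptions: the input line "pig cow" is made up for illustration, and the original-order script is reconstructed from the description above.

```shell
input='pig cow'   # hypothetical input containing one of each animal

# Original order: "pig" becomes "cow", then every "cow" on the line,
# including the one that used to be a "pig", becomes "horse".
wrong=$(printf '%s\n' "$input" | sed -e 's/pig/cow/g' -e 's/cow/horse/g')

# Reversed order: the real cows are turned into horses first,
# so the later pig-to-cow change is left alone.
right=$(printf '%s\n' "$input" | sed -e 's/cow/horse/g' -e 's/pig/cow/g')

printf 'original order: %s\n' "$wrong"   # horse horse
printf 'reversed order: %s\n' "$right"   # cow horse
```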

Some sed commands change the flow through the script, as we will see in subsequent chapters. For example, the N command reads another line into the pattern space without removing the current line, so you can test for patterns across multiple lines. Other commands tell sed to exit before reaching the bottom of the script or to go to a labeled command. Sed also maintains a second temporary buffer called the hold space. You can copy the contents of the pattern space to the hold space and retrieve them later. The commands that make use of the hold space are discussed in Chapter 6.
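As a brief preview of these commands, the sketch below uses N, q, and the hold-space commands h and G on small made-up inputs. The one-liners are illustrative assumptions rather than examples from this chapter, and the commands themselves are explained properly in later chapters.

```shell
# N appends the next input line to the pattern space, separated by a newline,
# so a single substitution can match across the line break.
joined=$(printf '%s\n' 'Operating' 'System' | sed -e 'N' -e 's/\n/ /')

# q makes sed exit after line 2 instead of reading the rest of the input.
first_two=$(printf '%s\n' 'one' 'two' 'three' | sed -e '2q')

# h copies the pattern space to the hold space; G appends the hold space
# back onto the pattern space. Here line 1 is saved and deleted, then
# retrieved after line 2, which swaps the two lines.
swapped=$(printf '%s\n' 'first' 'second' | sed -e '1h' -e '1d' -e 'G')

printf '%s\n' "$joined" "$first_two" "$swapped"
```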
