One of the authors once did a writing project for a computer company,
here referred to as BigOne Computer. The document had to include a
product bulletin for "Horsefeathers Software." The company promised
that the product bulletin was online and that they would send it.
Unfortunately, when the file arrived, it contained the formatted
output for a line printer, the only way they could provide it. A
portion of that file (saved for testing in a file named
horsefeathers) follows.
HORSEFEATHERS SOFTWARE PRODUCT BULLETIN

DESCRIPTION
+ ___________

BigOne Computer offers three software packages from the suite
of Horsefeathers software products -- Horsefeathers Business
BASIC, BASIC Librarian, and LIDO. These software products can
fill your requirements for powerful, sophisticated,
general-purpose business software providing you with a base for
software customization or development.

Horsefeathers BASIC is BASIC optimized for use on the BigOne
machine with UNIX or MS-DOS operating systems. BASIC Librarian
is a full screen program editor, which also provides the ability
Note that the text has been justified with spaces added between words.
There are also spaces added to create a left margin.
We find that when we begin to tackle a problem using sed, we do best if we
make a mental list of all the things we want to do. When we begin
coding, we write a script containing a single command that does one
thing. We test that it works, then we add another command, repeating
this cycle until we've done all that's obvious to do. ("All that's
obvious" because the list is not always complete, and the cycle of
implement-and-test often adds other items to the list.)
It may seem to be a rather tedious process to work this way, and indeed
there are a number of scripts where it's fine to take a crack at
writing the whole script in one pass and then begin testing it.
However, the one-step-at-a-time technique is highly recommended for beginners
because you isolate each command and can easily see what is working
and what is not. When you try to do several commands at once, you
might find that when problems arise you end up recreating the
recommended process in reverse; that is, removing commands one by one
until you locate the problem.
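One way to drive this implement-and-test cycle from the shell is sketched below. It is an illustration, not part of the original procedure: the file names steps.sed and sample.txt are hypothetical, and the two commands in steps.sed are borrowed from the edits developed later in this section. The loop builds sedscr from the first one, two, ... commands of the list and reruns the test input each time, so the effect of each newly added command shows up on its own.

```shell
#!/bin/sh
# Sketch of the implement-and-test cycle. steps.sed and sample.txt are
# hypothetical names: steps.sed lists the planned commands, one per line.
printf '%s\n' 's/^$/.LP/' 's/^  *//' > steps.sed
printf '%s\n' '  first line' '' '  second line' > sample.txt

total=$(wc -l < steps.sed | tr -d ' ')
n=1
while [ "$n" -le "$total" ]; do
    # Build a trial script from the first n commands only, then run it.
    head -n "$n" steps.sed > sedscr
    printf '== with command %s: %s\n' "$n" "$(sed -n "${n}p" steps.sed)"
    sed -f sedscr sample.txt
    n=$((n + 1))
done
```

After the first pass only the blank line has changed; after the second, the indentation is gone as well, so each step's contribution is visible in isolation.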
Here is a list of the obvious edits that need to be made to the
Horsefeathers Software bulletin:
Replace all blank lines with a paragraph macro (.LP).
Remove all leading spaces from each line.
Remove the printer underscore line, the one that begins with a "+".
Remove multiple blank spaces that were added between words.
The first edit requires that we match blank lines. However, in
looking at the input file, it wasn't obvious whether the blank lines
had leading spaces or not. As it turns out, they do not, so blank
lines can be matched using the pattern "^$". (If there were spaces on
the line, the pattern could be written "^ *$".) Thus, the first
edit is fairly straightforward to accomplish:
s/^$/.LP/
It replaces each blank line with ".LP". Note that you do not escape
the literal period in the replacement section of the substitute
command. We can put this command in a file named
sedscr and test the command as follows:
$ sed -f sedscr horsefeathers
HORSEFEATHERS SOFTWARE PRODUCT BULLETIN
.LP
DESCRIPTION
+ ___________
.LP
BigOne Computer offers three software packages from the suite
of Horsefeathers software products -- Horsefeathers Business
BASIC, BASIC Librarian, and LIDO. These software products can
fill your requirements for powerful, sophisticated,
general-purpose business software providing you with a base for
software customization or development.
.LP
Horsefeathers BASIC is BASIC optimized for use on the BigOne
machine with UNIX or MS-DOS operating systems. BASIC Librarian
is a full screen program editor, which also provides the ability
It is pretty obvious which lines have changed. (It is frequently
helpful to cut out a portion of a file to use for testing. It works
best if the portion is small enough to fit on the screen yet is large
enough to include different examples of what you want to change.
After all edits have been applied successfully to the test file, a
second level of testing occurs when you apply them to the complete,
original file.)
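The difference between the patterns "^$" and "^ *$" is easy to check on a throwaway input. The sketch below uses made-up lines (text, an empty line, a line holding only two spaces, more text) and runs each substitution over them.

```shell
#!/bin/sh
# Made-up input: line 2 is empty, line 3 contains only two spaces.
# "^$" matches only the truly empty line, so just one .LP appears.
printf 'text\n\n  \nmore text\n' | sed 's/^$/.LP/'

# "^ *$" allows zero or more spaces, so the line of spaces matches too.
printf 'text\n\n  \nmore text\n' | sed 's/^ *$/.LP/'
```

The first command leaves the space-only line untouched; the second turns both lines into ".LP". Since the bulletin's blank lines turned out to be empty, "^$" is sufficient there.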
The next edit that we make is to remove the line that begins with a
"+" and contains a line-printer underscore. We can simply delete this
line using the delete command, d. In writing a
pattern to match this line, we have a number of choices. Each of the
following would match that line:
/^+/
/^+ /
/^+  */
/^+  *__*/
As you can see, each successive regular expression matches a greater
number of characters. Only through testing can you determine how
complex the expression needs to be to match a specific line and not
others. The longer the pattern that you define in a regular
expression, the more confident you can be that it won't produce
unwanted matches. For this script, we'll choose the third expression:
/^+  */d
This command will delete any line that begins with a plus sign and is
followed by at least one space. The pattern specifies two spaces, but
the second is modified by "*", which means that the second space might
or might not be there.
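The trade-off among the four candidate addresses can be tested directly. The sketch below runs each of them (/^+/, /^+ /, /^+  */, and /^+  *__*/) over a few invented lines, only the first of which is the line-printer underscore line, and counts how many lines each one matches. In the basic regular expressions that sed uses by default, "+" is an ordinary character, so "^+" simply means a plus sign at the start of the line.

```shell
#!/bin/sh
# Invented input: only the first line is the underscore line.
sample='+ ___________
+5 percent discount
+ see note below
DESCRIPTION'

# Count the lines each candidate address matches (sed -n '/re/p' prints matches).
for re in '^+' '^+ ' '^+  *' '^+  *__*'; do
    count=$(printf '%s\n' "$sample" | sed -n "/$re/p" | wc -l | tr -d ' ')
    printf '%-12s matches %s line(s)\n' "/$re/" "$count"
done

# The address chosen in the text, used with the d command.
printf '%s\n' "$sample" | sed '/^+  */d'
```

On this input the counts drop from 3 to 2, 2, and 1. The chosen third address also deletes the invented "+ see note below" line, which is exactly the kind of unwanted match that testing on representative input is meant to expose; on the bulletin itself it removes only the underscore line.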
This command was added to the sed script and tested, but since it only
affects one line, we'll omit showing the results and move on. The
next edit needs to remove the spaces that pad the beginning of a line.
The pattern for matching that sequence is very similar to the address
for the previous command.
s/^  *//
This command removes any sequence of spaces found at the beginning of
a line. The replacement portion of the substitute command is empty,
meaning that the matched string is removed.
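Because the pattern is anchored with "^", only the run of spaces at the very start of a line is removed. A quick check on an invented indented line and an unindented one:

```shell
#!/bin/sh
# "^  *" is: start of line, one space, then zero or more spaces.
printf '%s\n' '     BigOne  Computer  offers' 'DESCRIPTION' | sed 's/^  *//'
# Only the leading run is removed; the doubled spaces between words remain
# (they are handled by the next edit), and DESCRIPTION passes through unchanged.
```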
We can add this command to the script and test it.
$ sed -f sedscr horsefeathers
HORSEFEATHERS SOFTWARE PRODUCT BULLETIN
.LP
DESCRIPTION
.LP
BigOne Computer offers three software packages from the suite
of Horsefeathers software products -- Horsefeathers Business
BASIC, BASIC Librarian, and LIDO. These software products can
fill your requirements for powerful, sophisticated,
general-purpose business software providing you with a base for
software customization or development.
.LP
Horsefeathers BASIC is BASIC optimized for use on the BigOne
machine with UNIX or MS-DOS operating systems. BASIC Librarian
is a full screen program editor, which also provides the ability
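The next edit attempts to deal with the extra spaces added to justify
each line. We can write a substitute command to match any string of
one or more consecutive spaces and replace it with a single space. The
g (global) flag at the end applies the substitution to every such
string on the line, not just the first.
s/  */ /g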
$ sed -f sedscr horsefeathers
HORSEFEATHERS SOFTWARE PRODUCT BULLETIN
.LP
DESCRIPTION
.LP
BigOne Computer offers three software packages from the suite
of Horsefeathers software products -- Horsefeathers Business
BASIC, BASIC Librarian, and LIDO. These software products can
fill your requirements for powerful, sophisticated,
general-purpose business software providing you with a base for
software customization or development.
.LP
Horsefeathers BASIC is BASIC optimized for use on the BigOne
machine with UNIX or MS-DOS operating systems. BASIC Librarian
is a full screen program editor, which also provides the ability
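It works. Here's the completed script:
s/^$/.LP/
/^+  */d
s/^  *//
s/  */ /g
s/\.  */.  /g
The last command, which we did not discuss above, replaces a period
followed by one or more spaces with a period and exactly two spaces,
so that sentences end up separated by two spaces.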
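To try the finished script end to end without the original bulletin, the sketch below writes the five commands to sedscr and runs them over a short, invented imitation of the line-printer output: indented lines padded between words, a blank line, and a "+" underscore line. The sample text is made up; only the commands come from the script.

```shell
#!/bin/sh
# The completed script, one command per line.
cat > sedscr <<'EOF'
s/^$/.LP/
/^+  */d
s/^  *//
s/  */ /g
s/\.  */.  /g
EOF

# Invented sample imitating the line-printer file (not the real bulletin).
# The "+" line starts in column one, as the deletion address requires.
cat > sample <<'EOF'
     DESCRIPTION
+ ___________

     BigOne  Computer  offers  three  packages.   They  can
     fill   your  requirements.
EOF

sed -f sedscr sample
```

The order of the commands matters. The underscore line is deleted before leading spaces are stripped, so an indented "+" line would survive; and because runs of spaces are collapsed before the period rule runs, the period rule always sees a single space after a period and widens it to exactly two.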
As we said earlier, the next stage would be to test the script on the
complete file (hf.product.bulletin),
using testsed, and examine the results thoroughly.
When we are satisfied with the results, we can use
runsed to make the changes permanent:
$ runsed hf.product.bulletin
done
By executing runsed, we have overwritten the
original file.
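Because this step overwrites the original, it helps to see the general shape of a safe in-place edit. The function below, apply_sedscr, is a hypothetical stand-in written for illustration; it is not the runsed script itself. It applies sedscr to each named file through a temporary file, keeps a FILE.bak copy of the original, and refuses to replace a file when sed fails or produces no output. It assumes the mktemp utility is available, which is common but not guaranteed on every older system.

```shell
#!/bin/sh
# Hypothetical helper (not runsed): apply the commands in sedscr to each
# named file in place, keeping the original as FILE.bak.
apply_sedscr() {
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if sed -f sedscr "$f" > "$tmp" && [ -s "$tmp" ]; then
            cp "$f" "$f.bak"       # keep a copy of the original
            cat "$tmp" > "$f"      # overwrite contents, keeping permissions
        else
            echo "$f: sed failed or produced no output; file unchanged" >&2
        fi
        rm -f "$tmp"
    done
    echo done
}

# Example run on a throwaway file.
printf 's/old/new/\n' > sedscr
printf 'old text\n' > note.txt
apply_sedscr note.txt
```

Writing through a temporary file matters because redirecting sed's output straight back onto its own input file would truncate that file before sed reads it.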
Before leaving this script, it is instructive to point out that
although the script was written to process a specific file, each of
the commands in the script is one that you might expect to use again,
even if you don't use the entire script again. In other words, you
may well write other scripts that delete blank lines or check for two
spaces following a period. Recognizing how commands can be reused in
other situations reduces the time it takes to develop and test new
scripts. It's like a singer learning a song and adding it to his or
her repertoire.