5.11.1. Checking Out Reference Pages
Like many programs, a sed script often starts out small, and is simple
to write and simple to read. In testing the script, you may discover
specific cases for which the general rules do not apply. To account
for these, you add lines to your script, making it longer, more
complex, and more complete. While the amount of time you spend
refining your script may cancel out the time saved by not doing the editing
manually, at least during that time your mind has been engaged by your
own seeming sleight-of-hand: "See! The computer did it."
We encountered one such problem in preparing a formatted copy of
command pages that the writer had typed as a text file without any
formatting information. Although the files had no formatting codes,
headings were used consistently to identify the format of the command
pages. A sample file is shown below.
******************************************************************
NAME: DBclose - closes a database
SYNTAX:
void DBclose(fdesc)
DBFILE *fdesc;
USAGE:
fdesc - pointer to database file descriptor
DESC:
DBclose() closes a file when given its database file descriptor.
Your pending writes to that file will be completed before the
file is closed. All of your update locks are removed.
*fdesc becomes invalid.
Other users are not affected when you call DBclose(). Their update
locks and pending writes are not changed.
Note that there is no default file as there is in BASIC.
*fdesc must specify an open file.
DBclose() is analogous to the CLOSE statement in BASIC.
RETURNS:
There is no return value
******************************************************************
The task was to format this document for the laser printer, using the
reference header macros we had developed. Because there were perhaps
forty of these command pages, it would have been utter drudgery to go
through and add codes by hand. However, precisely because there were
so many, and even though the writer was generally consistent in
entering them, there were enough differences from command to command
that getting the script right took several passes.
We'll examine the process of building this sed script. In a sense,
this is a process of looking carefully at each line of a sample input
file and determining whether or not an edit must be made on that line.
Then we look at the rest of the file for similar occurrences. We try
to find specific patterns that mark the lines or range of lines that
need editing.
For instance, by looking at the first line, we know we need to
eliminate the row of asterisks separating each command. We specify an
address for any line beginning and ending with an asterisk and look
for zero or more asterisks in between. The regular expression uses an
asterisk as a literal and as a metacharacter:
/^\*\**\*$/d
This command deletes entire lines of asterisks anywhere they occur in
the file. We saw that blank lines were used to separate paragraphs,
but replacing every blank line with a paragraph macro would cause
other problems. In many cases, the blank lines can be removed because
spacing has been provided in the macro. This is a case where we put
off deleting or replacing blank lines on a global basis until we have
dealt with specific cases. For instance, some blank lines separate
labeled sections, and we can use them to define the end of a range of
lines. The script, then, is designed to delete unwanted blank lines
as the last operation.
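The asterisk-deleting command can be checked on its own before it goes
into the script. The following is a minimal sketch; the four input
lines are made up for illustration and are not taken from the writer's
files.

```shell
# Made-up input: two separator rows, one label line, and one line that
# merely starts with an asterisk.
out=$(printf '%s\n' '*****' 'NAME: DBclose - closes a database' '**' '* note' |
      sed '/^\*\**\*$/d')
printf '%s\n' "$out"
```

Only the rows made up entirely of asterisks are deleted. Because the
pattern anchors a literal asterisk at each end, a row needs at least
two asterisks to match, and a line such as "* note" passes through
untouched.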
Tabs were a similar problem. Tabs were used to indent syntax lines
and in some cases after the colon following a label, such as "NAME".
Our first thought was to remove all tabs by replacing them with eight
spaces, but there were tabs we wanted to keep, such as those
inside the syntax line. So we removed only specific cases: tabs at
the beginning of lines and tabs following a colon. (In the scripts in
this section, • stands for a literal tab character.)
/^•/s///
/:•/s//:/
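To try these two commands at a shell prompt, the tab has to be
supplied literally. The sketch below assumes a POSIX shell and captures
a tab with printf; the variable name tab and the two input lines are
illustrative only.

```shell
# Capture a literal tab; it plays the role of the bullet in the listing.
tab=$(printf '\t')

# Line 1 has a tab after the colon; line 2 has a leading tab and an
# interior tab that should survive.
out=$(printf 'NAME:\tDBclose - closes a database\n\tvoid\tDBclose(fdesc)\n' |
      sed -e "/^$tab/s///" -e "/:$tab/s//:/")
printf '%s\n' "$out"
```

The tab after the colon and the tab at the start of the second line
are removed, while the tab between "void" and "DBclose(fdesc)" is left
alone.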
The next line we come across has the name of the command and a
description.
NAME: DBclose - closes a database
We need to replace it with the macro .Rh 0. Its syntax is:
.Rh 0 "command" "description"
We insert the macro at the beginning of the line, remove the hyphen,
and surround the arguments with quotation marks.
/NAME:/ {
s//.Rh 0 "/
s/ - /" "/
s/$/"/
}
We can jump ahead of ourselves a bit here and look at what this
portion of our script does to the sample line:
.Rh 0 "DBclose" "closes a database"
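This step can be reproduced in isolation. The sketch below assumes
that a tab follows the colon in the input, so the tab-stripping command
runs first; the variable names tab and script are ours, not part of the
original script.

```shell
tab=$(printf '\t')   # literal tab, assumed to follow "NAME:"

# The NAME block, kept in a variable so the quoting stays readable.
script='/NAME:/ {
s//.Rh 0 "/
s/ - /" "/
s/$/"/
}'

out=$(printf 'NAME:\tDBclose - closes a database\n' |
      sed "/:$tab/s//:/" |
      sed "$script")
printf '%s\n' "$out"
```

Note that the empty pattern in the first substitution reuses /NAME:/,
so only the label itself is replaced. If a space rather than a tab
followed the colon, that space would end up inside the first pair of
quotation marks.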
The next part that we examine begins with "SYNTAX." What we need to
do here is put in the .Rh macro, plus some additional
troff requests for indentation, a font change, and
no-fill and no-adjust. (The indentation is required because we
stripped the tabs at the beginning of the line.) These requests must
go in before and after the syntax lines, turning the capabilities on
and off. To do this, we define an address that specifies the range of
lines between two patterns, the label and a blank line. Then, using
the change command, we replace the label and the blank line with a
series of formatting requests.
/SYNTAX:/,/^$/ {
/SYNTAX:/c\
.Rh Syntax\
.in +5n\
.ft B\
.nf\
.na
/^$/c\
.in -5n\
.ft R\
.fi\
.ad b
}
Following the change command, each line of replacement text ends with
a backslash except the last line. As a side effect of the change
command, the current line is deleted from the pattern space.
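Here is a minimal sketch of the SYNTAX block run by itself. The input
is the syntax portion of the sample with leading tabs already removed,
followed by a blank line and the next label; the variable name script
is an assumption made for the example.

```shell
# The SYNTAX block exactly as in the listing, stored in a variable.
script='/SYNTAX:/,/^$/ {
/SYNTAX:/c\
.Rh Syntax\
.in +5n\
.ft B\
.nf\
.na
/^$/c\
.in -5n\
.ft R\
.fi\
.ad b
}'

out=$(printf '%s\n' 'SYNTAX:' 'void DBclose(fdesc)' 'DBFILE *fdesc;' '' 'USAGE:' |
      sed "$script")
printf '%s\n' "$out"
```

The label line and the blank line each disappear and are replaced by
their group of requests, while the two syntax lines and the USAGE label
pass through unchanged.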
The USAGE portion is next, consisting of one or more descriptions of
variable items. Here we want to format each item as an indented
paragraph with a hanging italicized label. First, we output the .Rh
macro; then we search for lines having two parts separated by a tab
and a hyphen. Each part is saved, using backslash-parentheses, and
recalled during the substitution.
/USAGE:/,/^$/ {
/USAGE:/c\
.Rh Usage
/\(.*\)•- \(.*\)/s//.IP "\\fI\1\\fR" 15n\
\2./
}
This is a good example of the power of regular expressions.
Let's look ahead, once again, and preview the output for the sample.
.Rh Usage
.IP "\fIfdesc\fR" 15n
pointer to database file descriptor.
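The preview can be reproduced with a sketch like the following. It
assumes a POSIX shell; the literal tab is spliced into the script
through a variable named tab, which is our own device and not part of
the original script.

```shell
tab=$(printf '\t')   # literal tab separating the item from its hyphen

# The USAGE block; the tab is inserted by closing and reopening the
# single-quoted string around "$tab".
script='/USAGE:/,/^$/ {
/USAGE:/c\
.Rh Usage
/\(.*\)'"$tab"'- \(.*\)/s//.IP "\\fI\1\\fR" 15n\
\2./
}'

out=$(printf 'USAGE:\nfdesc\t- pointer to database file descriptor\n\nDESC:\n' |
      sed "$script")
printf '%s\n' "$out"
```

The doubled backslashes in the replacement produce the single
backslashes of the troff font escapes, and the backslash at the end of
the line puts a newline between the .IP macro and the description. The
blank line that ends the range is still present at this stage; it is
removed later.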
The next part we come across is the description. We notice that blank
lines are used in this portion to separate paragraphs. In specifying
the address for this portion, we use the next label, "RETURNS."
/DESC:/,/RETURNS/ {
/DESC:/i\
.LP
s/DESC: *$/.Rh Description/
s/^$/.LP/
}
The first thing we do is insert a paragraph macro because the
preceding USAGE section consisted of indented paragraphs. (We could
have used the variable-list macros from the -mm
package in the USAGE section; if so, we would insert the .LE at this
point.) This is done only once, which is why it is keyed to the
"DESC" label. Then we replace the label "DESC" with the .Rh macro
and replace all blank lines in this section with a paragraph macro.
When we tested this portion of the sed script on our sample file, it
didn't work because there was a single space following the DESC label.
We changed the regular expression to look for zero or more spaces
following the label, which is the form shown above. Although this
worked for the sample file, there
were other problems when we used a larger sample. The writer was
inconsistent in his use of the "DESC" label. Mostly, it occurred on a
line by itself; sometimes, though, it was included at the start of the
second paragraph. So we had to add another pattern to deal with this
case. It searches for the label followed by a space and the rest of
the line, which is saved and recalled on the line after the macro.
s/DESC: *$/.Rh Description/
s/DESC: \(.*\)/.Rh Description\
\1/
In the second case, the reference header macro is output followed by a
newline.
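The second case can be exercised with a sketch like this one. The
input is an abbreviated, made-up description in which the label shares
a line with the text; the variable name script is an assumption for
the example.

```shell
# The DESC block, including both substitutions for the label.
script='/DESC:/,/RETURNS/ {
/DESC:/i\
.LP
s/DESC: *$/.Rh Description/
s/DESC: \(.*\)/.Rh Description\
\1/
s/^$/.LP/
}'

out=$(printf '%s\n' 'DESC: DBclose() closes a file.' '' \
          'Other users are not affected.' '' 'RETURNS:' |
      sed "$script")
printf '%s\n' "$out"
```

The text that followed the label is moved to its own line beneath the
.Rh macro, and each blank line inside the range becomes .LP. When the
label stands alone, the first substitution handles it and the second
no longer finds "DESC:" to match.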
The next section, labeled "RETURNS," is handled in the same way
as the SYNTAX section.
We do make minor content changes, replacing the label "RETURNS" with
"Return Value" and consequently adding this substitution:
s/There is no return value\.*/None./
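The \.* at the end of the pattern allows for the writer having typed
the sentence with or without a closing period. A quick sketch, using
two made-up variants of the line:

```shell
# One variant without a period, one with; both should normalize.
out=$(printf '%s\n' 'There is no return value' 'There is no return value.' |
      sed 's/There is no return value\.*/None./')
printf '%s\n' "$out"
```

Both lines come out as "None." because any trailing periods are
consumed by the match before the replacement supplies its own.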
The very last thing we do is delete remaining blank lines.
/^$/d
Our script is put in a file named refsed.
Here it is in full:
# refsed -- add formatting codes to reference pages
/^\*\**\*$/d
/^•/s///
/:•/s//:/
/NAME:/ {
s//.Rh 0 "/
s/ - /" "/
s/$/"/
}
/SYNTAX:/,/^$/ {
/SYNTAX:/c\
.Rh Syntax\
.in +5n\
.ft B\
.nf\
.na
/^$/c\
.in -5n\
.ft R\
.fi\
.ad b
}
/USAGE:/,/^$/ {
/USAGE:/c\
.Rh Usage
/\(.*\)•- \(.*\)/s//.IP "\\fI\1\\fR" 15n\
\2./
}
/DESC:/,/RETURNS/ {
/DESC:/i\
.LP
s/DESC: *$/.Rh Description/
s/DESC: \(.*\)/.Rh Description\
\1/
s/^$/.LP/
}
/RETURNS:/,/^$/ {
/RETURNS:/c\
.Rh "Return Value"
s/There is no return value\.*/None./
}
/^$/d
As we have remarked, you should not have sed overwrite the original.
It is best to redirect the output of sed to another file or let it go
to the screen. If the sed script does not work properly, you will
find that it is generally easier to change the script and re-run it on
the original file than to write a new script to correct the problems
caused by a previous run.
$ sed -f refsed refpage
.Rh 0 "DBclose" "closes a database"
.Rh Syntax
.in +5n
.ft B
.nf
.na
void DBclose(fdesc)
DBFILE *fdesc;
.in -5n
.ft R
.fi
.ad b
.Rh Usage
.IP "\fIfdesc\fR" 15n
pointer to database file descriptor.
.LP
.Rh Description
DBclose() closes a file when given its database file descriptor.
Your pending writes to that file will be completed before the
file is closed. All of your update locks are removed.
*fdesc becomes invalid.
.LP
Other users are not affected when you call DBclose(). Their update
locks and pending writes are not changed.
.LP
Note that there is no default file as there is in BASIC.
*fdesc must specify an open file.
.LP
DBclose() is analogous to the CLOSE statement in BASIC.
.LP
.Rh "Return Value"
None.
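The advice about not overwriting the original can be followed by
redirecting the output to a new file. The sketch below is only an
illustration: it works in a scratch directory created with mktemp
(widely available, though not required by POSIX), uses a cut-down
refsed containing three of the commands above, and writes to a
hypothetical output file named refpage.out.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny stand-in for one reference page.
printf '%s\n' '*****' 'RETURNS:' 'There is no return value' > refpage

# A cut-down refsed; the quoted delimiter keeps backslashes intact.
cat > refsed <<'EOF'
/^\*\**\*$/d
/RETURNS:/,/^$/ {
/RETURNS:/c\
.Rh "Return Value"
s/There is no return value\.*/None./
}
/^$/d
EOF

# Send the result to a new file; refpage itself is left unchanged.
sed -f refsed refpage > refpage.out
cat refpage.out
```

If the result is wrong, refpage is still intact, so the script can be
corrected and run again on the same input.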