5.3.1.1. Correcting index entries
Later, in the awk section of this book, we will present a program for
formatting an index, such as the one for this book. The first step in
creating an index is to place index codes in the document files. We
use an index macro named .XX, which takes a single argument, the index
entry. A sample index entry might be:
.XX "sed, substitution command"
Each index entry appears on a line by itself. When you generate an
index, you get a collection of index entries with page numbers that are
then sorted and merged into a list. An editor poring over that list will
typically find errors and inconsistencies that need to be corrected.
It is, in short, a pain to have to track down the file where an index
entry resides and then make the correction, particularly when there
are dozens of entries to be corrected.
Sed can be a great help in making these edits across a group of files.
One can simply create a list of edits in a sed script and then run it
on all the files. A key point is that the substitute command needs an
address that limits it to lines beginning ".XX". Your script should
not make changes in the text itself.
Let's say that we wanted to change the index entry above to "sed,
substitute command." The following command would do it:
/^\.XX /s/sed, substitution command/sed, substitute command/
The address matches all lines that begin with ".XX " and only on those
lines does it attempt to make the replacement. You might wonder, why
not specify a shorter regular expression? For example:
/^\.XX /s/substitution/substitute/
The answer is simply that there could be other entries which use the
word "substitution" correctly and which we would not want to change.
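The difference between the two commands is easy to see on a small sample. The following is a minimal sketch; the three input lines are invented for illustration and do not come from the book's own files.

```shell
# Invented sample: two index entries and one line of body text.
entries='.XX "sed, substitution command"
.XX "substitution, of variables"
Body text mentioning the substitution command.'

# Full-entry pattern: only the intended entry changes.
full=$(printf '%s\n' "$entries" |
    sed '/^\.XX /s/sed, substitution command/sed, substitute command/')

# Shorter pattern: the second entry is changed as well.
short=$(printf '%s\n' "$entries" |
    sed '/^\.XX /s/substitution/substitute/')

printf '%s\n' "--- full pattern" "$full" "--- short pattern" "$short"
```

In both runs the body text is left alone, because the address restricts the command to lines beginning with ".XX ".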
We can go a step further and provide a shell script that creates a
list of index entries prepared for editing as a series of sed
substitute commands.
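#! /bin/sh
# index.edit -- compile list of index entries for editing.
# -h keeps grep from prefixing each match with its file name
# when more than one file is given.
grep -h '^\.XX ' "$@" | sort -u |
sed '
s/^\.XX \(.*\)$/\/^\\.XX \/s\/\1\/\1\//'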
The index.edit shell script uses grep to extract all lines containing
index entries from any number of files specified on the command line.
It passes this list through sort which, with the -u option, sorts and
removes duplicate entries. The list is then piped to sed, and the
one-line sed script builds a substitution command.
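To see what reaches sed, the first two stages of the pipeline can be tried on their own. This is only a sketch: the two files and their contents are invented, and cat is used here so that grep reads a single stream and does not prefix matches with file names.

```shell
# Two invented document files that share one index entry.
workdir=$(mktemp -d)
printf '%s\n' '.XX "delete command(d)"' 'Some text.' \
    '.XX "append command(a)"' > "$workdir/ch05a"
printf '%s\n' '.XX "append command(a)"' 'More text.' > "$workdir/ch05b"

# Extract the index lines, then sort them and drop the duplicate.
entries=$(cat "$workdir/ch05a" "$workdir/ch05b" | grep '^\.XX ' | sort -u)
printf '%s\n' "$entries"

rm -rf "$workdir"
```

The duplicate "append command(a)" entry appears once in the result, and the lines of body text are gone.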
Let's look at it more closely. Here's just the regular expression:
^\.XX \(.*\)$
It matches the entire line, saving the index entry for recall. Here's
just the replacement string:
\/^\\.XX \/s\/\1\/\1\/
It generates a substitute command beginning with an address: a slash,
followed by two backslashes (which output the single backslash that
protects the dot in the ".XX" that follows), then ".XX" itself, a space,
and another slash to complete the address. Next we output an "s"
followed by a slash, and then recall the saved portion to be used as a
regular expression. That is followed by another slash, and again we
recall the saved substring, this time as the replacement string. A
final slash ends the command.
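The one-line script can be checked in isolation by feeding it a single entry. The sketch below reuses the sample entry from earlier in this section.

```shell
# Turn one index line into a substitute command that edits that entry.
generated=$(printf '%s\n' '.XX "sed, substitution command"' |
    sed 's/^\.XX \(.*\)$/\/^\\.XX \/s\/\1\/\1\//')
printf '%s\n' "$generated"
```

The saved entry, quotation marks included, shows up twice in the generated line: once as the regular expression and once as the replacement.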
When the index.edit script is run on a file, it creates a listing
similar to this:
$ index.edit ch05
/^\.XX /s/"append command(a)"/"append command(a)"/
/^\.XX /s/"change command"/"change command"/
/^\.XX /s/"change command(c)"/"change command(c)"/
/^\.XX /s/"commands:sed, summary of"/"commands:sed, summary of"/
/^\.XX /s/"delete command(d)"/"delete command(d)"/
/^\.XX /s/"insert command(i)"/"insert command(i)"/
/^\.XX /s/"line numbers:printing"/"line numbers:printing"/
/^\.XX /s/"list command(l)"/"list command(l)"/
This output could be captured in a file. Then you can delete the
entries that don't need to change and you can make changes by editing
the replacement string. At that point, you can use this file as a sed
script to correct the index entries in all document files.
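One possible way to carry out that last step is sketched below. The file names (index.sed, ch05) and their contents are assumptions made up for the example, and the loop writes each result to a temporary file rather than relying on any in-place editing option.

```shell
workdir=$(mktemp -d)

# An invented document file with two index entries.
cat > "$workdir/ch05" <<'EOF'
.XX "change command"
Body text about the change command.
.XX "delete command(d)"
EOF

# Hypothetical edited listing: one line kept, its replacement string changed.
cat > "$workdir/index.sed" <<'EOF'
/^\.XX /s/"change command"/"change command(c)"/
EOF

# Apply the script to every document file (only one here).
for f in "$workdir"/ch05; do
    sed -f "$workdir/index.sed" "$f" > "$f.new" && mv "$f.new" "$f"
done

corrected=$(cat "$workdir/ch05")
printf '%s\n' "$corrected"
rm -rf "$workdir"
```

Only the first index entry changes; the body text and the entry whose line was deleted from the script are untouched.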
When doing a large book with lots of entries, you might use grep again
to extract particular entries from the output of index.edit and direct
them into their own file for editing. This saves you from having to
wade through numerous entries.
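As a rough illustration, the listing is held in a shell variable below so that the example runs on its own. In practice the input would come from index.edit, and the output would be redirected to a file whose name is up to you (for instance, a hypothetical change.sed).

```shell
# A few lines of the kind index.edit produces.
listing='/^\.XX /s/"append command(a)"/"append command(a)"/
/^\.XX /s/"change command"/"change command"/
/^\.XX /s/"change command(c)"/"change command(c)"/
/^\.XX /s/"delete command(d)"/"delete command(d)"/'

# Keep only the commands that concern the "change command" entries.
subset=$(printf '%s\n' "$listing" | grep 'change command')
printf '%s\n' "$subset"
```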
There is one small failing in this program. It should look for
metacharacters that might appear literally in index entries and
protect them in regular expressions. For instance, if an index entry
contains an asterisk, sed will interpret it as a metacharacter rather
than as a literal asterisk. To make that change effectively requires
the use of several advanced commands, so we'll put off improving this
script until the next chapter.
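Until then, the effect is easy to reproduce by hand. In this sketch the index entry is invented, and the second command shows the asterisk escaped manually in the regular expression, which is not something index.edit does for you.

```shell
# An invented entry containing a literal asterisk.
line='.XX "regexp, a*b"'

# As generated: "a*" means zero or more a's, so the pattern never
# matches the literal text and the line passes through unchanged.
unescaped=$(printf '%s\n' "$line" |
    sed '/^\.XX /s/"regexp, a*b"/"regexp, a*b pattern"/')

# With the asterisk protected by a backslash, the entry is found.
escaped=$(printf '%s\n' "$line" |
    sed '/^\.XX /s/"regexp, a\*b"/"regexp, a*b pattern"/')

printf '%s\n' "$unescaped" "$escaped"
```

Note that the asterisk needs protection only in the regular expression; in the replacement string it is already literal.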