6.3.2. Correcting Index Entries (Part II)
In the previous chapter, we looked at a shell script named
index.edit. This script extracts index entries
from one or more files and automatically generates a sed script
consisting of a substitute command for each index entry. We mentioned
that a small failing of the script was that it did not look out for regular
expression metacharacters that appeared as literals in an index entry,
such as the following:
.XX "asterisk (*) metacharacter"
After processing this entry, the original index.edit
generated the following substitute command:
/^\.XX /s/"asterisk (*) metacharacter"/"asterisk (*) metacharacter"/
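While it "knows" to escape the period before ".XX", it doesn't protect
the metacharacter "*". Because "*" means "zero or more of the preceding
character," the pattern "(*)" matches zero or more left parentheses
followed by a right parenthesis. It does not match the literal string
"(*)", so the substitute command would fail to be applied. The
solution is to modify index.edit so that
it looks for metacharacters and escapes them. There's one more
twist: a different set of metacharacters is recognized in the
replacement string, where the backslash and "&" are the special characters.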
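Both problems are easy to reproduce by hand. The sketch below writes the sample entry to a scratch file (sample.txt is a hypothetical name) and uses MATCHED as an arbitrary marker so you can tell whether the substitution fired. It then shows what an unescaped "&" does in a replacement string.

```shell
#!/bin/sh
# Hypothetical scratch file holding one index entry with a literal "*".
printf '%s\n' '.XX "asterisk (*) metacharacter"' > sample.txt

# Unescaped pattern: "(*)" means zero or more "(" followed by ")",
# so it never matches the literal text "(*)" and the line is unchanged.
sed '/^\.XX /s/"asterisk (*) metacharacter"/MATCHED/' sample.txt

# Escaped pattern: "\*" is a literal asterisk, so the quoted text is replaced.
sed '/^\.XX /s/"asterisk (\*) metacharacter"/MATCHED/' sample.txt

# In a replacement, a bare "&" stands for the matched text ...
echo x | sed 's/x/AT&T/'
# ... so it must be written as "\&" to get a literal ampersand.
echo x | sed 's/x/AT\&T/'
```

The four commands print, in order, the unchanged entry, `.XX MATCHED`, `ATxT`, and `AT&T`.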
We have to maintain two copies of the index entry. The first copy we
edit to escape regular expression metacharacters and then use for the
pattern. The second copy we edit to escape the metacharacters special
to the replacement string. The hold space keeps the second copy while
we edit the first copy; then we swap the two and edit the second copy.
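The two-copy technique can be tried on its own before reading the full script. This toy example, which is not part of index.edit, escapes "*" in one copy and "&" in the other, then joins the copies with G; the input string and the " | " separator are arbitrary choices.

```shell
#!/bin/sh
# Toy input: "*" is special in a pattern, "&" is special in a replacement.
printf '%s\n' 'a*b&c' | sed '
# keep an unedited copy in the hold space
h
# first copy (pattern space): escape the asterisk
s/[*]/\\&/g
# swap, then escape the ampersand in the second copy
x
s/[&]/\\&/g
# swap back, append the second copy, and replace the newline with " | "
x
G
s/\n/ | /'
```

It prints `a\*b&c | a*b\&c`: the left copy is safe to use as a pattern and the right copy as a replacement.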
Here's the script:
#! /bin/sh
# index.edit -- compile list of index entries for editing
# new version that matches metacharacters
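# cat the files first so grep does not prefix entries with file names
cat "$@" | grep "^\.XX" | sort -u |
sed '
# save an unedited copy of the entry in the hold space
h
# pattern copy: escape [ ] \ * . for use in a regular expression
s/[][\\*.]/\\&/g
x
# replacement copy: escape \ and &, drop the macro, add the final slash
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
# turn the pattern copy into the start of a substitute command
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
# append the replacement copy and remove the joining newline
G
s/\n//'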
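To try the new version, the sketch below creates a hypothetical chapter.txt with two troublesome entries, runs the same sed program (wrapped in a hypothetical index_edit function so the example needs no separate script file), and applies the generated commands back to the input. Because each generated replacement is identical to its entry, applying fix.sed should leave the file unchanged; a difference would point to a metacharacter that slipped through.

```shell
#!/bin/sh
# Hypothetical sample input with "*" and "&" in index entries.
cat > chapter.txt <<'EOF'
.XX "asterisk (*) metacharacter"
.XX "ampersand (&) in replacement"
Some body text.
EOF

# Hypothetical wrapper around the index.edit pipeline; cat keeps grep
# from adding file-name prefixes when several files are given.
index_edit() {
    cat "$@" | grep "^\.XX" | sort -u |
    sed '
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//'
}

index_edit chapter.txt > fix.sed
cat fix.sed
# Round trip: the output should match chapter.txt exactly.
sed -f fix.sed chapter.txt
```

After sorting, fix.sed should contain these two commands:

/^\.XX /s/"ampersand (&) in replacement"/"ampersand (\&) in replacement"/
/^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/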