43.21 Preprocessing troff Input with sed
On a typewriter-like device (including a CRT), an em-dash
is typed as a pair of hyphens (--). troff has a special character name for the typeset dash, \(em, but it is inconvenient to type.
Similarly, a typesetter provides "curly" quotation marks (“ and ”)
as opposed to a typewriter's straight quotes ("). In troff input, the curly quotes are entered as two backquotes (``) and two apostrophes ('').
A peculiarity of
troff
is that it generates the space before each word in the
font used at the beginning of that word. This means that when we
mix a constant-width font such as Courier within text, we get a
noticeably large space before each word, which can be distracting
for readers.
The solution for each of these problems is to preprocess troff input with sed (34.24). This application shows sed in its role as a true stream editor, making edits in a pipeline: edits that are never written back into a file. We almost never invoke troff directly. Instead, we invoke it with a script that strings together a pipeline including the standard preprocessors (when appropriate) as well as this special preprocessing with sed.
The sed commands themselves are fairly simple. The following command changes two consecutive dashes into an em-dash:
s/--/\\(em/g
We double the backslash in the replacement string
for \(em because the backslash has a special meaning to sed.
However, there may be cases in which we don't want this substitution command to be applied. What if someone is using hyphens to draw a horizontal line? We can refine the script to exclude lines containing three or more consecutive hyphens. To do this, we use the ! address modifier (34.19):
/---/!s/--/\\(em/g
It may take a moment to penetrate this syntax. What's different is that we use a pattern address to restrict the lines that are affected by the substitute command, and we use ! to reverse the sense of the pattern match. It says, simply, "If you find a line containing three consecutive hyphens, don't apply the edit." On all other lines, the substitute command will be applied.
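As a quick check of the refined command, the sketch below feeds two sample lines through sed. The function name emdash and the sample text are made up for illustration.

```shell
# Hypothetical helper: apply the refined em-dash substitution to stdin.
emdash() {
    sed '/---/!s/--/\\(em/g'
}

# Sample input: one sentence with a typed dash, one ruled line.
printf '%s\n' 'Wait--there is more.' '----------' | emdash
# Prints:
#   Wait\(emthere is more.
#   ----------
```

Because the address tests the whole line, a line that contains both a ruled line and a typed dash is left alone entirely.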
Similarly, to deal with the font change problem, we can use
sed
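to search for all strings matching \f(CW, \f(CI, or \f(CB and insert \& (troff's zero-width, non-printing character) before each one, so that the word begins in the surrounding font: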
s/\\f(C[WIB]/\\\&&/g
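The font-change substitution can be tried on a sample line in the same way. The function name fontfix and the sample text are assumptions for illustration only.

```shell
# Hypothetical helper: put troff's \& in front of constant-width font requests.
fontfix() {
    sed 's/\\f(C[WIB]/\\\&&/g'
}

printf '%s\n' 'Use the \f(CWls\fP command.' | fontfix
# Prints: Use the \&\f(CWls\fP command.
```

In the replacement, \\ yields a literal backslash, \& yields a literal ampersand, and the final unescaped & stands for the matched font request.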
To deal with the open and closed quote
problem,
the script needs to be more involved because there are
many separate cases that must be accounted for.
You need to make
sed
smart enough to change double
quotes to open quotes only at the beginning of words and to
change them to closed quotes only at the end of words.
Such a script might look like the one below, which obviously
could be shortened by judicious application of
\([...]\) regular expression syntax, but is shown here in its long form. In the listing, [TAB] stands for a literal tab character.
s/^"/``/
s/"$/''/
s/"? /''? /g
s/"?$/''?/
s/ "/ ``/g
s/" /'' /g
s/[TAB]"/[TAB]``/g
s/"[TAB]/''[TAB]/g
s/")/'')/g
s/"]/'']/g
s/("/(``/g
s/\["/\[``/g
s/";/'';/g
s/":/'':/g
s/,"/,''/g
s/",/'',/g
s/\."/.\\\&''/g
s/"\./''.\\\&/g
s/"\\(em/''\\(em/g
s/\\(em"/\\(em``/g
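To see the quote rules in action, the sketch below runs a handful of them over one sample sentence. The temporary script file and the function name quotes are made up for illustration; the rules themselves are copied from the script.

```shell
# Write a subset of the quote rules to a temporary sed script.
# The quoted 'EOF' keeps the shell from interpreting the backquotes.
script=$(mktemp)
cat > "$script" <<'EOF'
s/^"/``/
s/"$/''/
s/ "/ ``/g
s/" /'' /g
s/",/'',/g
EOF

# Hypothetical helper: apply those rules to standard input.
quotes() {
    sed -f "$script"
}

printf '%s\n' '"Yes," she said, "it works."' | quotes
# Prints: ``Yes,'' she said, ``it works.''
```

Keeping the rules in a file and using sed -f avoids having to quote the apostrophes and backquotes for the shell.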
In addition to the changes described above, it tightens up the spacing of ellipses (...), and doesn't do anything between certain pairs of troff macros (34.19).
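The wrapper script mentioned earlier, which runs the sed cleanup in a pipeline ahead of the formatter, might be sketched as follows. This is only an outline under stated assumptions: the function name format and the FORMATTER variable are hypothetical, and only two of the substitutions are included.

```shell
# Hypothetical wrapper: clean up the input with sed, then hand it to the
# formatter. FORMATTER is an assumed variable (not from the original
# script); it defaults to troff. Preprocessors such as tbl or eqn could
# be added as further pipeline stages when appropriate.
format() {
    sed -e '/---/!s/--/\\(em/g' \
        -e 's/\\f(C[WIB]/\\\&&/g' "$@" |
    ${FORMATTER:-troff}
}
```

Setting FORMATTER=cat before calling format shows the cleaned-up text without formatting it. The edits happen only in the pipeline, so the source files are never modified.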