In article 43.18 we discussed several techniques for removing overstriking and underlining from nroff output. Of course, that's not the only problem you'll face when you're working with nroff. Here are some more postprocessing tricks for nroff files.
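You may also want to remove strange escape sequences that produce formfeeds or various other printer functions. For example, you sometimes see the sequence ^[9 (ESC followed by the digit 9). It can be removed with the following sed command:

s/^[9//g

The ESC character, displayed here as ^[, is entered in vi by typing CTRL-v followed by the ESC key. In Emacs, use CTRL-q ESC. The number 9 is literal.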
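If typing a literal ESC into a script is inconvenient, the character can be generated at run time instead. The following is a minimal sketch, assuming a POSIX shell and printf; the function name strip_esc9 is hypothetical and does not come from the article.

```shell
#!/bin/sh
# strip_esc9 (hypothetical name): delete every ESC-9 sequence read from
# standard input and write the result to standard output.
strip_esc9() {
    # printf '\033' emits the ESC character (octal 033), so no control
    # character has to be typed into the script itself.
    esc=$(printf '\033')
    sed "s/${esc}9//g"
}
```

Piping formatted output through strip_esc9 removes the sequence wherever it occurs on a line, which is the same effect as the sed command with a literal ESC typed in.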
The typical manual page also uses leading spaces to establish the left margin and to indent most of the text. On further inspection, you'll see that leading spaces precede headings (such as "NAME"), but a single TAB precedes each line of text. TABs may also appear unexpectedly in the text. Of course, using TABs wherever possible is a good idea on the whole; on a mechanical printer, and even on modern CRT displays, it's much quicker to print a TAB than to move the cursor over several spaces. However, the TABs can cause trouble if your printer (or terminal) isn't set correctly, or when you're trying to search for something in the text.
To eliminate the left margin and the unwanted TABs, use the following two sed commands:
s/^[ [TAB]]*//
s/[TAB]/ /g
The first command deletes any run of TABs or spaces at the beginning of a line. The second command replaces each remaining TAB with a single space. In both commands, [TAB] stands for a literal TAB character.
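The same two substitutions can be written without a literal TAB in the script. This is a sketch only, assuming a POSIX shell; strip_margin is a hypothetical name chosen for illustration.

```shell
#!/bin/sh
# strip_margin (hypothetical name): remove the left margin, then turn
# any TABs left inside the line into single spaces.
strip_margin() {
    # Generate the TAB character instead of typing it into the script.
    tab=$(printf '\t')
    sed -e "s/^[ ${tab}]*//" \
        -e "s/${tab}/ /g"
}
```

The order matters: leading TABs are deleted outright by the first expression, so only TABs inside the text are converted to spaces by the second. On a POSIX sed, the bracket expression [[:blank:]] (space or TAB) could be used in place of the explicit pair in the first expression.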
Now, let's put all these pieces together, including the script to strip underlines and overstrikes (from article 43.18). Here's a script called sedman that incorporates all of these tricks.
#!/bin/sed -f
#sedman - deformat nroff-formatted man page
s/.^H//g
s/^[9//g
s/^[ [TAB]]*//
s/[TAB]/ /g
Running this script on a typical manual page produces a file that looks like this:
who                                                  who

NAME
who - who is on the system?

SYNOPSIS
who [-a] [-b] [-d] [-H] [-l] [-p] [-q] [-r] [-s] [-t] [-T] [-u] [file]
who am i

DESCRIPTION
who can list the user's name, terminal line, login time,
elapsed time since activity occurred on the line, and the ...
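The sedman script depends on literal control characters (^H, ^[, and TAB) being typed into the file, and they are easy to lose when the script is copied or mailed. As a hedged alternative, the four substitutions can be wrapped in a shell function that generates those characters with printf. The name sedman_sh is hypothetical; this is a sketch assuming a POSIX shell, not the article's own script.

```shell
#!/bin/sh
# sedman_sh (hypothetical name): the four sedman substitutions, with the
# control characters generated by printf rather than typed literally.
sedman_sh() {
    bs=$(printf '\b')      # backspace, displayed as ^H
    esc=$(printf '\033')   # escape, displayed as ^[
    tab=$(printf '\t')     # TAB
    sed -e "s/.${bs}//g" \
        -e "s/${esc}9//g" \
        -e "s/^[ ${tab}]*//" \
        -e "s/${tab}/ /g"
}
```

Like sedman, the function reads formatted text on standard input and writes the cleaned text to standard output, so it can sit at the end of a pipeline or take its input from a redirected file.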