N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE
which bolds the word "NAME." There are three overstrikes for each
character output. Similarly, underlining is achieved by outputting an
underscore, a backspace and then the character to be underlined. The
following example is the word "file" surrounded by a sequence for
underscoring it.
_^Hf_^Hi_^Hl_^He
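If you are unsure whether a file really contains these sequences, it can help to make the backspaces visible. The following is a minimal sketch, not part of the original example: it builds the underlined word with printf (an assumption made here only to have test input) and uses sed's l command, which prints a backspace as \b and marks the end of the line with $. The variable names are illustrative.

```shell
# Build the underlined word "file"; \b in the printf format is a backspace.
underlined=$(printf '_\bf_\bi_\bl_\be')

# sed -n l lists the line with control characters shown as escapes.
visible=$(printf '%s\n' "$underlined" | sed -n l)
printf '%s\n' "$visible"
```

Each underscore-backspace pair shows up as "_\b" in front of the letter it underlines.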
You may sometimes need to strip these printing "special effects,"
for instance when you are given this type of output as a
source file. The following line removes the sequences for emboldening
and underscoring:
s/.^H//g
It removes any character preceding the backspace along with the
backspace itself. In the case of underlining, "." matches the
underscore; for emboldening, it matches the overstrike character.
Because the g flag applies the substitution to every occurrence on
the line, multiple occurrences of the overstrike character are
removed, leaving a single character for each sequence. Note that ^H
is entered in vi by pressing CTRL-V followed by CTRL-H.
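If you would rather not type a literal backspace into the command, one alternative is to let the shell supply it. This is only a sketch under that assumption: the variable bs and the two sample strings are introduced here for illustration, and the substitution itself is the same one shown above.

```shell
# Hold a literal backspace in a shell variable.
bs=$(printf '\b')

# Sample input: "NAME" emboldened by overstriking, "file" underlined.
bold=$(printf 'N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE')
under=$(printf '_\bf_\bi_\bl_\be')

# Remove every "any character followed by a backspace" pair.
plain_bold=$(printf '%s\n' "$bold" | sed "s/.$bs//g")
plain_under=$(printf '%s\n' "$under" | sed "s/.$bs//g")
printf '%s\n%s\n' "$plain_bold" "$plain_under"
```

The double quotes around the sed command matter here, because they let the shell expand $bs before sed sees the expression.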
^[9 who(1) who(1)
^[9 N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE
who - who is on the system?
S^HS^HS^HSY^HY^HY^HYN^HN^HN^HNO^HO^HO^HOP^HP^HP^HPS^HS^HS^HSI^HI
who [-a] [-b] [-d] [-H] [-l] [-p] [-q] [-r] [-s] [-t] [-T]
[-u] [_^Hf_^Hi_^Hl_^He]
who am i
who am I
D^HD^HD^HDE^HE^HE^HES^HS^HS^HSC^HC^HC^HCR^HR^HR^HRI^HI^HI^HIP^HP
who can list the user's name, terminal line, login time,
elapsed time since activity occurred on the line, and the
...
Besides the bolding and underlining sequences, the output contains
strange escape sequences that produce form feeds
or various other printer functions. You can see the sequence
"^[9" at the top of the formatted manpage. This escape
sequence can simply be removed:
s/^[9//g
Once again, the ESC character is entered in vi by typing CTRL-V
and then pressing the ESC key. The number 9 is literal.
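The same substitution can be tried from the shell without typing a literal ESC. The sketch below assumes a printf that accepts the octal escape \033 for ESC; the variable names and the sample header line are made up for the illustration.

```shell
# Hold a literal ESC character (octal 033) in a shell variable.
esc=$(printf '\033')

# A header line that begins with the ESC 9 sequence.
header=$(printf '%s9 who(1) who(1)' "$esc")

# Remove ESC followed by a literal 9 wherever it appears.
cleaned=$(printf '%s\n' "$header" | sed "s/${esc}9//g")
printf '%s\n' "$cleaned"
```

Only the two-character escape sequence is removed; the space that followed it is still there and is dealt with separately.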
There are also what look to be leading spaces that supply the left
margin and indentation. On further examination, it turns out that
leading spaces precede headings such as "NAME," but a single tab
precedes each line of text. Also, tabs appear unexpectedly
in the text; these have to do with how nroff optimizes for display
on a CRT screen.
To eliminate the left margin and the unwanted tabs, we add two
commands to our previous two:
# sedman -- deformat nroff-formatted manpage
s/.^H//g
s/^[9//g
s/^[ •]*//g
s/•/ /g
The third command looks for any number of tabs or spaces at the
beginning of a line. (A tab is represented by "•" and a space by
" ".) The last command looks for a tab and replaces it
with a single space. Running this script on our sample
manpage output produces a file that looks like this:
who(1) who(1)
NAME
who - who is on the system?
SYNOPSIS
who [-a] [-b] [-d] [-H] [-l] [-p] [-q] [-r] [-s] [-t] [-T]
[-u] [file]
who am i
who am I
DESCRIPTION
who can list the user's name, terminal line, login time,
elapsed time since activity occurred on the line, and the
...
This script does not eliminate the unnecessary blank lines
caused by paging. We will look at how to do that in the next
chapter, as it requires a multiline operation.
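To experiment with the four commands without entering control characters in an editor, they can be wrapped in a small shell function. This is a hedged sketch rather than the script file itself: the function name sedman is borrowed from the script's comment line, the variables bs, esc, and tab are assumptions that hold a literal backspace, ESC, and tab, and the three-line sample is a shortened stand-in for the real manpage output.

```shell
# Literal control characters, supplied by printf.
bs=$(printf '\b')
esc=$(printf '\033')
tab=$(printf '\t')

# The four commands of the script, in the same order.
sedman() {
    sed -e "s/.$bs//g" \
        -e "s/${esc}9//g" \
        -e "s/^[ $tab]*//g" \
        -e "s/$tab/ /g"
}

# Shortened sample: ESC 9 header, emboldened heading, tab-indented text.
sample=$(printf '%s9 who(1)\n   N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE\n\twho - who is\ton the system?\n' "$esc")

result=$(printf '%s\n' "$sample" | sedman)
printf '%s\n' "$result"
```

The order of the commands matters: the leading space on the header line can only be stripped after the escape sequence in front of it has been removed.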