The UNIX formatter nroff produces output for line printers and CRT displays. To achieve such special effects as emboldening, it outputs the character followed by a backspace and then outputs the same character again. A sample of it viewed with a text editor might look like this (each ^H stands for a backspace):

    N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE

which emboldens the word "NAME." There are three overstrikes for each character output. Similarly, underlining is achieved by outputting an underscore, a backspace, and then the character to be underlined. Some pagers take advantage of overstruck text. But there are many times when it's necessary to strip these special effects; for example, if you want to grep through formatted man pages (as we do in article 50.3). There are a number of ways to get rid of these decorations. The easiest way to do it is to use a utility like col, colcrt, or ul.
Both col and colcrt attempt to handle "half linefeeds" (used to print superscripts and subscripts) reasonably. Many printers handle half linefeeds correctly, but most terminals can't deal with them.
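As a rough sketch of the utility approach, assuming col is installed on your system: its -b option suppresses backspaces and keeps only the last character written to each column position, which strips both emboldening and underlining. The sample input below is generated with printf for illustration; it is not real nroff output.

```sh
# Simulated overstruck text: "NAME" emboldened, then "NAME" underlined.
# \b in the printf format is a backspace character.
printf 'N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE\n' | col -b
printf '_\bN_\bA_\bM_\bE\n' | col -b
```

On real input, the same filter could sit in a pipeline such as nroff -man page.1 | col -b | grep pattern, where page.1 and pattern are placeholders.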
Here's one other solution to the problem: a simple sed script. The virtue of this solution is that you can elaborate on it, adding other features that you'd like, or integrating it into larger sed scripts. The following sed command removes the sequences for emboldening and underscoring (^H stands for a literal backspace character):

    s/.^H//g

It removes any character preceding the backspace along with the backspace itself.
In the case of underlining, "." matches the underscore; for emboldening,
it matches the overstrike character.
Because it is applied repeatedly, multiple occurrences of the overstrike
character are removed, leaving a single character for each sequence.
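
The substitution described above can be tried out directly in a shell. This is a minimal sketch, assuming a POSIX sh and sed; the backspace is generated with printf instead of being typed literally, and the function name strip_overstrikes is a hypothetical helper, not something from the article.

```sh
#!/bin/sh
# A literal backspace character (editors usually display it as ^H).
bs=$(printf '\b')

# Hypothetical helper: delete every "any character + backspace" pair.
# The g flag repeats the substitution across each line, so only the
# final character of each overstrike sequence survives.
strip_overstrikes() {
    sed "s/.${bs}//g"
}

# Simulated input: "NAME" emboldened, then "NAME" underlined.
bold=$(printf 'N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE\n' | strip_overstrikes)
under=$(printf '_\bN_\bA_\bM_\bE\n' | strip_overstrikes)

echo "$bold"
echo "$under"
```

Wrapping the substitution in a function keeps it easy to extend with further sed expressions or to drop into a longer pipeline.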