29. Spell Checking, Word Counting, and Textual Analysis

Contents:
The UNIX spell Command
Check Spelling Interactively with ispell
How Do I Spell That Word?
Inside spell
Adding Words to ispell's Dictionary
Counting Lines, Words, and Characters: wc
Count How Many Times Each Word Is Used
Find a a Doubled Word
Looking for Closure
Just the Words, Please

29.1 The UNIX spell Command

The spell command reads one or more files and prints a list of words that may be misspelled. You can redirect the output to a file, use grep (27.1) to locate each of the words, and then use vi or ex to make the edits. It's also possible to hack up a shell and sed script that interactively displays the misspellings and fixes them on command, but realistically, this is too tedious for most users. (The ispell (29.2) program solves many, though not all, of these problems.)
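The grep step described above might be scripted along these lines. This is only a minimal sketch: locate_words is a hypothetical helper name, not part of spell, and it assumes a grep that supports the -w (whole word) option, which is common but not universal.

```shell
# locate_words FILE (hypothetical helper, not part of spell):
# read suspect words, one per line, on standard input and print each
# line of FILE that contains the word, prefixed by its line number.
locate_words() {
    file=$1
    while IFS= read -r word; do
        # Skip blank lines in the word list.
        [ -n "$word" ] || continue
        printf '== %s\n' "$word"
        # -n: show line numbers; -w: whole words only (assumed available);
        # -F: treat the word as a fixed string, not a pattern.
        grep -n -w -F -e "$word" "$file" || true
    done
}

# Typical use (assumes spell is installed):
#   spell sample | locate_words sample
```

The line numbers in the output can then be used to jump straight to each suspect word in vi or ex.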

When you run spell on a file, the list of words it produces usually includes a number of legitimate words or terms that the program does not recognize. spell is case-sensitive; it's happy with Aaron but complains about aaron. You must cull out the proper nouns and other words spell doesn't know about to arrive at a list of true misspellings. For instance, look at the results on this sample sentence:

Alcuin uses TranScript to convert ditroff into
PostScript output for the LaserWriter printerr.
$ spell sample
Alcuin
ditroff
printerr
LaserWriter
PostScript
TranScript

Only one word in this list is actually misspelled.

On many UNIX systems, you can supply a local dictionary file so that spell recognizes special words and terms specific to your site or application. After you have run spell and looked through the word list, you can create a file containing the words that were not actual misspellings. The spell command will check this list after it has gone through its own dictionary. [On systems where I've used it, your word list file had to be sorted (36.1). -JP]
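The sorting requirement mentioned above can be met with sort -u. The following is a sketch under stated assumptions: make_dict and the file name okwords are hypothetical, and the collation order your spell expects is assumed to be the one your sort produces, so check your system's manual page.

```shell
# make_dict WORDLIST DICT (hypothetical helper, not part of spell):
# write a sorted, duplicate-free copy of WORDLIST to DICT.
make_dict() {
    # -u drops repeated words. The sort order follows the current locale,
    # which is assumed to match what the local spell expects.
    sort -u "$1" > "$2"
}

# Typical use, after saving the legitimate words in "okwords":
#   make_dict okwords dict
```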

If you put the special terms in a file named dict, you could specify that file on the command line using the + option:

$ spell +dict sample
printerr

The output is reduced to the single misspelling.
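On a system whose spell lacks the + option, a similar reduction can be approximated by filtering spell's output through grep. This is a stand-in under that assumption, not a description of how spell itself consults the local dictionary, and drop_known is a hypothetical name.

```shell
# drop_known DICT (hypothetical helper, not part of spell):
# copy standard input to standard output, omitting any line that
# exactly matches a word listed in DICT (one word per line).
drop_known() {
    # -v: print only non-matching lines; -x: match whole lines only;
    # -F: treat the words as fixed strings; -f: read the words from DICT.
    grep -v -x -F -f "$1"
}

# Typical use (assumes spell is installed):
#   spell sample | drop_known dict
```

Because grep compares whole lines, the dictionary file does not need to be sorted for this approach.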

The spell command will also miss words specified as arguments to nroff or troff macros (43.13), and like any spelling checker, it will make some errors based on incorrect derivation of spellings from the root words contained in its dictionary. If you understand how spell works (29.4), you may be less surprised by some of these errors.

- DD from UNIX Text Processing, Hayden Books, 1987