The wordfreq script counts the number of occurrences of each word in its input.
If you give it files, it reads from them; otherwise it reads standard input.
The -i option folds uppercase into lowercase, so uppercase letters
count the same as lowercase.
Here's this book's Preface run through wordfreq
The script was taken from a long-ago
posting by Carl Brandauer.
Here is Carl's original script (with a few small edits):
cat "$@" |                # Concatenate the named files (or standard input)
tr "[A-Z]" "[a-z]" |      # Convert all uppercase to lowercase
tr -cs "a-z'" "\012" |    # Replace each run of characters other than a-z or '
                          # with one newline, i.e. one word per line
sort |                    # uniq expects sorted input
uniq -c |                 # Count number of times each word appears
sort -k1,1nr -k2d |       # Sort first from most to least frequent,
                          # then alphabetically
pr -w80 -4 -h "Concordance for $*"   # Print in four columns
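If you want to watch the counting stages work on a tiny input, here's a sketch that wraps them (everything except pr) in a shell function. The function name count_words and the sample sentence are made up for illustration, and the sort -k syntax assumes a POSIX sort.

```sh
#!/bin/sh
# count_words (hypothetical name): the counting stages of the pipeline
# above, minus the final pr, reading standard input.
count_words() {
    tr "[A-Z]" "[a-z]" |      # fold uppercase into lowercase
    tr -cs "a-z'" "\012" |    # one word per line (Berkeley-style ranges)
    sort |                    # bring identical words together for uniq
    uniq -c |                 # prefix each distinct word with its count
    sort -k1,1nr -k2d         # most frequent first, ties alphabetically
}

# Made-up sample input for illustration.
printf 'The cat saw the dog.\nThe dog ran!\n' | count_words
```

It should list the with a count of 3, then dog with 2, then cat, ran, and saw with 1 each; the width of the count column depends on your version of uniq.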
The version on the disc is somewhat different.
It adjusts the tr commands for the script's -i option.
The disc version also doesn't use pr to make output in four columns,
though you can add that to your copy of the script, or just pipe the
wordfreq output through pr on the command line when you need it.
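The disc version isn't shown here, so what follows is only a hypothetical sketch of one way a script could switch its tr commands on an -i option. Without -i, case is preserved, so the word-splitting tr has to keep uppercase letters too. The function form, the getopts loop, and the variable name fold_case are assumptions for illustration, not the disc script.

```sh
#!/bin/sh
# Hypothetical sketch of a wordfreq with an -i option (not the disc version).
# Usage: wordfreq [-i] [file...]
wordfreq() {
    fold_case=no
    OPTIND=1                          # reset so the function can be called again
    while getopts i opt; do
        case $opt in
        i) fold_case=yes ;;
        *) echo "usage: wordfreq [-i] [file...]" >&2; return 2 ;;
        esac
    done
    shift $((OPTIND - 1))             # leave only the file names in "$@"

    if [ "$fold_case" = yes ]; then
        cat "$@" | tr "A-Z" "a-z"     # -i: uppercase counts as lowercase
    else
        cat "$@"                      # default: keep case as-is
    fi |
    tr -cs "A-Za-z'" "\012" |         # one word per line, either case
    sort |
    uniq -c |
    sort -k1,1nr -k2                  # most frequent first, then by word
}
```

With this sketch, wordfreq -i folds The, the, and THE into one entry, while plain wordfreq counts them separately. For four-column output you would still pipe it through pr, for example wordfreq -i somefile | pr -w80 -4 (somefile is a placeholder).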
The second tr command above (tr -cs "a-z'" "\012") is for the
Berkeley version of tr.
For System V tr, the command should be:
tr -cs "[a-z]'" "[\012*]"
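Current versions of tr that follow POSIX accept character classes and the [c*] repeat notation, which sidesteps the Berkeley/System V range question. Here's a sketch of the two tr steps written that way, assuming a POSIX-conforming tr such as GNU or current BSD tr; the function name split_words is hypothetical.

```sh
#!/bin/sh
# split_words (hypothetical name): fold case, then put one word per line.
# Assumes a POSIX tr: [:upper:] and [:lower:] are character classes, and
# '[\n*]' repeats a newline as often as needed to match the first set.
split_words() {
    tr '[:upper:]' '[:lower:]' |      # fold uppercase into lowercase
    tr -cs "[:lower:]'" '[\n*]'       # each run of non-letters (except ') -> one newline
}

printf "Don't panic, Arthur.\n" | split_words
```

Fed that sample sentence, it should print don't, panic, and arthur on separate lines.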
If you aren't sure which version of tr
You could use
One of the beauties of a simple script like this is that you can
tweak it if you don't like the way it counts.
For example, if you want hyphenated words like copy-editor
to count as one, add a hyphen to the
(System V) or