16.6. Counting Lines, Words, and Characters: wc
The wc (word count) command counts the
number of lines, words, and characters in the files you specify.
(Like most Unix utilities, wc reads from its
standard input if you don't specify a filename.) For
example, the file letter has 120 lines, 734
words, and 4,297 characters:
% wc letter
120 734 4297 letter
You can restrict what is counted by specifying the options
-l (count lines only),
-w (count words only), and
-c (count characters only). For example, you can
count the number of lines in a file:
% wc -l letter
120 letter
or you can count the number of
files in a directory:
% cd man_pages
% ls | wc -w
233
The first example uses a file as input; the second example pipes the
output of an ls command to the input of
wc. (Be aware that the -a
option (Section 8.9) makes
ls list dot files. If your
ls command is aliased (Section 29.2) to
include -a or other options that add words to the
normal output -- such as the line total
nnn from ls
-l -- then you may not get the results you want.)
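The caveat above can be sketched as follows. This is only an illustration, assuming a POSIX-style shell where the command builtin bypasses aliases; the temporary directory and the filenames in it are made up for the example. It also shows a second way that counting words can mislead: a filename containing a space is two words to wc -w but still one line to wc -l.

```shell
# Hypothetical directory with two files, one of which has a space in its name.
tmpdir=$(mktemp -d)
touch "$tmpdir/intro.1" "$tmpdir/two words.1"

# `command ls` runs the real ls, skipping any alias or function of that name.
# When its output goes to a pipe, ls writes one name per line.
by_words=$(cd "$tmpdir" && command ls | wc -w | tr -d ' ')
by_lines=$(cd "$tmpdir" && command ls | wc -l | tr -d ' ')

echo "wc -w sees $by_words words; wc -l sees $by_lines lines"
rm -rf "$tmpdir"
```

Here wc -w reports 3 and wc -l reports 2, so counting lines is the safer choice when names may contain spaces.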
The following command will tell you how
many more words are in new.file than in
old.file:
% expr `wc -w < new.file` - `wc -w < old.file`
Many shells have built-in arithmetic commands and
don't really need expr; however,
expr works in all shells.
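For comparison, here is a minimal sketch of the same subtraction using built-in arithmetic. It assumes a shell with POSIX arithmetic expansion (sh, ksh, bash, zsh; not csh), and the two sample files are created only so the example stands on its own.

```shell
# Create two small sample files (hypothetical contents, for illustration only).
tmpdir=$(mktemp -d)
printf 'one two three four five\n' > "$tmpdir/new.file"
printf 'one two three\n' > "$tmpdir/old.file"

# $(( ... )) does the arithmetic in the shell itself; no expr process is needed.
# Any leading spaces in wc's output are harmless inside arithmetic expansion.
word_diff=$(( $(wc -w < "$tmpdir/new.file") - $(wc -w < "$tmpdir/old.file") ))
echo "$word_diff"

rm -rf "$tmpdir"
```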
NOTE:
In a programming application, you'll usually want
wc to read the input files by using a
< character, as shown earlier. If instead you
say:
% expr `wc -w new.file` - `wc -w old.file`
the filenames will show up in the expressions and produce a syntax
error.
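To see the difference for yourself, compare what wc prints in each case. This is a small sketch using a throwaway file (sample.txt is a made-up name):

```shell
tmpdir=$(mktemp -d)
printf 'alpha beta gamma\n' > "$tmpdir/sample.txt"

# With a filename argument, wc prints the count followed by the name.
with_name=$(wc -w "$tmpdir/sample.txt")

# Reading standard input, wc has no name to print, so only the count comes out.
count_only=$(wc -w < "$tmpdir/sample.txt")

echo "with filename: $with_name"
echo "from stdin:    $count_only"
rm -rf "$tmpdir"
```

The first value contains two words (a number and a pathname), which is what confuses expr; the second contains only the number.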
Taking this concept a step further, here's a simple
shell script to calculate the differences in word count between two
files:
count_1=`wc -w < "$1"`  # number of words in file 1
count_2=`wc -w < "$2"`  # number of words in file 2
diff_12=`expr $count_1 - $count_2`  # difference in word count
# if $diff_12 is negative, reverse order and don't show the minus sign:
case "$diff_12" in
-*) echo "$2 has `expr $diff_12 : '-\(.*\)'` more words than $1" ;;
*)  echo "$1 has $diff_12 more words than $2" ;;
esac
If this script were called count.it, then you
could invoke it like this:
% count.it draft.2 draft.1
draft.1 has 23 more words than draft.2
You could modify this script to count lines or characters.
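One possible way to make that modification is sketched below. It is not the script above, but a variation written as a shell function; the name count_diff and its argument order are invented for this example. It assumes a POSIX-style shell, and it swaps expr for built-in arithmetic and the ${var#-} expansion to drop the minus sign.

```shell
# count_diff OPTION FILE1 FILE2   (hypothetical helper, for illustration)
# OPTION is a wc flag: -l (lines), -w (words), or -c (characters).
count_diff() {
    opt=$1 file_1=$2 file_2=$3
    case "$opt" in
        -l) unit=lines ;;
        -w) unit=words ;;
        -c) unit=characters ;;
        *)  echo "usage: count_diff -l|-w|-c file1 file2" >&2; return 1 ;;
    esac
    count_1=$(wc "$opt" < "$file_1")
    count_2=$(wc "$opt" < "$file_2")
    # Unquoted $count_1 and $count_2 let the shell ignore any padding from wc.
    diff_12=$(( $count_1 - $count_2 ))
    # If the difference is negative, reverse the order and drop the minus sign.
    case "$diff_12" in
        -*) echo "$file_2 has ${diff_12#-} more $unit than $file_1" ;;
        *)  echo "$file_1 has $diff_12 more $unit than $file_2" ;;
    esac
}
```

With this function defined, count_diff -l draft.2 draft.1 would compare line counts in the same way that count.it compares word counts.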
NOTE:
Unless the counts are very large, the
output of many versions of wc will have leading spaces. This can
cause trouble in scripts if you aren't careful. For
instance, in the previous script, the command:
echo "$1 has $count_1 words"
might print:
draft.2 has       79 words
See the extra spaces? Understanding how the shell handles quoting (Section 27.12) will
help here. If you can, let the shell read the wc
output and remove extra spaces. For example, without quotes, the
shell passes four separate words to echo -- and
echo adds a single space between each word:
echo $1 has $count_1 words
that might print:
draft.2 has 79 words
That's especially important to understand when you
use wc with test or
expr commands that don't expect
spaces in their arguments. If you can't use the
shell to strip out the spaces, delete them by piping the
wc output through tr -d ' ' (Section 21.11).
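Both approaches from this note can be sketched side by side. The sample file is made up for the example, and the variable names are arbitrary:

```shell
tmpdir=$(mktemp -d)
printf 'one two three\n' > "$tmpdir/sample.txt"

# 1. Pipe through tr to delete any space characters from wc's output.
count_tr=$(wc -w < "$tmpdir/sample.txt" | tr -d ' ')

# 2. Let the shell do it: an unquoted expansion is split into words,
#    so the padding never reaches echo.
raw=$(wc -w < "$tmpdir/sample.txt")
count_split=$(echo $raw)

echo "[$count_tr] [$count_split]"
rm -rf "$tmpdir"
```

This prints [3] [3] whether or not your wc pads its output.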
Finally, two notes about file size:
--JP, DG, and SP
Copyright © 2003 O'Reilly & Associates. All rights reserved.