The
wc
(word count) command counts the number of lines, words, and
characters in the files you specify.
(
Like most UNIX utilities (
1.30
)
,
wc
reads from its standard input
if you don't specify a filename.)
For example, the file
letter
has 120 lines, 734 words, and 4297
characters:
%
wc letter
120 734 4297 letter
You can restrict what is counted
by specifying the options
-l
(count lines only),
-w
(count words only), and
-c
(count characters only).
For example, you can count the number of lines in a file:
%
wc -l letter
120 letter
or you can count the number of files in a directory:
%
cd man_pages
%
ls | wc -w
233
The first example uses a file as input; the second example pipes
the output of an
ls
command to the input of
wc
.
(Be aware that the
-a
option (
16.11
)
makes
ls
list dot files.
If your
ls
command is
aliased (
10.2
)
to include
-a
or other options that add words to the normal output, such as the
line
total
nnn
from
ls -l
, then
you may not get the results you want.)
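The effect of extra ls output on the count can be seen in a scratch directory. This is only a sketch: the directory and file names are invented for illustration, and it assumes that mktemp -d is available, which is common but not guaranteed on every UNIX. It counts with wc -l rather than wc -w; when ls writes to a pipe it prints one name per line, so the two agree as long as no filename contains a space.

```shell
# Build a scratch directory holding two ordinary files and one dot file.
# (Names are hypothetical; mktemp -d is assumed to exist.)
dir=`mktemp -d`
touch "$dir/a" "$dir/b" "$dir/.hidden"

plain=`ls "$dir" | wc -l`       # ordinary files only
with_a=`ls -a "$dir" | wc -l`   # also counts ".", "..", and ".hidden"

# Unquoted on purpose, so the shell strips any leading spaces from wc.
echo plain ls: $plain
echo ls -a: $with_a

rm -r "$dir"
```

The second count is larger by three: the dot file plus the two directory entries . and .. that ls -a also lists.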
The fact that you can pipe the output of a command through
wc
lets you
use
wc
to perform addition and subtraction. For example, I once wrote
a shell script that involved, among other things, splitting files into several
pieces, and I needed the script to keep track of how many files were
created. (The script ran
csplit
(
35.10
)
on each file, producing an arbitrary
number of new files named
file.00
,
file.01
,
file.02
, etc.)
Here's the code I used to solve this problem:
before=`ls $file* | wc -l` # count the file
# ...split the file by running it through csplit...
after=`ls $file* | wc -l` # count file plus new splits
num_files=`expr $after - $before` # evaluate the difference
As another trick, the following command will tell you how many more words
are in
new.file
than in
old.file
:
%
expr `wc -w < new.file` - `wc -w < old.file`
[The C and Korn shells have built-in arithmetic commands and don't really need
expr
, but
expr
works in all shells.
-JP
]
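As an illustration of the built-in arithmetic mentioned in the note, here is a minimal sketch using the $(( )) arithmetic expansion found in POSIX-style shells such as ksh and bash. It does not apply to the C shell, which has its own syntax. The sample files and their contents are invented for this example, and mktemp -d is assumed to be available.

```shell
# Work in a scratch directory so no real files are overwritten.
dir=`mktemp -d`
printf 'one two three four five\n' > "$dir/new.file"   # 5 words
printf 'one two three\n' > "$dir/old.file"             # 3 words

# $(( )) does the subtraction inside the shell, with no expr process.
# Leading spaces in the wc output are harmless inside the expression.
diff=$(( `wc -w < "$dir/new.file"` - `wc -w < "$dir/old.file"` ))
echo "new.file has $diff more words than old.file"

rm -r "$dir"
```

The trade-off is portability: expr runs as a separate command and so works from any shell, while $(( )) only works where the shell itself supports it.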
Notice that you should have
wc
read the input files
by using a
<
character. If instead you say:
%
expr `wc -w new.file` - `wc -w old.file`
the filenames will show up in the expressions and produce a syntax error.
[1]
Taking this concept further, here's a simple shell script to calculate
the differences in word count between two files:
count_1=`wc -w < "$1"` # number of words in file 1
count_2=`wc -w < "$2"` # number of words in file 2
diff_12=`expr $count_1 - $count_2` # difference in word count
# if $diff_12 is negative, reverse order and don't show the minus sign:
case "$diff_12" in
-*) echo "$2 has `expr $diff_12 : '-\(.*\)'` more words than $1" ;;
*) echo "$1 has $diff_12 more words than $2" ;;
esac
If this script were called
count.it
, then you could invoke it like this:
%
count.it draft.2 draft.1
draft.1 has 23 more words than draft.2
You could modify this script to count lines or characters.
NOTE:
Unless the counts are very large, the output of
wc
will have leading
spaces. This can cause trouble in scripts if you aren't careful.
For instance, in the script above, the command:
echo "$1 has $count_1 words"
might print:
draft.2 has      79 words
See the extra spaces?
Understanding how the shell handles
quoting (
8.14
)
will help here.
If you can, let the shell read the
wc
output and remove extra spaces.
For example, without quotes, the shell passes four separate words to
echo
, and
echo
adds a single space between each word:
echo $1 has $count_1 words
That might print:
draft.2 has 79 words
That's especially important to understand when you use
wc
with commands like
test
or
expr
, which don't expect spaces
in their arguments.
If you can't use the shell to strip out the spaces, delete them by
piping the
wc
output through
tr -d ' '
(
35.11
)
.
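Both ways of removing the spaces can be compared side by side. The following is a sketch only: the sample file is hypothetical, mktemp is assumed to be available, and whether the raw value actually has leading spaces depends on your version of wc.

```shell
# Hypothetical sample file containing three words.
tmp=`mktemp`
printf 'alpha beta gamma\n' > "$tmp"

raw=`wc -w < "$tmp"`                 # may carry leading spaces
clean=`wc -w < "$tmp" | tr -d ' '`   # tr deletes every space

# The brackets make any leading spaces visible.
echo "raw:   [$raw]"
echo "clean: [$clean]"

# Alternative: an unquoted expansion lets the shell drop the spaces.
# (set -- replaces the script's positional parameters.)
set -- $raw
shell_clean=$1
echo "shell: [$shell_clean]"

rm -f "$tmp"
```

Either cleaned value is safe to hand to test or expr as a single argument.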
Finally, two notes about file size:
-
wc -c
isn't an efficient way to count the characters in large
numbers of files.
wc
opens and reads each file, which takes time.
The fourth or fifth column of output from
ls -l
(depending on
your version) gives the character count without opening the file.
You can sum
ls -l
counts for multiple files with the
addup
(
49.7
)
command.
For example:
%
ls -l
files
| addup 4
670518
-
Using character counts (as in the item above) doesn't give you the total
disk space used by files.
That's because, in general, each file takes at least one disk block
to store.
The
du
(
24.9
)
command gives accurate disk usage.
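The two notes above can be tried out on a pair of small files. This is a sketch under stated assumptions: addup is not a standard command, so awk stands in for it here; the size is assumed to be in column 5 of the ls -l output (on your version it may be column 4); and the scratch directory from mktemp -d and its files are invented for the example.

```shell
# Scratch directory with two small files (hypothetical names).
dir=`mktemp -d`
printf 'hello\n' > "$dir/a"   # 6 characters
printf 'hi\n' > "$dir/b"      # 3 characters

# Sum the size column of ls -l without opening the files.
# Column 5 is an assumption; the "total nnn" line is skipped.
total=`ls -l "$dir" | awk '$1 != "total" && NF >= 5 { sum += $5 } END { print sum + 0 }'`
echo "characters according to ls -l: $total"

# wc -c reads every file, and should report the same figure.
bytes=`cat "$dir"/* | wc -c`
echo characters according to wc -c: $bytes

# du reports the disk space allocated (in kilobytes with -k),
# which is usually larger than the character count for small files.
du -k "$dir"

rm -r "$dir"
```

The du figure includes the directory itself and depends on the filesystem's block size, so it is not shown here.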