22.6.1. Dealing with Repeated Lines
sort -u sorts the file and eliminates duplicate lines. It's more
powerful than uniq (Section 21.20) because:
- It sorts the file for you. uniq removes only duplicate lines that
are adjacent, so it assumes the file is already sorted and won't catch
repeats that are scattered through an unsorted file.
- It is much more flexible. sort -u treats lines as duplicates if the
sort fields (Section 22.2) you've selected match. So the lines don't
even have to be (strictly speaking) identical; differences outside of
the sort fields are ignored.
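Here is a small demonstration of both points. The sample files fruits.txt and logins.txt are made up for illustration. Note that when several lines share the same key, which one sort -u keeps can depend on the implementation, so the example only relies on how many lines survive.

```sh
# Hypothetical sample file: the repeated "pear" and "apple" lines are NOT adjacent.
printf 'pear\napple\npear\nbanana\napple\n' > fruits.txt

# uniq only collapses adjacent repeats, so nothing is removed here:
uniq fruits.txt            # pear, apple, pear, banana, apple (five lines)

# sort -u sorts first, so every repeat goes away:
sort -u fruits.txt         # apple, banana, pear (three lines)

# Hypothetical login list. Sorting on the second field only means the two
# "alice" lines count as duplicates even though their first fields differ.
printf 'tty1 alice\ntty2 bob\ntty3 alice\n' > logins.txt
sort -u -k2,2 logins.txt   # two lines: one "alice" line, then "tty2 bob"
```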
In return, there are a few things that uniq does that sort won't do,
such as printing only those lines that aren't repeated (uniq -u) or
counting the number of times each line is repeated (uniq -c). But on
the whole, I find sort -u more useful.
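For those jobs you still pipe sort into uniq, because uniq needs its input sorted. A quick sketch, using a made-up sample file:

```sh
# Hypothetical sample file with repeated lines that are not adjacent.
printf 'pear\napple\npear\nbanana\napple\n' > fruits.txt

# Print only the lines that are NOT repeated:
sort fruits.txt | uniq -u    # banana

# Count how many times each line occurs
# (the padding before each count varies between implementations):
sort fruits.txt | uniq -c    # 2 apple, 1 banana, 2 pear
```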
Here's one idea for using sort -u. When I was writing a manual, I
often needed to make tables of error messages. The easiest way to do
this was to grep the source code for printf statements, write some
Emacs (Section 19.1) macros to eliminate junk that I didn't care
about, use sort -u to put the messages in order and get rid of
duplicates, and write some more Emacs macros to format the error
messages into a table. All I had to do then was write the
descriptions.
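The first half of that workflow can be approximated entirely from the shell. In this sketch sed stands in for the Emacs cleanup macros; the file errors_demo.c and the sed patterns are hypothetical, and they assume each message is a simple double-quoted string on one line (escaped quotes or printf calls split across lines would need more work).

```sh
# Hypothetical source file standing in for a real project.
cat > errors_demo.c <<'EOF'
if (fd < 0) printf("cannot open file\n");
if (n == 0) printf("empty input\n");
if (fd < 0) printf("cannot open file\n");
EOF

# 1. grep finds the printf calls.
# 2. The first sed keeps only the text inside the first quoted string;
#    the second drops a trailing \n escape (both replace the Emacs macros).
# 3. sort -u puts the messages in order and removes duplicates.
grep 'printf' errors_demo.c |
  sed -n 's/.*printf("\([^"]*\)".*/\1/p' |
  sed 's/\\n$//' |
  sort -u
# cannot open file
# empty input
```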