sort -u sorts the file and eliminates duplicate lines. It's more powerful than uniq because:
In return, there are a few things that uniq does that sort won't do - like print only those lines that aren't repeated, or count the number of times each line is repeated. But on the whole, I find sort -u more useful.
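To make the trade-off concrete, here is a small sketch that runs both tools over the same made-up data. The sample lines and variable names are invented for illustration. uniq only compares adjacent lines, which is why its input is sorted first.

```shell
#!/bin/sh
# Sample data (invented): "pear" appears twice, the other lines once.
fruits=$(printf '%s\n' pear apple pear fig)

# sort -u: sort and drop duplicate lines in a single step.
unique=$(printf '%s\n' "$fruits" | sort -u)

# uniq -u: keep only the lines that are not repeated.
singles=$(printf '%s\n' "$fruits" | sort | uniq -u)

# uniq -c: prefix each distinct line with the number of times it occurs.
counts=$(printf '%s\n' "$fruits" | sort | uniq -c)

printf '%s\n' "sort -u:" "$unique" "uniq -u:" "$singles" "uniq -c:" "$counts"
```

The exact padding in front of the counts printed by uniq -c varies between versions, so don't rely on it in scripts.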
Here's one idea for using sort -u. When I was writing a manual, I often needed to make tables of error messages. The easiest way to do this was to grep the source code for printf statements; write some Emacs macros to eliminate junk that I didn't care about; use sort -u to put the messages in order and get rid of duplicates; and write some more Emacs macros to format the error messages into a table. All I had to do was write the descriptions.
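The first half of that pipeline can be sketched roughly as follows. This is not the procedure described above: a sed command stands in for the Emacs macros, the function name error_messages is made up, and the pattern only catches simple calls of the form printf("...") that fit on one line. The -h option to grep (suppress file names) is widely available but worth checking on older systems.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct message strings passed to printf
# in the source files named as arguments.
error_messages() {
    # grep -h prints the matching lines without file-name prefixes.
    grep -h 'printf' "$@" |
        # Keep only the text between the first pair of double quotes
        # that directly follows "printf(" (sed replaces the Emacs macros).
        sed -n 's/.*printf *("\([^"]*\)".*/\1/p' |
        # Put the messages in order and drop duplicates.
        sort -u
}
```

Something like error_messages *.c would then give a sorted, duplicate-free list to format into a table.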
One important option (that I've mentioned a number of times) is -b; this tells sort to ignore extra white space at the beginning of each field. This is absolutely essential; otherwise, your sorts will have rather strange results. In my opinion, -b should be the default. But they didn't ask me.
Another thing to remember about -b: it only works if you explicitly specify which fields you want to sort. By itself, sort -b is the same as sort: white space characters are counted. I call this a bug, don't you?
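Here is a small sketch of what -b changes when a field is specified. It uses the POSIX -k syntax for naming the sort field and sets LC_ALL=C so that the comparison is byte by byte; the sample data and variable names are invented.

```shell
#!/bin/sh
# Two records (invented); the second field of the first line is preceded
# by several blanks instead of one.
data=$(printf '%s\n' 'alice    pear' 'bob apple')

# Without -b, the blanks in front of "pear" are part of the sort key,
# and a blank sorts before any letter, so the "pear" line comes first.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# With -b, leading blanks are skipped and "apple" sorts before "pear".
trimmed=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k2,2)

printf '%s\n' "without -b:" "$plain" "with -b:" "$trimmed"
```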
If you don't care about the difference between uppercase and lowercase letters, invoke sort with the -f (case-fold) option. This folds lowercase letters into uppercase. In other words, it treats all letters as uppercase.
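A quick sketch of the difference, using three invented words. LC_ALL=C is set so that the plain sort compares raw character codes, where every uppercase letter comes before every lowercase one.

```shell
#!/bin/sh
# Sample words (invented); only "Banana" is capitalized.
words=$(printf '%s\n' cherry apple Banana)

# Plain sort in the C locale: "B" sorts ahead of "a".
bytewise=$(printf '%s\n' "$words" | LC_ALL=C sort)

# With -f, letters are compared as if they were all uppercase.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f)

printf '%s\n' "sort:" "$bytewise" "sort -f:" "$folded"
```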
The -M option tells sort to treat the first three non-blank characters of a field as a three-letter month abbreviation, and to sort accordingly. That is, JAN comes before FEB, which comes before MAR. This option isn't available on all versions of UNIX.
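If your sort has -M, a sketch like this shows the effect. The month list is invented sample data, and LC_ALL=C is set on the assumption that the abbreviations are the English ones.

```shell
#!/bin/sh
# Three month abbreviations, deliberately out of calendar order.
months=$(printf '%s\n' MAR JAN FEB)

# A plain sort puts them in alphabetical order.
alphabetical=$(printf '%s\n' "$months" | LC_ALL=C sort)

# sort -M puts them in calendar order instead.
calendar=$(printf '%s\n' "$months" | LC_ALL=C sort -M)

printf '%s\n' "sort:" "$alphabetical" "sort -M:" "$calendar"
```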
The -r option tells sort to "reverse" the order of the sort; i.e., Z comes before A, 9 comes before 1, and so on. You'll find that this option is really useful. For example, imagine you have a program running in the background that records the number of free blocks in the filesystem at midnight each night. Your log file might look like this:
Jan 1 1992: 108 free blocks
Jan 2 1992: 308 free blocks
Jan 3 1992: 1232 free blocks
Jan 4 1992: 76 free blocks
...
The script below finds the smallest and largest number of free blocks in your log file:
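A minimal sketch of such a script, written with the POSIX -k key syntax. The function names, and the choice to pass the log file name as an argument, are assumptions made for illustration. It expects every line to have the form shown above.

```shell
#!/bin/sh
# -t:    split each line into fields at the colon
# -k2,2  sort on the second field (" 108 free blocks")
# n      compare that field as a number
# r      reverse the order, so the largest count comes first

# Hypothetical helper: print the log line with the fewest free blocks.
min_free() {
    sort -t: -k2,2nr "$1" | tail -n 1
}

# Hypothetical helper: print the log line with the most free blocks.
max_free() {
    sort -t: -k2,2nr "$1" | head -n 1
}
```

With the four sample lines saved in a file, min_free prints the Jan 4 line (76 free blocks) and max_free prints the Jan 3 line (1232 free blocks).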
It's not profound, but it's an example of what you can do.