What Is (or Isn't) Unique? (Unix Power Tools, 3rd Edition)

21.20. What Is (or Isn't) Unique?

Go to http://examples.oreilly.com/upt3 for more information on: uniq

uniq reads a file and compares adjacent lines (which means you'll usually want to sort the file first to be sure identical lines appear next to each other). Here's what uniq can do as it watches the input lines stream by:

With the -u option, the output gets only the lines that occur just once (and weren't repeated).
The -d option does the opposite: the output gets a single copy of each line that was repeated (no matter how many times it was repeated). (The GNU version also has a -D option. It's like -d except that all duplicate lines are output.)
The default output (with no options) is the union of -u and -d: only the first occurrence of a line is written to the output file; any adjacent copies of a line (second, third, etc.) are ignored.
The output with -c is like the default, but each line is preceded by a count of how many times it occurred.
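
A small sketch of the four output modes, run on a made-up list that is already sorted (the sample words are invented for illustration). The results noted in the comments are what the option descriptions above predict; the exact spacing in front of the -c counts varies between uniq versions, so it isn't shown.

```shell
# Hypothetical sample input: already sorted, so identical lines are adjacent.
sample='apple
apple
banana
cherry
cherry
cherry'

only_once=$(printf '%s\n' "$sample" | uniq -u)   # lines never repeated: banana
repeated=$(printf '%s\n' "$sample" | uniq -d)    # one copy of each repeated line: apple, cherry
plain=$(printf '%s\n' "$sample" | uniq)          # one copy of every line: apple, banana, cherry
counted=$(printf '%s\n' "$sample" | uniq -c)     # like the default, with a count before each line

printf '%s\n' "-u:" "$only_once" "-d:" "$repeated" "default:" "$plain" "-c:" "$counted"
```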

WARNING:
% uniq file1 file2
will not print the unique lines from both file1 and file2 to standard output. uniq treats a second filename as its output file, so it will replace the contents of file2 with the unique lines from file1!
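
If what you want is one copy of every line found in either file, one approach is to sort the two files together and pipe the result to uniq. This is a minimal sketch using throwaway files in a scratch directory; the file contents are made up.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '%s\n' pear apple pear > "$tmpdir/file1"
printf '%s\n' plum apple > "$tmpdir/file2"

# sort reads both files and puts identical lines next to each other;
# uniq then reads standard input and writes to standard output.
merged=$(sort "$tmpdir/file1" "$tmpdir/file2" | uniq)
printf '%s\n' "$merged"

# file2 was never named on the uniq command line, so it was not overwritten.
file2_after=$(cat "$tmpdir/file2")
rm -r "$tmpdir"
```

Because uniq gets no filename arguments here, there is no second argument for it to treat as an output file.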

Three more options control how comparisons are done:

-n ignores the first n fields of a line and all whitespace before each. A field is defined as a string of nonwhitespace characters (separated from its neighbors by whitespace). POSIX spells this option -f n.
+n ignores the first n characters. Fields are skipped before characters. POSIX spells this option -s n; not every current uniq still accepts the +n form.
-w n in the GNU version compares no more than n characters in each line.
GNU uniq also has -i to make comparisons case-insensitive. (Upper- and lowercase letters compare equal.)
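
The sketch below tries each comparison option on invented data. It uses the -f n and -s n spellings that POSIX specifies for skipping fields and characters, plus -w and -i, which are described above as GNU features; check your own uniq manual page before relying on any of them.

```shell
# -f 1: skip the first field (here, a made-up ID) before comparing.
by_field=$(printf '%s\n' '101 login' '102 login' '103 logout' | uniq -f 1)

# -s 2: skip the first two characters before comparing.
by_char=$(printf '%s\n' 'a:red' 'b:red' 'c:blue' | uniq -s 2)

# -w 3 (GNU): compare no more than the first three characters.
by_width=$(printf '%s\n' 'foobar' 'foobaz' 'fumble' | uniq -w 3)

# -i (GNU): ignore case when comparing.
no_case=$(printf '%s\n' 'Hello' 'hello' 'HELLO' 'bye' | uniq -i)

printf '%s\n' "$by_field" "$by_char" "$by_width" "$no_case"
```

In every case uniq prints the first line of each run of lines that compare equal, so the text that was skipped or ignored still appears in the output.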

uniq is often used as a filter. See also comm (Section 11.8), sort (Section 22.1), and especially sort -u (Section 22.6).

So what can you do with all of this?

To send only one copy of each line from list (which is typically sorted) to output file list.new:

uniq list list.new

To show which names appear more than once:

sort names | uniq -d

To show which lines appear exactly three times, search the output of uniq -c for lines that start with spaces before the digit 3 and have a tab after. (This is the way GNU uniq -c makes its output lines, at least. Some versions put a space rather than a TAB after the count; if yours does, type a space where the example shows tab.) In the example below, the space is marked by ˘; the TAB is marked by tab:

(See grep, Section 13.1.)

sort names | uniq -c | grep "^˘*3tab"
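
Because the spacing around the count differs between uniq versions, a variation that doesn't depend on it is to let awk compare the first field numerically. Note that this swaps awk in for the grep pattern above; it is a sketch, and the names are made up for the example (each is a single word, so printing the second field prints the whole name).

```shell
# Hypothetical list of names, unsorted on purpose.
names='lee
kim
lee
pat
kim
lee
pat
pat'

# uniq -c puts the count in the first whitespace-separated field;
# awk keeps the lines whose count is exactly 3 and prints only the name.
thrice=$(printf '%s\n' "$names" | sort | uniq -c | awk '$1 == 3 { print $2 }')
printf '%s\n' "$thrice"
```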

The lines don't have to be sorted; they simply have to be adjacent. For example, if you have a log file where the last few fields are repeated, you can have uniq "watch" those fields and tell you how many times they were repeated. Here we'll skip the first four fields and get a count of how many times the rest of each line was repeated:

$ cat log
Nov 21 17:20:19 powerd: down 2 volts
Nov 21 17:20:27 powerd: down 2 volts
Nov 21 17:21:15 powerd: down 2 volts
Nov 21 17:22:48 powerd: down 2 volts
Nov 21 18:18:02 powerd: up 3 volts
Nov 21 19:55:03 powerd: down 2 volts
Nov 21 19:58:41 powerd: down 2 volts
$ uniq -4 -c log
      4 Nov 21 17:20:19 powerd: down 2 volts
      1 Nov 21 18:18:02 powerd: up 3 volts
      2 Nov 21 19:55:03 powerd: down 2 volts
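
On a system whose uniq doesn't accept the -4 shorthand, the same count can be requested with -f 4, the POSIX way to skip fields. This sketch recreates the log above with a here-document; the scratch file made by mktemp is only for the demonstration.

```shell
# Recreate the sample log in a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
Nov 21 17:20:19 powerd: down 2 volts
Nov 21 17:20:27 powerd: down 2 volts
Nov 21 17:21:15 powerd: down 2 volts
Nov 21 17:22:48 powerd: down 2 volts
Nov 21 18:18:02 powerd: up 3 volts
Nov 21 19:55:03 powerd: down 2 volts
Nov 21 19:58:41 powerd: down 2 volts
EOF

# Skip the first four fields (month, day, time, program name), then count
# each run of adjacent lines whose remaining text matches.
counts=$(uniq -c -f 4 "$log")
printf '%s\n' "$counts"
rm "$log"
```

Each output line shows the first line of a run, so the timestamp you see is when that run of messages started.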

--JP and DG