21.20. What Is (or Isn't) Unique?
Go to http://examples.oreilly.com/upt3 for more information on: uniq
uniq reads a file and
compares adjacent lines (which means you'll usually
want to sort the file first to be sure identical lines appear next to
each other). Here's what uniq can
do as it watches the input lines stream by:
-
With the -u option, the output gets only the lines
that occur just once (and weren't repeated).
-
The -d option does the opposite: the output gets a
single copy of each line that was repeated (no
matter how many times it was repeated).
(The GNU version also has a -D option.
It's like -d except that
all duplicate lines are output.)
-
The default output (with no options) is the union of
-u and -d: only the first
occurrence of a line is written to the output file; any adjacent
copies of a line (second, third, etc.) are ignored.
-
The output with -c is like the default, but each
line is preceded by a count of how many times it occurred.
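To see the four modes side by side, it can help to run them over the same sorted input. The sketch below uses a made-up six-line sample held in a hypothetical variable named sample; only the options described above are used.

```shell
# Hypothetical sample: "apple" occurs twice, "banana" once, "cherry" three times.
sample=$(printf '%s\n' apple apple banana cherry cherry cherry)

# Default: one copy of every line.
printf '%s\n' "$sample" | uniq

# -u: only the lines that were not repeated.
printf '%s\n' "$sample" | uniq -u

# -d: one copy of each line that was repeated.
printf '%s\n' "$sample" | uniq -d

# -c: like the default, with a count in front of each line.
printf '%s\n' "$sample" | uniq -c
```

The exact padding in front of the counts printed by -c depends on the version of uniq.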
WARNING:
Be warned:
% uniq file1 file2
will not print the unique lines from both
file1 and file2 to standard
output. It will replace the contents of
file2 with the unique lines from
file1!
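If the goal really is one copy of every line found in either file, one safe approach is to merge the inputs first and let uniq read the pipe, so there is no second filename for it to overwrite. This is only a sketch: it builds throwaway copies of file1 and file2 in a temporary directory, and merged is a hypothetical variable name.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf '%s\n' pear apple > "$dir/file1"
printf '%s\n' apple plum > "$dir/file2"

# sort merges both files and brings identical lines together;
# uniq reads standard input, so file2 is only read, never written.
merged=$(sort "$dir/file1" "$dir/file2" | uniq)
printf '%s\n' "$merged"
```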
Three more options control how comparisons are done:
-
-n ignores the first
n fields of a line and all whitespace
before each. A field is defined as a string of nonwhitespace
characters (separated from its neighbors by whitespace).
The POSIX spelling of this option is -f n.
-
+n ignores the first
n characters. Fields are skipped before
characters. The POSIX spelling of this option is -s n.
-
-w n in the GNU version
compares no more than n characters in each
line.
-
GNU uniq also has -i to make
comparisons case-insensitive. (Upper- and lowercase letters compare
equal.)
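The comparison options are easier to follow on tiny inputs. The sketch below uses -f and -s, the POSIX spellings of the field- and character-skipping options, because some current versions may not accept the older -n and +n forms; -w and -i are assumed to be available, as in GNU uniq. All of the sample lines are made up.

```shell
# -f 1: skip the first field (a hypothetical timestamp) before comparing.
printf '%s\n' '10:01 login' '10:02 login' '10:03 logout' | uniq -f 1

# -s 2: skip the first two characters of each line.
printf '%s\n' 'a-item' 'b-item' 'c-other' | uniq -s 2

# -w 3 (GNU): compare no more than the first three characters.
printf '%s\n' 'abc1' 'abc2' 'abd3' | uniq -w 3

# -i (GNU): treat upper- and lowercase letters as equal.
printf '%s\n' 'Apple' 'apple' 'APPLE' 'pear' | uniq -i
```

In each case uniq keeps the first line of a run of lines that compare equal, so the surviving line shows the skipped text from that first line only.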
uniq is often used as a filter. See also comm (Section 11.8), sort (Section 22.1), and
especially sort -u (Section 22.6).
So what can you do with all of this?
To send only one copy of each line from list
(which is typically sorted) to output file
list.new:
uniq list list.new
To show which names appear more than once:
sort names | uniq -d
To show which lines appear exactly three times, search the output of
uniq -c for lines that start with
spaces before the digit 3 and have a tab after.
(This is the way GNU uniq -c
makes its output lines, at least. Other versions, and newer GNU
releases, may follow the count with a space instead, so check your
output before relying on the pattern.) In the example below, the space is
marked by ␣; the TAB is marked by
tab (grep is covered in Section 13.1):
sort names | uniq -c | grep "^␣*3tab"
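Because the whitespace that uniq -c prints around the count is not the same in every version, a pattern that depends on it can silently match nothing. One alternative sketch, assuming awk is available, compares the count as a number instead. The names file and the variable thrice are hypothetical.

```shell
# Hypothetical sample: "ann" appears three times, "bob" twice, "cal" once.
dir=$(mktemp -d)
printf '%s\n' ann bob ann cal bob ann > "$dir/names"

# awk splits on any run of blanks, so $1 is the count however it was padded.
# When the count is 3, strip the count and print the rest of the line.
thrice=$(sort "$dir/names" | uniq -c |
    awk '$1 == 3 { sub(/^[ \t]*[0-9]+[ \t]/, ""); print }')
printf '%s\n' "$thrice"
```

Testing the field numerically also avoids matching counts such as 30 or 13, which a looser grep pattern could pick up by accident.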
The lines don't have to be sorted; they simply have
to be adjacent. For example, if you have a log file where the last
few fields are repeated, you can have uniq
"watch" those fields and tell you
how many times they were repeated. Here we'll skip
the first four fields and get a count of how many times the rest of
each line was repeated:
$ cat log
Nov 21 17:20:19 powerd: down 2 volts
Nov 21 17:20:27 powerd: down 2 volts
Nov 21 17:21:15 powerd: down 2 volts
Nov 21 17:22:48 powerd: down 2 volts
Nov 21 18:18:02 powerd: up 3 volts
Nov 21 19:55:03 powerd: down 2 volts
Nov 21 19:58:41 powerd: down 2 volts
$ uniq -4 -c log
4 Nov 21 17:20:19 powerd: down 2 volts
1 Nov 21 18:18:02 powerd: up 3 volts
2 Nov 21 19:55:03 powerd: down 2 volts
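Where a bare -4 is not accepted, the same count can be requested with -f 4, the POSIX spelling of the field-skipping option. The sketch below recreates the log in a scratch file so that it is self-contained; the variable names log and counts are hypothetical.

```shell
# Recreate the sample log in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Nov 21 17:20:19 powerd: down 2 volts
Nov 21 17:20:27 powerd: down 2 volts
Nov 21 17:21:15 powerd: down 2 volts
Nov 21 17:22:48 powerd: down 2 volts
Nov 21 18:18:02 powerd: up 3 volts
Nov 21 19:55:03 powerd: down 2 volts
Nov 21 19:58:41 powerd: down 2 volts
EOF

# Skip the four leading fields (month, day, time, program name),
# then count each run of lines whose remaining text is identical.
counts=$(uniq -f 4 -c "$log")
printf '%s\n' "$counts"
```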
--JP and DG
Copyright © 2003 O'Reilly & Associates. All rights reserved.