4.6. Extracting Unique Elements from a List

Problem

You want to eliminate duplicate values from a list, such as when you build the list from a file or from the output of another command. This recipe is equally applicable to removing duplicates as they occur in input and to removing duplicates from an array you've already populated.

Solution
Use a hash to record which items have been seen, then keep each item only the first time it appears.

Straightforward

    %seen = ();
    @uniq = ();
    foreach $item (@list) {
        unless ($seen{$item}) {
            # if we get here, we have not seen it before
            $seen{$item} = 1;
            push(@uniq, $item);
        }
    }

Faster

    %seen = ();
    foreach $item (@list) {
        push(@uniq, $item) unless $seen{$item}++;
    }

Similar but with user function

    %seen = ();
    foreach $item (@list) {
        some_func($item) unless $seen{$item}++;
    }

Faster but different

    %seen = ();
    foreach $item (@list) {
        $seen{$item}++;
    }
    @uniq = keys %seen;

Faster and even more different

    %seen = ();
    @uniq = grep { ! $seen{$_}++ } @list;

Discussion

The question at the heart of the matter is "Have I seen this element before?" Hashes are ideally suited to such lookups. The first technique ("Straightforward") builds up the array of unique values as we go along, using a hash to record whether something is already in the array.
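As a minimal sketch, the straightforward technique can be run under `use strict` by declaring the variables with `my`. The sample list is invented for illustration and is not part of the recipe.

```perl
use strict;
use warnings;

# Sample input, invented for illustration.
my @list = qw(apple pear apple fig pear apple);

my %seen;   # item => 1 once the item has been added to @uniq
my @uniq;   # unique items, in order of first appearance

foreach my $item (@list) {
    unless ($seen{$item}) {
        # First time we meet this item: remember it and keep it.
        $seen{$item} = 1;
        push @uniq, $item;
    }
}

print "@uniq\n";   # prints "apple pear fig"
```

Because items are pushed as they are first encountered, the result keeps the order of the input.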
The second technique ("Faster") is the most natural way to write this sort of thing in Perl. It creates a new entry in the hash every time it sees an element that hasn't been seen before, using the ++ operator.

The third example ("Similar but with user function") is similar to the second, but rather than storing the item away, we call some user-defined function with that item as its argument. If that's all we're doing, keeping a spare array of those unique values is unnecessary.
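The second and third techniques both rely on the post-increment returning the old value before bumping the count. The sketch below shows that behavior on invented sample data; `some_func` here is a hypothetical stand-in that merely records its argument.

```perl
use strict;
use warnings;

my @list = qw(b a b c a);   # sample data, invented for illustration
my (%seen, @uniq, @visited);

# Hypothetical stand-in for the user-defined function in the third example.
sub some_func { push @visited, "saw $_[0]" }

foreach my $item (@list) {
    # $seen{$item}++ returns the old count (undef, hence false, on the
    # first visit) and then increments it, so the push runs once per value.
    push @uniq, $item unless $seen{$item}++;
}

%seen = ();
foreach my $item (@list) {
    # Same test, but the item is handed to a function instead of stored.
    some_func($item) unless $seen{$item}++;
}

print "@uniq\n";                        # prints "b a c"
print "$seen{b} $seen{a} $seen{c}\n";   # prints "2 2 1"
print scalar(@visited), "\n";           # prints "3"
```

A side benefit of this form is that the hash ends up holding the number of occurrences of each item, not just a flag.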
The next mechanism ("Faster but different") waits until it's done processing the list to extract the unique keys from the %seen hash.
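The final approach ("Faster and even more different") merges the construction of the %seen hash with the extraction of unique elements. Because grep returns elements in the order it tests them, this preserves the original order.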
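The grep form is compact enough to wrap in a small subroutine. The following is a sketch only; `uniq_in_order` is a hypothetical helper name, not something the recipe defines.

```perl
use strict;
use warnings;

# uniq_in_order is a hypothetical helper name chosen for this sketch.
sub uniq_in_order {
    my %seen;
    # grep keeps an element only when its previous count was zero,
    # so the first occurrence of each value survives, in input order.
    return grep { !$seen{$_}++ } @_;
}

my @uniq = uniq_in_order(qw(3 1 3 2 1));
print "@uniq\n";   # prints "3 1 2"
```

Keeping %seen lexical to the subroutine means each call starts with an empty hash, so there is no need to reset it by hand.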
Using a hash to record the values has two side effects: processing long lists can take a lot of memory, and the list returned by keys is in no predictable order.
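When the keys-based approach is used and a predictable order matters, one option is to sort the keys afterward. A minimal sketch, with sample data invented for illustration:

```perl
use strict;
use warnings;

my @list = qw(pear fig pear apple fig);   # sample data, invented for illustration

my %seen;
$seen{$_}++ foreach @list;

# keys hands back the unique items in no predictable order;
# sorting them gives a stable, repeatable result.
my @uniq = sort keys %seen;

print "@uniq\n";   # prints "apple fig pear"
```

Sorting restores a predictable order but not the original input order; if first-appearance order is needed, the push or grep forms are the better fit.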
Here's an example of processing input as it is read. We use who to list the users currently logged in, then extract the username from each line before updating the hash:

    # generate a list of users logged in, removing duplicates
    %ucnt = ();
    for (`who`) {
        s/\s.*\n//;     # kill from first space till end-of-line, yielding username
        $ucnt{$_}++;    # record the presence of this user
    }
    # extract and print unique keys
    @users = sort keys %ucnt;
    print "users logged in: @users\n";

See Also
The "Foreach Loops" section of perlsyn(1) and Chapter 2 of Programming Perl

Copyright © 2002 O'Reilly & Associates. All rights reserved.