15.4 Advanced Sorting
Earlier, you learned that you could take a list and sort it in ascending ASCII order (like you do strings) using the built-in
To define a sort of a different color, you need to define a comparison routine that describes how two elements compare. Why is this necessary? Well, if you think about it, sorting is putting a bunch of things in order by comparing them all. Since you can't compare them all at once, you need to compare two at a time, eventually using what you find out about each pair's order to put the whole kit'n'caboodle in line. The comparison routine is defined as an ordinary subroutine. This routine will be called repeatedly, each time passing two elements of the list to be sorted. The routine must determine whether the first value is less-than, equal-to, or greater-than the second value, and return a coded value (described in a moment). This process is repeated until the list is sorted.
To save a little execution speed, the two values are not passed in an array, but rather are handed to the subroutine as the values of the global variables
Here's an example of a comparison routine that sorts in numeric order: sub by_number { if ($a < $b) { return -1; } elsif ($a == $b) { return 0; } elsif ($a > $b) { return 1; } }
Notice the name
Let's look through this routine. If the value of How do we use it? Let's try sorting the following list: @somelist = (1,2,4,8,16,32,64,128,256);
If we use the ordinary @wronglist = sort @somelist; # @wronglist is now (1,128,16,2,256,32,4,64,8)
Certainly not very numeric. Well, let's give @rightlist = sort by_number @wronglist; # @rightlist is now (1,2,4,8,16,32,64,128,256)
This does the trick. Note that you can read the
This kind of three-way value of sub by_number { $a <=> $b; }
Note the spaceship between the two variables. Yes, it is indeed a three-character-long operator. The spaceship returns the same values as the @rightlist = sort { $a <=> $b } @wronglist; There are some who argue that this decreases readability. They are wrong. Others argue that it removes the need to go somewhere else for the definition. Perl doesn't care. Our personal rule is that if it doesn't fit on one line or we have to use it more than once, it goes into a subroutine.
The spaceship operator for numeric comparison has a comparable string operator called @result = sort { $a cmp $b } @somelist;
You probably won't ever write this exact subroutine (mimicking the built-in default sort), unless you're writing a book about Perl. However, the sub by_mostly_numeric { ($a <=> $b) || ($a cmp $b); }
How does this work? Well, if the result of the spaceship is -1 or 1, the rest of the expression is skipped, and the -1 or 1 is returned. If the spaceship evaluates to zero, however, the The values being compared are not necessarily the values being passed in. For example, say you have a hash where the keys are the login names and the values are the real names of each user. Suppose you want to print a chart where the login names and real names are sorted in the order of the real names. How would you do that?
Actually, it's fairly easy. Let's assume the values are in the array @sortedkeys = sort by_names keys(%names); sub by_names { return $names{$a} cmp $names{$b}; } foreach (@sortedkeys) { print "$_ has a real name of $names{$_}\n"; }
To this we should also add a fallback comparison. Suppose the real names of two users are identical. Because of the whimsical nature of the sub by_names { ($names{$a} cmp $names{$b}) || ($a cmp $b); } Here, if the real names are the same, we sort based on the login name instead. Since the login name is guaranteed to be unique (after all, they are the keys of this hash, and no two keys are the same), then we can ensure a unique and repeatable order. Good defensive programming during the day is better than a late-night call from a system administrator wondering why the security alarms are going off. |
|