15.4 Advanced Sorting
Earlier, you learned that you could take a list and sort it in ascending ASCII order (as you do with strings) using the built-in
To define a sort of a different color, you need to define a comparison routine that describes how two elements compare. Why is this necessary? Well, if you think about it, sorting is putting a bunch of things in order by comparing them all. Because you can't compare them all at once, you need to compare two at a time, eventually using what you find out about each pair's order to put the whole kit'n'caboodle in line. The comparison routine is defined as an ordinary subroutine. This routine will be called repeatedly, each time passing two elements of the list to be sorted. The routine must determine whether the first value is less-than, equal-to, or greater-than the second value, and return a coded value (described in a moment). This process is repeated until the list is sorted.
To save a little execution speed, the two values are not passed in an array, but rather are handed to the subroutine as the values of the global variables
Here's an example of a sort subroutine that sorts in numeric order: sub by_number { if ($a < $b) { return -1; } elsif ($a == $b) { return 0; } elsif ($a > $b) { return 1; } }
Notice the name
Let's look through this routine. If the value of How do we use it? Let's try sorting the following list: @somelist = (1,2,4,8,16,32,64,128,256);
If we use the ordinary @wronglist = sort @somelist; # @wronglist is now (1,128,16,2,256,32,4,64,8)
This @rightlist = sort by_number @wronglist; # @rightlist is now (1,2,4,8,16,32,64,128,256)
This
This kind of three-way value of -1, 0, and +1 on the basis of a numeric comparison occurs often enough in sort routines that Perl has a special operator to do this in one fell swoop. The operation is often called the
spaceship
operator, and looks like sub by_number { $a <=> $b; }
Note the spaceship between the two variables. Yes, it is indeed a three-character-long operator. The spaceship returns the same values as the @rightlist = sort { $a <=> $b } @wronglist; There are some who argue that this shortcut decreases readability. They are wrong. Others argue that it removes the need to go somewhere else for the definition. Perl doesn't care. Our personal rule is that if it doesn't fit on one line or we have to use it more than once, it goes into a subroutine.
The spaceship operator for numeric comparison has a comparable string operator called
@result = sort { $a cmp $b } @somelist;
You probably won't ever write this exact subroutine (mimicking the built-in default sort), unless you're writing a book about Perl. However, the sub by_mostly_numeric { ($a <=> $b) || ($a cmp $b); }
How does this work? Well, if the result of the spaceship is -1 or 1, the rest of the expression is skipped, and the -1 or 1 is returned. If the spaceship evaluates to zero, however, the The values being compared are not necessarily the values being passed in. For example, say you have a hash where the keys are the login names and the values are the real names of each user. Suppose you want to print a chart where the login names and real names are sorted in the order of the real names. How would you do that?
Actually, the solution is fairly easy. Let's assume the values are in the array @sortedkeys = sort by_names keys(%names); sub by_names { return $names{$a} cmp $names{$b}; } foreach (@sortedkeys) { print "$_ has a real name of $names{$_}\n"; }
To this we should also add a fallback comparison. Suppose the real names of two users are identical. Because of the whimsical nature of the sub by_names { ($names{$a} cmp $names{$b}) || ($a cmp $b); } Here, if the real names are the same, we sort based on the login name instead. Because the login name is guaranteed to be unique (after all, login names are the keys of this hash, and no two keys are the same), then we can ensure a unique and repeatable order. Good defensive programming during the day is better than a late-night call from a system administrator wondering why the security alarms are going off. |
|