22.7. lensort: Sort Lines by Length

A nice little script to sort lines from shortest to longest can be handy when you're writing and want to find your big words. (See Section 16.9 for deroff and Section 21.20 for uniq.)

    % deroff -w report | uniq -d | lensort
    a
    an
     ...
    deoxyribonucleic

Once I used it to sort a list of pathnames (find is covered in Section 9.1):

    % find adir -type f -print | lensort
    adir/.x
    adir/.temp
     ...
    adir/subdir/part1/somefile
    adir/subdir/part1/a_test_case

The script uses awk (Section 20.10) to print each line's length, followed by the original line. Next, sort sorts the lengths numerically (Section 22.5). Then sed (Section 34.1) strips off the lengths and the spaces and prints the lines:

Go to http://examples.oreilly.com/upt3 for more information on: lensort

    #! /bin/sh
    awk 'BEGIN { FS=RS }
    { print length, $0 }' $* |
    # Sort the lines numerically
    sort +0n -1 |
    # Remove the length and the space and print each line
    sed 's/^[0-9][0-9]* //'

(Some awks require a semicolon after the first curly bracket -- that is, { FS=RS };.)

Of course, you can also tackle this problem with Perl:

    $ perl -lne '$l{$_}=length;END{for(sort{$l{$a}<=>$l{$b}}keys %l){print}}' \
      filename

This one-line wonder has the side effect of eliminating duplicate lines. If this seems a bit terse, that's because it's meant to be "write-only"; that is, it is a bit of shell magic that you'd use to accomplish a short-term task. If you foresee needing this same procedure in the future, it's better to capture the magic in a script. Scripts also tend to be easier to understand, debug, and expand.
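On the shell side, the lensort pipeline shown earlier can be captured in a slightly more portable form. The old-style sort +0n -1 key syntax may be rejected by newer sort implementations, and an unquoted $* splits filenames that contain spaces. The following is only a sketch under those assumptions, not the script distributed with the book: the function name lensort_posix is hypothetical, and sort -k1,1n is the POSIX key notation swapped in for +0n -1.

```shell
#!/bin/sh
# lensort_posix: hypothetical variant of lensort (the name is an assumption).
# Reads the named files, or standard input if none are given, and prints
# the lines from shortest to longest. Duplicate lines are kept.
lensort_posix() {
    # Prefix each line with its length and a single space
    awk '{ print length($0), $0 }' "$@" |
    # Sort numerically on the first field only (POSIX key syntax)
    sort -k1,1n |
    # Remove the length and the space and print each line
    sed 's/^[0-9][0-9]* //'
}
```

Saved in a file and followed by a final line that calls lensort_posix "$@", this should behave like the earlier examples, for instance find adir -type f -print | lensort_posix. Unlike the Perl one-liner, it does not drop duplicate lines.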
The following script does the same thing as the one-liner but a bit more clearly:

    #!/usr/bin/perl

    my %lines;
    while(my $curr_line = <STDIN>){
        chomp $curr_line;
        $lines{$curr_line} = length $curr_line;
    }

    for my $line (sort{ $lines{$a} <=> $lines{$b} } keys %lines){
        print $line, "\n";
    }

This script reads each line from standard input, removes the newline character, and stores the line in an associative array that maps each whole line to its length in characters. Because the line itself is the key, a repeated line simply overwrites its earlier entry, which is why duplicates disappear. After the whole file has been processed, the keys of the associative array are sorted in ascending numerical order by each key's value. It is then a simple matter to print the key, which is the line itself. More Perl tricks can be found in Chapter 11.

--JP and JJ

Copyright © 2003 O'Reilly & Associates. All rights reserved.