lensort: Sort Lines by Length (Unix Power Tools, 3rd Edition)

22.7. lensort: Sort Lines by Length

A nice little script to sort lines from shortest to longest can be handy when you're writing and want to find your big words:

See also: deroff (Section 16.9), uniq (Section 21.20)

% deroff -w report | uniq -d | lensort
a
an
  ...
deoxyribonucleic

Once I used it to sort a list of pathnames:

See also: find (Section 9.1)

% find adir -type f -print | lensort
adir/.x
adir/.temp
   ...
adir/subdir/part1/somefile
adir/subdir/part1/a_test_case

The script uses awk (Section 20.10) to print each line's length, followed by the original line. Next, sort sorts the lengths numerically (Section 22.5). Then sed (Section 34.1) strips off the lengths and the spaces and prints the lines:

Go to http://examples.oreilly.com/upt3 for more information on lensort.

#! /bin/sh
# Print each line's length, a space, and then the line itself
awk 'BEGIN { FS=RS }
{ print length, $0 }' "$@" |
# Sort the lines numerically by the leading length field
sort -k1,1n |
# Remove the length and the space and print each line
sed 's/^[0-9][0-9]* //'

(Some awks require a semicolon after the closing curly brace of the BEGIN block -- that is, { FS=RS };.)
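To see what each stage contributes, it can help to run the stages separately. The following is a minimal sketch rather than the script itself. The function names decorate, by_length, and undecorate are hypothetical, and the sample words are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper names; each function is one stage of the pipeline.

# Stage 1: prefix every line with its length and a space.
decorate() {
    awk '{ print length($0), $0 }' "$@"
}

# Stage 2: order the decorated lines by the leading number only.
by_length() {
    sort -k1,1n
}

# Stage 3: drop the leading number and the single space after it.
undecorate() {
    sed 's/^[0-9][0-9]* //'
}

# Show the intermediate (decorated) form first, then the final result.
printf '%s\n' 'pear' 'fig' 'banana' | decorate
printf '%s\n' 'pear' 'fig' 'banana' | decorate | by_length | undecorate
```

The first command prints "4 pear", "3 fig", and "6 banana" in input order. The second prints fig, pear, and banana. Because the sed pattern is anchored at the start of the line and matches only once, a line that itself begins with digits keeps them: only the length added by awk is removed.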

Of course, you can also tackle this problem with Perl:

$ perl -lne '$l{$_}=length;END{for(sort{$l{$a}<=>$l{$b}}keys %l){print}}' \
                filename

This one-line wonder has the side effect of eliminating duplicate lines, because each line is stored as a hash key. If it seems a bit terse, that's because it's meant to be "write-only": a bit of shell magic that you'd use to accomplish a short-term task. If you foresee needing the same procedure in the future, it's better to capture the magic in a script. Scripts also tend to be easier to understand, debug, and expand. The following script does the same thing as the one-liner, reading from standard input, but a bit more clearly:

#!/usr/bin/perl

my %lines;
while(my $curr_line = <STDIN>){
  chomp $curr_line;
  $lines{$curr_line} = length $curr_line;
}

for my $line (sort{ $lines{$a} <=> $lines{$b} } keys %lines){
  print $line, "\n";
}

This script reads each line from standard input, removes the newline character, and stores the line in an associative array that maps the whole line to its length in characters. After the whole input has been processed, the keys of the associative array are sorted in ascending numerical order by each key's value. It is then a simple matter to print each key, which is the line itself. More Perl tricks can be found in Chapter 11.
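The shell version of lensort keeps duplicate lines, while the Perl versions drop them because the lines are hash keys. If the Perl behavior is wanted from the shell pipeline, one possible sketch is below. The name lensort_uniq is hypothetical, not part of the original script. Note that adding -u to the keyed sort would not do the job: it would compare only the length field and discard different lines that happen to be the same length.

```sh
#!/bin/sh
# lensort_uniq: hypothetical variant of lensort that also drops duplicate lines.
lensort_uniq() {
    # seen[$0]++ is 0 (false) the first time a line appears, so only first
    # occurrences are decorated with their length and passed on.
    awk '!seen[$0]++ { print length($0), $0 }' "$@" |
    # Sort numerically on the leading length field
    sort -k1,1n |
    # Remove the length and the space and print each line
    sed 's/^[0-9][0-9]* //'
}

printf '%s\n' 'an' 'a' 'an' 'the' 'a' | lensort_uniq
```

With the sample input above, the function prints a, an, and the, each once.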

--JP and JJ