36.7 Sorting Multiline Entries

There's one limitation to sort. It works a line at a time. If you want to sort a file with multiline entries, you're in tough shape. For example, let's say you have a list of addresses:

    Doe, John and Jane
    30 Anywhere St
    Anytown, New York
    10023

    Buck, Jane and John
    40 Anywhere St
    Nowheresville, Alaska
    90023
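To see the problem concretely, here is a minimal sketch that runs plain sort over a list like the one above. The filename addresses.txt is only an assumption for the example, and LC_ALL=C is set just to make the ordering predictable from one system to the next.

```shell
# Build a small sample file (addresses.txt is a made-up name for this demo).
cat > addresses.txt <<'EOF'
Doe, John and Jane
30 Anywhere St
Anytown, New York
10023

Buck, Jane and John
40 Anywhere St
Nowheresville, Alaska
90023
EOF

# Plain sort treats every line as its own record, so each entry is torn apart.
scrambled=$(LC_ALL=C sort addresses.txt)
printf '%s\n' "$scrambled"
```

In the C locale the blank line and the lines that start with digits should float to the top and the names sink toward the bottom, so no entry survives intact.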
    # completely empty lines separate records.
    gawk '{ gsub(/\n/,"\1"); print $0 "\1" }' RS= $files |
    sort $sortopts |
    tr '\1' '\12'

The script starts with a lot of option processing that we don't show here. It's incredibly thorough, and allows you to use any sort options except -o. It also adds a new -a option, which allows you to sort based on different lines of a multiline entry. Say you're sorting an address file, and the street address is on the second line of each entry: that is the kind of sort -a makes possible.
The body of the script (after the option processing) is conceptually simple. It uses gawk (33.12) to collapse each multiline record into a single line, with the CTRL-a character to mark where the line breaks were. After this processing, a few addresses from a typical address list might look like this:

    Doe, John and Jane^A30 Anywhere St^AAnytown, New York^A10023^A
    Buck, Jane and John^A40 Anywhere St^ANowheresville, Alaska^A90023^A

Now that we've converted the original file into a list of one-line entries, we have something that sort can handle. So we just use sort, with whatever options were supplied on the command line. After sorting, tr (35.11) "unpacks" this single-line representation, restoring the file to its original form, by converting each CTRL-a back to a newline. Notice that the gawk script added an extra CTRL-a to the end of each output line, so tr outputs an extra newline, plus the newline from the gawk print command, to give a blank line between each entry. (Thanks to Greg Ubben for this improvement.)

There are lots of interesting variations on this script. You can substitute grep for the sort command, allowing you to search for multiline entries - for example, to look up addresses in an address file. This would require slightly different option processing, but the script would be essentially the same.
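The pack, process, and unpack idea can be sketched as a few small shell functions. This is only an illustration, not the script's real code: the function names are made up, the option processing is left out entirely, plain awk stands in for gawk (POSIX awk also treats an empty RS as "blank lines separate records"), and chunksort_by_line is just one guess at how sorting on a particular line could be done, not the actual -a implementation. It also assumes that no record contains a literal CTRL-a.

```shell
# pack: read blank-line-separated records and print each one as a single
# line, with CTRL-a (octal 001) in place of each newline plus one at the end.
pack() {
    awk 'BEGIN { RS = "" } { gsub(/\n/, "\001"); print $0 "\001" }' "$@"
}

# unpack: turn every CTRL-a back into a newline.
unpack() {
    tr '\001' '\012'
}

# chunksort_sketch FILE...: sort whole records by their packed text.
chunksort_sketch() {
    pack "$@" | LC_ALL=C sort | unpack
}

# chunksort_by_line N FILE...: sort records on their Nth line by using
# CTRL-a as the sort field separator (hypothetical stand-in for -a).
chunksort_by_line() {
    line=$1; shift
    sep=$(printf '\001')
    pack "$@" | LC_ALL=C sort -t "$sep" -k "$line,$line" | unpack
}

# chunkgrep_sketch PATTERN FILE...: print every whole record that matches.
chunkgrep_sketch() {
    pattern=$1; shift
    pack "$@" | grep -e "$pattern" | unpack
}
```

With an address file named addresses, chunksort_sketch addresses would print the entries in order of their first lines, chunksort_by_line 2 addresses would order them by street address, and chunkgrep_sketch Alaska addresses would print each complete entry that mentions Alaska.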