

UNIX Power Tools

Chapter 36: Sorting

36.2 Sort Fields: How sort Sorts

Unless you tell it otherwise, sort divides each line into fields at white space (blanks or tabs), and sorts the lines, by field, from left to right.

That is, it sorts on the basis of field 0 (leftmost); but when the leftmost fields are the same, it sorts on the basis of field 1; and so on. This is hard to put into words, but it's really just common sense. Suppose your office inventory manager created a file like this:

supplies     pencils  148
furniture    chairs   40
kitchen      knives   22
kitchen      forks    20
supplies     pens     236
furniture    couches  10
furniture    tables   7
supplies     paper    29

You'd want all the supplies sorted into categories, and within each category you'd want them sorted alphabetically:

% sort supplies
furniture    chairs   40
furniture    couches  10
furniture    tables   7
kitchen      forks    20
kitchen      knives   22
supplies     paper    29
supplies     pencils  148
supplies     pens     236
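
To try this yourself, the sketch below recreates the inventory file and runs the same command. The scratch directory and the LC_ALL=C setting are assumptions added here, not part of the original example; LC_ALL=C simply asks for plain byte-order comparison so the result doesn't depend on your locale.

```shell
# Recreate the inventory file from the text in a scratch directory
# (the directory is an assumption, used only to avoid clobbering real files).
tmpdir=$(mktemp -d)
cat > "$tmpdir/supplies" <<'EOF'
supplies     pencils  148
furniture    chairs   40
kitchen      knives   22
kitchen      forks    20
supplies     pens     236
furniture    couches  10
furniture    tables   7
supplies     paper    29
EOF

# No field options: lines are compared starting from the left edge.
# LC_ALL=C requests byte-order comparison regardless of the user's locale.
LC_ALL=C sort "$tmpdir/supplies"
```

The categories come out grouped, with the items in alphabetical order inside each group, as in the listing above.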

Of course, you don't always want to sort from left to right. The command line option +n tells sort to start sorting on field n; -n tells sort to stop sorting on field n. Remember (again) that sort counts fields from left to right, starting with 0. [1] Here's an example. We want to sort a list of telephone numbers of authors, presidents, and blues singers:

[1] I harp on this because I always get confused and have to look it up in the manual page.

Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434

According to standard "telephone book rules," we want these names sorted by last name, first name, and middle initial. We don't want the phone number to play a part in the sorting. So we want to start sorting on field 2, stop sorting on field 3, continue sorting on field 0, sort on field 1, and (just to make sure) stop sorting on field 2 (the last name). We can code this as follows:

% sort +2 -3 +0 -2 phonelist
Lyndon B Johnson      933-1423
Robert M Johnson      344-0909
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Timothy F O'Reilly    443-2434
Jerry O Peek          267-2345

A few notes:

  • We need the -3 option to prevent sort from sorting on the telephone number after sorting on the last name. Without -3, the "Robert Johnson" entry would appear before "Lyndon Johnson" because it has a lower phone number.

  • We don't need to state +1 explicitly. Unless you give an explicit "stop" field, +1 is implied after +0.

  • If two names are completely identical, we probably don't care what happens next. However, just to be sure that something unexpected doesn't take place, we end the option list with -2, which says, "After sorting on the middle initial, don't sort on any later field." (Many versions of sort will still break an exact tie by comparing the whole lines.)
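
The +n and -n notation is the historical one. POSIX standardized the -k option in its place, and support for the old form varies from one version of sort to another. The sketch below is a translation of the phone-list command into -k form; it is not from the original text. Note that -k counts fields from 1 rather than 0 and names the last field of the key rather than the field to stop before, so +2 -3 becomes -k 3,3 and +0 -2 becomes -k 1,2. The scratch directory and LC_ALL=C are assumptions added to make the run repeatable.

```shell
# Recreate the phone list from the text in a scratch directory
# (the directory is an assumption for illustration).
tmpdir=$(mktemp -d)
cat > "$tmpdir/phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434
EOF

# -k 3,3 : first key is field 3 only (last name); stands in for "+2 -3"
# -k 1,2 : second key is fields 1 through 2 (first name, middle initial);
#          stands in for "+0 -2"
LC_ALL=C sort -k 3,3 -k 1,2 "$tmpdir/phonelist"
```

If your sort accepts both notations, the two commands should produce the same telephone-book order.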

There are a couple of variations that are worth mentioning. You may never need them unless you're really serious about sorting data files, but it's good to keep them in the back of your mind. First, you can add any "collation" operations (discard blanks, numeric sort, etc.) to the end of a field specifier to describe how you want that field sorted. Using our previous example, let's say that if two names are identical, you want them sorted in numeric phone number order. The following command does the trick:

% sort +2 -3 +0 -2 +3n phonelist

The +3n option says "do a numeric sort on the fourth field." If you're worried about initial blanks (perhaps some of the phone numbers have area codes), use +3nb.
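
Here is a hedged sketch of the same idea in the newer -k notation, where +3n is written -k 4n. To have a tie to break, the list includes a second "Robert M Johnson" entry with the number 112-7777; that line is hypothetical and not part of the original phone list. The scratch directory and LC_ALL=C are likewise assumptions.

```shell
# Phone list from the text plus one HYPOTHETICAL duplicate name (112-7777),
# added only so the numeric tie-breaker has something to do.
tmpdir=$(mktemp -d)
cat > "$tmpdir/phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434
Robert M Johnson      112-7777
EOF

# Keys in order: last name, then first name and initial, then the phone
# number compared numerically ("n"). A numeric comparison reads the leading
# digits of the field, so 112-7777 counts as 112 and 344-0909 as 344.
LC_ALL=C sort -k 3,3 -k 1,2 -k 4n "$tmpdir/phonelist"
```

The two Robert M Johnson lines come out next to each other, with the 112 number ahead of the 344 number.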

Second, you can specify individual columns within any field for sorting, using the notation +n.c, where n is a field number and c is a character position within the field. Likewise, the notation -n.c says "stop sorting at the character before character c." If you're counting characters, be sure to use the -b (ignore white space) option; otherwise, it will be very difficult to figure out what character you're counting.
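
As a sketch of character positions, the command below sorts the phone list on the last four digits of each number. It uses the newer -k notation, in which both fields and characters are counted from 1, so -k 4.5,4.8 means "from character 5 through character 8 of field 4." Choosing the last four digits as the key is my own illustration, and the scratch directory and LC_ALL=C are assumptions.

```shell
# Recreate the phone list from the text in a scratch directory
# (the directory is an assumption for illustration).
tmpdir=$(mktemp -d)
cat > "$tmpdir/phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434
EOF

# -b makes character counting start at the first non-blank character of
# the field, so the padding before each phone number is not counted.
# In "344-0909", characters 5 through 8 are "0909".
LC_ALL=C sort -b -k 4.5,4.8 "$tmpdir/phonelist"
```

Without -b, the run of blanks in front of each phone number would count as part of the field, and character 5 would land somewhere in the padding.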

- ML

