

Unix Power Tools

21.17. Straightening Jagged Columns

As we were writing this book, I decided to make a list of all the articles and the numbers of lines and characters in each, then combine that with the description, a status code, and the article's title. After a few minutes with wc -l -c (Section 16.6), cut (Section 21.14), sort (Section 22.1), and join (Section 21.19), I had a file that looked like this:

% cat messfile
2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
6480 732 31034 + How to find sources - JIK's periodic posting
    ...900 lines...
5630 14 453 +JP Running Commands on Directory Stacks
1600 12 420 !JP With find, Don't Forget -print
0495 9 399 + Make 'xargs -i' use more than one filename

Yuck. It was tough to read: the columns needed to be straightened. The column (Section 21.16) command could do it automatically, but I wanted more control over the alignment of each column. A little awk (Section 20.10) script turned the mess into this:

% cat cleanfile
2850 2095  51441 ~BB  A sed tutorial
3120  868  21259 +BB  mail - lots of basics
6480  732  31034 +    How to find sources - JIK's periodic posting
    ...900 lines...
5630   14    453 +JP  Running Commands on Directory Stacks
1600   12    420 !JP  With find, Don't Forget -print
0495    9    399 +    Make 'xargs -i' use more than one filename

Here's the simple script I used and the command I typed to run it:

% cat neatcols
{
printf "%4s %4s %6s %-4s %s\n", \
     $1, $2, $3, $4, substr($0, index($0,$5))
}
% awk -f neatcols messfile > cleanfile
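
To try the script without the full 900-line file, here is a small sketch that rebuilds three of the sample lines shown above in a scratch directory and runs the same printf over them. The names sample.mess and sample.clean and the use of mktemp -d are assumptions for illustration; they are not part of the original session.

```shell
# Scratch directory so nothing real is overwritten.
# (mktemp -d is widely available but is an assumption here.)
dir=$(mktemp -d)

# Three of the jagged sample lines from messfile.
cat > "$dir/sample.mess" <<'EOF'
2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
0495 9 399 + Make 'xargs -i' use more than one filename
EOF

# The same awk program as neatcols, stored in a file for -f.
cat > "$dir/neatcols" <<'EOF'
{
printf "%4s %4s %6s %-4s %s\n", \
     $1, $2, $3, $4, substr($0, index($0,$5))
}
EOF

# Run it and show the straightened columns.
awk -f "$dir/neatcols" "$dir/sample.mess" > "$dir/sample.clean"
cat "$dir/sample.clean"
```

The output should match the corresponding lines of cleanfile above, with the numbers right-aligned and the titles starting in the same column.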

You can adapt that script for whatever kinds of columns you need to clean up. In case you don't know awk, here's a quick summary:

  • The first line of the printf, between double quotes ("), specifies the field widths and alignments. For example, the first column should be right-aligned in 4 characters (%4s). The fourth column should be 4 characters wide left-adjusted (%-4s). The fifth column is big enough to just fit (%s). I used string (%s) instead of decimal (%d) so awk wouldn't strip off the leading zeros in the columns.

  • The second line arranges the input data fields onto the output line. Here, input and output are in the same order, but I could have reordered them. The first four columns get the first four fields ($1, $2, $3, $4). The fifth column is a catch-all; it gets everything else. substr($0, index($0,$5)) means "find the fifth input column; print it and everything after it." (Strictly, index finds the first place the text of $5 appears in the line, so this assumes that text doesn't also appear in an earlier column.)
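
Two details in that summary are easy to check in isolation. The sketch below first prints the same field with %s and with %d to show the leading zero disappearing. It then shows what happens when the text of the fifth field also occurs earlier in the line: index() matches the earlier occurrence, so the catch-all starts too soon. The input line is made up for the demonstration. The sub() loop is one possible workaround, not part of the original script.

```shell
# %s keeps the field as text; %d converts it to a number first.
zeros=$(echo '0495 9 399' | awk '{ printf "%4s|%4d\n", $1, $1 }')
echo "$zeros"

# A made-up line whose fifth field ("12") is also the text of field 2.
line='1600 12 420 !JP 12 ways to use find'

# Original catch-all: index() finds the "12" in field 2 first.
early=$(echo "$line" | awk '{ print substr($0, index($0,$5)) }')
echo "$early"

# Possible workaround: delete four leading fields from a copy of $0.
rest=$(echo "$line" | awk '{
    r = $0
    for (i = 1; i <= 4; i++)
        sub(/^[ \t]*[^ \t]+[ \t]*/, "", r)   # drop one field and its separator
    print r
}')
echo "$rest"
```

The first echo prints 0495| 495, the second prints everything from field 2 onward, and the third prints only "12 ways to use find". The workaround assumes fields are separated by spaces or tabs, as in messfile.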

-- JP




Copyright © 2003 O'Reilly & Associates. All rights reserved.