Straightening Jagged Columns (Unix Power Tools, 3rd Edition)

21.17. Straightening Jagged Columns

As we were writing this book, I decided to make a list of all the articles and the numbers of lines and characters in each, then combine that with the description, a status code, and the article's title. After a few minutes with wc -l -c (Section 16.6), cut (Section 21.14), sort (Section 22.1), and join (Section 21.19), I had a file that looked like this:

% cat messfile
2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
6480 732 31034 + How to find sources - JIK's periodic posting
    ...900 lines...
5630 14 453 +JP Running Commands on Directory Stacks
1600 12 420 !JP With find, Don't Forget -print
0495 9 399 + Make 'xargs -i' use more than one filename

Yuck. It was tough to read: the columns needed to be straightened. The column (Section 21.16) command could do it automatically, but I wanted more control over the alignment of each column. A little awk (Section 20.10) script turned the mess into this:

% cat cleanfile
2850 2095  51441 ~BB  A sed tutorial
3120  868  21259 +BB  mail - lots of basics
6480  732  31034 +    How to find sources - JIK's periodic posting
    ...900 lines...
5630   14    453 +JP  Running Commands on Directory Stacks
1600   12    420 !JP  With find, Don't Forget -print
0495    9    399 +    Make 'xargs -i' use more than one filename

Here's the simple script I used and the command I typed to run it:

% cat neatcols
{
printf "%4s %4s %6s %-4s %s\n", \
     $1, $2, $3, $4, substr($0, index($0,$5))
}
% awk -f neatcols messfile > cleanfile

You can adapt that script for whatever kinds of columns you need to clean up. In case you don't know awk, here's a quick summary:

The first line of the printf, between double quotes ("), is the format string that specifies the field widths and alignments. For example, the first column should be right-aligned in 4 characters (%4s). The fourth column should be 4 characters wide and left-adjusted (%-4s). The fifth column has no width, so it takes just as much room as its text needs (%s). I used string (%s) instead of decimal (%d) so awk wouldn't strip off the leading zeros in the columns.
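To see the difference between %s and %d for yourself, here is a minimal sketch. The sample line and the variable names (line, as_string, as_decimal) are made up for illustration and are not part of neatcols; the vertical bars are only there to make the column edges visible.

```shell
# Hypothetical one-line sample in the same layout as messfile.
line='0495 9 399 + Make it fit'

# %4s treats the first field as a string, so the leading zero survives.
as_string=$(printf '%s\n' "$line" |
    awk '{ printf "%4s|%4s|%-4s|\n", $1, $2, $4 }')

# %4d converts the first field to a number before printing it,
# so the leading zero is lost and the value is padded with a space.
as_decimal=$(printf '%s\n' "$line" |
    awk '{ printf "%4d|%4s|%-4s|\n", $1, $2, $4 }')

printf '%s\n%s\n' "$as_string" "$as_decimal"
```

The first output line should start with 0495 and the second with a space followed by 495. In both lines, the 9 is pushed to the right edge of its column by %4s and the + sits at the left edge of its column because of %-4s.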
The second line arranges the input data fields onto the output line. Here, input and output are in the same order, but I could have reordered them. The first four columns get the first four fields ($1, $2, $3, $4). The fifth column is a catch-all; it gets everything else. index($0,$5) finds the position where the text of the fifth field first appears in the input line ($0), and substr returns everything from that position to the end of the line. So substr($0, index($0,$5)) means "find the fifth input column; print it and everything after it."
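One thing to watch when adapting the catch-all: index() reports the first place the text of $5 appears anywhere in the line, so a title whose first word also shows up in an earlier field would start the catch-all too soon. The sketch below shows that case and one possible workaround, which strips the first four fields from a copy of the line with sub(). The sample line and the names line, orig, alt, and rest are assumptions made for this illustration, and the workaround assumes every line has at least five fields separated by spaces or tabs.

```shell
# Hypothetical sample: the title starts with "12", which is also field 2.
line='1600 12 420 !JP 12 ways to use find'

# The neatcols approach: index() matches the "12" in field 2,
# so the catch-all column starts there instead of at the title.
orig=$(printf '%s\n' "$line" | awk '{
    printf "%4s %4s %6s %-4s %s\n", \
        $1, $2, $3, $4, substr($0, index($0,$5))
}')

# A possible alternative: copy the line, then delete one leading
# field (and the blanks after it) four times; what is left is the title.
alt=$(printf '%s\n' "$line" | awk '{
    rest = $0
    for (i = 1; i <= 4; i++)
        sub(/^[ \t]*[^ \t]+[ \t]+/, "", rest)
    printf "%4s %4s %6s %-4s %s\n", $1, $2, $3, $4, rest
}')

printf '%s\n%s\n' "$orig" "$alt"
```

With this sample, the first output line repeats "12 420 !JP" before the title, while the second ends with just "12 ways to use find". For data like messfile, where the title never repeats an earlier field, the two approaches give the same result.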

-- JP