

UNIX Power Tools

Chapter 35: You Can't Quite Call This Editing
 

35.22 Straightening Jagged Columns

As we were writing this book, I decided to make a list of all the articles, with the number of lines and characters in each, and then combine that with the description, a status code, and the article's title. After a few minutes with wc -l -c (29.6), cut (35.14), sort (36.1), and join (35.19), I had a file that looked like this:

% cat messfile


2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
6480 732 31034 + How to find sources - JIK's periodic posting
    ...900 lines...

5630 14 453 +JP Running Commands on Directory Stacks
1600 12 420 !JP With find, Don't Forget -print
0495 9 399 + Make 'xargs -i' use more than one filename

Yuck. It was tough to read. The columns needed to be straightened. A little awk (33.11) script turned the mess into this:

% cat cleanfile


2850 2095  51441 ~BB  A sed tutorial
3120  868  21259 +BB  mail - lots of basics
6480  732  31034 +    How to find sources - JIK's periodic posting
    ...900 lines...

5630   14    453 +JP  Running Commands on Directory Stacks
1600   12    420 !JP  With find, Don't Forget -print
0495    9    399 +    Make 'xargs -i' use more than one filename

Here's the simple script I used and the command I typed to run it:

% cat neatcols


{
printf "%4s %4s %6s %-4s %s\n", \
     $1, $2, $3, $4, substr($0, index($0,$5))
}
% awk -f neatcols messfile > cleanfile
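
To try this without the full 900-line file, the same awk program can be wrapped in a shell function and fed a few sample lines. This is only a sketch: it assumes a Bourne-type shell (not the csh prompt shown above), and wrapping the program in a function named neatcols is a convenience for the example. The awk program itself is unchanged.

```shell
# neatcols as a shell function: the same awk program as the neatcols file
neatcols() {
    awk '{
        printf "%4s %4s %6s %-4s %s\n", \
            $1, $2, $3, $4, substr($0, index($0,$5))
    }' "$@"
}

# Three lines taken from messfile, piped through the function
printf '%s\n' \
    "2850 2095 51441 ~BB A sed tutorial" \
    "6480 732 31034 + How to find sources - JIK's periodic posting" \
    "0495 9 399 + Make 'xargs -i' use more than one filename" |
neatcols
```

With no arguments the function reads standard input; given filenames, as in neatcols messfile > cleanfile, it reads those files instead.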

You can adapt that script for whatever kinds of columns you need to clean up. In case you don't know awk, here's a quick summary:

  • The first line of the printf, between double quotes ("), gives the field widths and alignments. For example, the first column is right-aligned in 4 characters (%4s). The fourth column is 4 characters wide and left-adjusted (%-4s). The fifth column takes just as much room as it needs (%s). I used string (%s) instead of decimal (%d) so awk wouldn't strip off the leading zeros in the columns.

  • The second line arranges the input data fields onto the output line. Here, input and output are in the same order, but I could have reordered them. The first four columns get the first four fields ($1, $2, $3, $4).

    The fifth column is a catch-all; it gets everything else. substr($0, index($0,$5)) means "find the fifth input column; print it and everything after it."
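
Two points in the summary above can be checked at the prompt. The sketch below first compares %s with %d on a field that has a leading zero. It then shows one possible variant of the catch-all column. index($0,$5) returns the first place the text of $5 occurs anywhere in the line, so a title whose first word also appears in an earlier column gets cut at the wrong spot. The variant skips the first four fields by position instead. The sample line, the function names neatcols2 and rest, and the approach itself are assumptions made for illustration, not part of the original script. It needs an awk that supports user-defined functions (nawk, gawk, or any POSIX awk) and a Bourne-type shell.

```shell
# %s keeps the field as text; %d converts it to a number first
echo "0495 9 399" | awk '{ printf "[%4s] [%4d]\n", $1, $1 }'
# prints: [0495] [ 495]

# Hypothetical variant of neatcols that finds the catch-all column by position
neatcols2() {
    awk '
    # rest(n): return $0 starting at field n, located by position, not by text
    function rest(n,    s, i) {
        s = $0
        for (i = 1; i < n; i++)
            sub(/^[ \t]*[^ \t]+/, "", s)   # drop one leading field
        sub(/^[ \t]+/, "", s)              # drop the blanks before field n
        return s
    }
    { printf "%4s %4s %6s %-4s %s\n", $1, $2, $3, $4, rest(5) }
    ' "$@"
}

# Made-up line whose title starts with "B", which also occurs in the ~BB column
echo "1234 5 678 ~BB B is for Bourne" | neatcols2
```

On that made-up line, the original substr($0, index($0,$5)) would start the last column at the first B of ~BB and print "BB B is for Bourne" as the title. The positional version prints "B is for Bourne". For files like messfile, where the status column never contains the title's first word, both give the same result.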

- JP

