As we were writing this book, I decided to make a list of all the
articles with the number of lines and characters in each, and then
combine that with the description, a status code, and the article's title.
After a few minutes with wc -l -c (29.6), cut (35.14), sort (36.1),
and join (35.19), I had a file that looked like this:
% cat messfile
2850 2095 51441 ~BB A sed tutorial
3120 868 21259 +BB mail - lots of basics
6480 732 31034 + How to find sources - JIK's periodic posting
...900 lines...
5630 14 453 +JP Running Commands on Directory Stacks
1600 12 420 !JP With find, Don't Forget -print
0495 9 399 + Make 'xargs -i' use more than one filename
Yuck. It was tough to read. The columns needed to be straightened.
A little awk (33.11) script turned the mess into this:
% cat cleanfile
2850 2095  51441 ~BB  A sed tutorial
3120  868  21259 +BB  mail - lots of basics
6480  732  31034 +    How to find sources - JIK's periodic posting
...900 lines...
5630   14    453 +JP  Running Commands on Directory Stacks
1600   12    420 !JP  With find, Don't Forget -print
0495    9    399 +    Make 'xargs -i' use more than one filename
Here's the simple script I used and the command I typed to run it:
% cat neatcols
{
    printf "%4s %4s %6s %-4s %s\n", \
        $1, $2, $3, $4, substr($0, index($0,$5))
}
% awk -f neatcols messfile > cleanfile
You can adapt that script for whatever kinds of columns you need to
clean up.
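As a rough sketch of that kind of adaptation, here is the same idea applied to a made-up three-column layout: a name, a size, and free-form text. The function name neatcols3, the sample lines, and the column widths are all hypothetical, and the sketch assumes a POSIX shell rather than the csh prompt shown above.

```shell
# Hypothetical adaptation: three columns instead of five.
neatcols3() {
    awk '{
        # Left-align the name in 8 characters, right-align the size in 6,
        # and pass the rest of the line through unchanged.
        printf "%-8s %6s %s\n", $1, $2, substr($0, index($0, $3))
    }'
}

# Made-up sample input, one record per line.
printf '%s\n' \
    'ed 52 the standard line editor' \
    'awk 4096 pattern scanning and processing' |
neatcols3
```

Only the format string and the list of fields change; the catch-all for the last column works the same way as in neatcols.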
In case you don't know awk, here's a quick summary:
- The first line of the printf, between double quotes ("), sets the
  field widths and alignments.
  For example, the first column should be right-aligned in 4 characters
  (%4s).
  The fourth column should be 4 characters wide and left-adjusted (%-4s).
  The fifth column is just as wide as its contents (%s).
  I used string (%s) instead of decimal (%d) so awk wouldn't strip off
  the leading zeros in the columns.
- The second line arranges the input data fields onto the output line.
  Here, input and output are in the same order, but I could have reordered them.
  The first four columns get the first four fields ($1, $2, $3, $4).
  The fifth column is a catch-all; it gets everything else.
  substr($0, index($0,$5)) means "find the fifth input column; print it
  and everything after it."
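The two points in the summary above can be tried out in isolation. The following is a small sketch, assuming a POSIX shell; the helper names string_vs_decimal and catchall are made up for illustration, and the last input line is invented to show a limitation of the catch-all.

```shell
line="0495 9 399 + Make 'xargs -i' use more than one filename"

# %s prints the field as text; %d converts it to a number first,
# which drops the leading zero.
string_vs_decimal() {
    awk '{ printf "[%4s] [%4d]\n", $1, $1 }'
}
echo "$line" | string_vs_decimal

# The catch-all: everything from the first occurrence of field 5 onward.
catchall() {
    awk '{ print substr($0, index($0, $5)) }'
}
echo "$line" | catchall

# Caveat: index() finds the FIRST match, so if the text of field 5 also
# appears earlier in the line, the catch-all starts too soon.
echo "12 7 3 x 7 things" | catchall
```

The first command prints [0495] [ 495], and the second prints the title starting at "Make". The third prints "7 3 x 7 things" rather than "7 things", because the text of the fifth field also occurs as the second field. That never happens in the sample lines shown here, but it is worth checking for in other data.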