$ cat nameState
s/ CA/, California/
s/ MA/, Massachusetts/
s/ OK/, Oklahoma/
s/ PA/, Pennsylvania/
s/ VA/, Virginia/
Of course, you'd want to handle all states, not just five, and if you
were running it on documents other than mailing lists, you should make
sure that it does not make unwanted replacements.
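One way to reduce the risk of unwanted replacements is to anchor each pattern to the end of the line, so that an abbreviation is replaced only where the mailing list puts it. The following is a minimal sketch, assuming that the state abbreviation is always the last thing on the line. The second input line is a hypothetical sentence, not part of list, included only to show a line that the unanchored script would have changed.

```shell
# Anchored variant of the nameState substitutions: "$" ties each
# match to the end of the line.
# The first input line ends in a state abbreviation; the second is a
# made-up sentence that merely contains " PA" in the middle.
result=$(printf '%s\n' \
    'Amy Wilde, 334 Bayshore Pkwy, Mountain View CA' \
    'Rent a PA system for the party' |
    sed -e 's/ CA$/, California/' \
        -e 's/ MA$/, Massachusetts/' \
        -e 's/ OK$/, Oklahoma/' \
        -e 's/ PA$/, Pennsylvania/' \
        -e 's/ VA$/, Virginia/')
printf '%s\n' "$result"
```

Only the first line is rewritten; the second passes through untouched because " PA" does not appear at the end of the line.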
The output for this program, using the input file
list, is the same as we have already seen. In
the next example, the output produced by nameState
is piped to an awk program that extracts the name of the state from
each record.
$ sed -f nameState list | awk -F, '{ print $4 }'
Massachusetts
Virginia
Oklahoma
Pennsylvania
Massachusetts
Virginia
California
Massachusetts
The awk program is processing the output produced by the sed script.
Remember that the sed script replaces the abbreviation with a comma
and the full name of the state. In effect, it splits the third field
containing the city and state into two fields. "$4" references the
fourth field.
What we are doing here could be done completely in sed, but probably
with more difficulty and less generality. Also, since awk allows you
to replace the string you match, you could achieve this result
entirely with an awk script.
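As a rough sketch of the all-awk approach, the substitutions can be made with awk's sub() function before the field is printed. This assumes a POSIX-style awk that provides sub() and that re-splits the record into fields after $0 is changed; the end-of-line anchors are our addition, and the sample record is one line in the format of list.

```shell
# One sample record in the format of the mailing list.
record='John Daggett, 341 King Road, Plymouth MA'

# sub() rewrites $0; awk then re-splits the record on commas,
# so $4 exists without a separate sed pass.
result=$(printf '%s\n' "$record" |
    awk -F, '{
        sub(/ CA$/, ", California")
        sub(/ MA$/, ", Massachusetts")
        sub(/ OK$/, ", Oklahoma")
        sub(/ PA$/, ", Pennsylvania")
        sub(/ VA$/, ", Virginia")
        print $4
    }')
printf '%s\n' "$result"
```

Because the field separator is a comma alone, the fourth field keeps the space that follows the comma.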
While the result of this program is not very useful, it could
be passed to sort | uniq -c, which would sort the states
into an alphabetical list with a count of the number of occurrences
of each state.
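The counting step might look like the following sketch. To keep it self-contained, the eight state names printed above are stored in a shell variable (named states here purely for illustration) rather than produced by the sed and awk pipeline.

```shell
# The eight state names printed by the pipeline above.
states='Massachusetts
Virginia
Oklahoma
Pennsylvania
Massachusetts
Virginia
California
Massachusetts'

# sort brings identical names together; uniq -c collapses each run
# of identical lines and prefixes it with the number of occurrences.
result=$(printf '%s\n' "$states" | sort | uniq -c)
printf '%s\n' "$result"
```

The sort is required because uniq compares only adjacent lines. The exact width of the count column varies between implementations of uniq.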
Now we are going to do something more interesting. We want to produce a report
that sorts the names by state and lists the name of the state followed
by the name of each person residing in that state. The following
example shows the byState program.
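#! /bin/sh
# byState: list the people in each state under the name of the state.
# Part 1 (awk):  copy the state name ($4) to the front of each record.
# Part 2 (sort): group the records by that leading state name.
# Part 3 (awk):  print each state once, then the names that belong to it.
awk -F, '{
    print $4 ", " $0
}' "$@" |
sort |
awk -F, '
$1 == LastState {
    print "\t" $2
}
$1 != LastState {
    LastState = $1
    print $1
    print "\t" $2
}'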
This shell script has three parts. The program invokes awk to produce
input for the sort program and then invokes awk
again to test the sorted input and determine if the name of the state
in the current record is the same as in the previous record. Let's see
the script in action:
$ sed -f nameState list | byState
California
	Amy Wilde
Massachusetts
	Eric Adams
	John Daggett
	Sal Carpenter
Oklahoma
	Orville Thomas
Pennsylvania
	Terry Kalkas
Virginia
	Alice Ford
	Hubert Sims
The names are sorted by state. This is a typical example of using
awk to generate a report from structured data.
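The test against the previous record can be tried on its own. The sketch below feeds three already sorted records, taken from the sample data, to a condensed form of the final awk command. The single unconditional rule that prints the name is our simplification, not the way byState itself is written.

```shell
# Three records as the sort step would deliver them:
# the state name first, followed by the original record.
sorted='California, Amy Wilde, 334 Bayshore Pkwy, Mountain View, California
Massachusetts, Eric Adams, 20 Post Road, Sudbury, Massachusetts
Massachusetts, John Daggett, 341 King Road, Plymouth, Massachusetts'

# LastState starts out as the empty string, so the first record
# always prints its state. After that, the state is printed only
# when it differs from the state of the previous record.
result=$(printf '%s\n' "$sorted" |
    awk -F, '
        $1 != LastState { LastState = $1; print $1 }
        { print "\t" $2 }')
printf '%s\n' "$result"
```

Each state appears once, with the names that belong to it indented beneath it. This works only because the input is sorted; unsorted input would print a state name again each time it recurred.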
To examine how the byState program works, let's
look at each part separately. It's designed to read input
from the nameState program and expects "$4" to be
the name of the state. Look at the output produced by the first
part of the program, the first awk command:
$ sed -f nameState list | awk -F, '{ print $4 ", " $0 }'
Massachusetts, John Daggett, 341 King Road, Plymouth, Massachusetts
Virginia, Alice Ford, 22 East Broadway, Richmond, Virginia
Oklahoma, Orville Thomas, 11345 Oak Bridge Road, Tulsa, Oklahoma
Pennsylvania, Terry Kalkas, 402 Lans Road, Beaver Falls, Pennsylvania
Massachusetts, Eric Adams, 20 Post Road, Sudbury, Massachusetts
Virginia, Hubert Sims, 328A Brook Road, Roanoke, Virginia
California, Amy Wilde, 334 Bayshore Pkwy, Mountain View, California
Massachusetts, Sal Carpenter, 73 6th Street, Boston, Massachusetts
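Because the state name now leads each record, a plain sort is enough to bring the records for each state together. The following sketch runs the first two parts of byState on three records from the sample data; the variable name records is only for illustration.

```shell
# Three records as produced by nameState, in their original order.
records='John Daggett, 341 King Road, Plymouth, Massachusetts
Amy Wilde, 334 Bayshore Pkwy, Mountain View, California
Alice Ford, 22 East Broadway, Richmond, Virginia'

# Prefix each record with its state, then sort on the whole line.
# The state is the first thing on the line, so it decides the order.
result=$(printf '%s\n' "$records" |
    awk -F, '{ print $4 ", " $0 }' |
    sort)
printf '%s\n' "$result"
```

The records come out in the order California, Massachusetts, Virginia, which is the order the final awk command relies on.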
In this chapter, we have covered the basic operations of sed and awk.
We have looked at important command-line options and introduced you to
scripting. In the next chapter, we are going to look at regular
expressions, something both programs use to match patterns in the
input.