$ awk '{ print $2, $1, $3 }' names
Robinson John 666-555-1111
$1 refers to the first name, $2 to the last name, and $3 to the phone
number. The commas that separate each argument in the
print statement cause a space to be output
between the values. (Later on, we'll discuss the output field
separator (OFS), whose value the comma
outputs and which is by default a space.) In this example, a single
input line forms one record containing three fields: there is a space
between the first and last names and a tab between the last name and
the phone number. If you wanted to grab the first and last name as a
single field, you could set the field separator explicitly so that
only tabs are recognized. Then, awk would recognize only two fields
in this record.
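As a small sketch of what the comma does in a print statement, the following commands run one record through three variations. The sample record and the shell variable names are made up for illustration; OFS is awk's standard output field separator.

```shell
# Hypothetical sample record: first name, space, last name, tab, phone.
record=$(printf 'John Robinson\t666-555-1111')

# Comma between arguments: awk outputs OFS (a space by default).
with_comma=$(printf '%s\n' "$record" | awk '{ print $2, $1 }')

# No comma: the two values are concatenated with nothing between them.
no_comma=$(printf '%s\n' "$record" | awk '{ print $2 $1 }')

# Changing OFS changes what the comma outputs.
custom_ofs=$(printf '%s\n' "$record" | awk 'BEGIN { OFS = ":" } { print $2, $1 }')

printf '%s\n' "$with_comma" "$no_comma" "$custom_ofs"
```

This should print "Robinson John", "RobinsonJohn", and "Robinson:John" on three separate lines.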
You can use any expression that evaluates to an integer to refer to a
field, not just numbers and variables.
$ echo a b c d | awk 'BEGIN { one = 1; two = 2 }
> { print $(one + two) }'
c
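A common use of this feature, sketched here under the assumption that you know the built-in variable NF (the number of fields in the current record, which this section has not introduced), is to count fields from the end of the line.

```shell
# NF holds the number of fields in the current record, so $NF is the last field.
last=$(echo a b c d | awk '{ print $NF }')

# Any integer-valued expression can follow the dollar sign; this is the next-to-last field.
next_to_last=$(echo a b c d | awk '{ print $(NF - 1) }')

printf '%s\n' "$last" "$next_to_last"
```

With four input fields, this should print "d" and then "c".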
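You can change the field separator with the -F option on the command
line, followed by the delimiter character. In the following example,
the field separator is changed to a tab.
$ awk -F"\t" '{ print $2 }' names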
666-555-1111
"\t" is an escape sequence (discussed below) that
represents an actual tab character. It should be surrounded by single
or double quotes.
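To see the effect on the field count, the following sketch prints NF (awk's built-in count of fields) for the same record under both separators. The record is a hypothetical stand-in for a line of the names file, and the shell variable names are only illustrative.

```shell
# Hypothetical record: space between the names, tab before the phone number.
record=$(printf 'John Robinson\t666-555-1111')

# Default separator: spaces and tabs both split fields.
default_count=$(printf '%s\n' "$record" | awk '{ print NF }')

# Tab-only separator: the full name stays together as one field.
tab_count=$(printf '%s\n' "$record" | awk -F"\t" '{ print NF }')
tab_first=$(printf '%s\n' "$record" | awk -F"\t" '{ print $1 }')

printf '%s\n' "$default_count" "$tab_count" "$tab_first"
```

This should report three fields with the default separator and two with the tab, where the first field is then "John Robinson".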
Commas delimit fields in the following two address records.
John Robinson,Koren Inc.,978 4th Ave.,Boston,MA 01760,696-0987
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA 01881,879-0900
An awk program can print the name and address in block format.
# blocklist.awk -- print name and address in block form.
# input file -- name, company, street, city, state and zip, phone
{   print ""        # output blank line
    print $1        # name
    print $2        # company
    print $3        # street
    print $4, $5    # city, state zip
}
The first print statement specifies an empty string
("") (remember,
print by itself outputs the current line). This
arranges for the records in the report to be separated by blank lines.
We can invoke this script and specify that the field separator is a
comma using the following command:
awk -F, -f blocklist.awk names
The following report is produced:
John Robinson
Koren Inc.
978 4th Ave.
Boston MA 01760

Phyllis Chapman
GVE Corp.
34 Sea Drive
Amesbury MA 01881
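It is usually more convenient to specify the field separator in the
script itself by assigning the system variable FS. Because the
assignment must take effect before the first input line is read, it
goes in an action controlled by the BEGIN rule.
BEGIN { FS = "," }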
Now let's use it in a script to print out the names and phone numbers.
# phonelist.awk -- print name and phone number.
# input file -- name, company, street, city, state and zip, phone

BEGIN { FS = "," }  # comma-delimited fields

{ print $1 ", " $6 }
Notice that we use blank lines in the script itself to improve
readability. The print statement puts a comma
followed by a space between the two output fields. This script can be
invoked from the command line:
$ awk -f phonelist.awk names
John Robinson, 696-0987
Phyllis Chapman, 879-0900
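As a quick check that the two ways of setting the separator are interchangeable for this task, the sketch below feeds the two address records to awk from a shell variable instead of the names file. The variable names are illustrative only.

```shell
# The two address records from the text, kept in a shell variable.
records='John Robinson,Koren Inc.,978 4th Ave.,Boston,MA 01760,696-0987
Phyllis Chapman,GVE Corp.,34 Sea Drive,Amesbury,MA 01881,879-0900'

# Separator set inside the program, in a BEGIN rule.
from_begin=$(printf '%s\n' "$records" | awk 'BEGIN { FS = "," } { print $1 ", " $6 }')

# Separator set on the command line with -F.
from_option=$(printf '%s\n' "$records" | awk -F, '{ print $1 ", " $6 }')

printf '%s\n' "$from_begin"
```

Both variables should hold the same two-line phone list.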
This gives you a basic idea of how awk can be used to work with data
that has a recognizable structure. This script is designed
to print all lines of input, but we could modify the single action by
writing a pattern-matching rule that selected only certain names or
addresses. So, if we had a large listing of names, we could select
only the names of people residing in a particular
state. We could write:
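/MA/ { print $1 ", " $6 }
where "MA" would match the postal abbreviation for Massachusetts.
However, this rule tests the entire input line, so it could also
match a company name or any other field in which the letters "MA"
appear. To test a specific field instead, use the tilde (~) operator,
which matches a regular expression against a field:
$5 ~ /MA/ { print $1 ", " $6 }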
You can reverse the meaning of the rule by using bang-tilde (!~).
$5 !~ /MA/ { print $1 ", " $6 }
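The difference between matching the whole line and matching one field can be seen in a short sketch. The second record below is invented for the purpose: its company name contains "MA" but its state does not.

```shell
# First record is from the text; the second is a made-up test case.
records='John Robinson,Koren Inc.,978 4th Ave.,Boston,MA 01760,696-0987
Alice Gold,MASCO Corp.,12 Elm St.,Albany,NY 12207,555-0100'

# Pattern tested against the whole line: "MASCO" matches too.
whole_line=$(printf '%s\n' "$records" | awk -F, '/MA/ { print $1 }')

# Pattern tested against the fifth field only.
fifth_only=$(printf '%s\n' "$records" | awk -F, '$5 ~ /MA/ { print $1 }')

# Negated test: records whose fifth field does not contain "MA".
negated=$(printf '%s\n' "$records" | awk -F, '$5 !~ /MA/ { print $1 }')

printf 'whole line:\n%s\nfifth field:\n%s\nnegated:\n%s\n' \
    "$whole_line" "$fifth_only" "$negated"
```

The whole-line rule should select both names, the field test only John Robinson, and the negated test only Alice Gold.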
This rule would match all those records whose fifth field did not have
"MA" in it. A more challenging pattern-matching rule would be one
that matches only long-distance phone numbers. The following regular
expression looks for an area code.
$6 ~ /1?(-| )?\(?[0-9]+\)?( |-)?[0-9]+-[0-9]+/
This rule matches any of the following forms:
707-724-0000
(707) 724-0000
(707)724-0000
1-707-724-0000
1 707-724-0000
1(707)724-0000
The regular expression can be deciphered by breaking down its parts.
"1?" means zero or one occurrences of "1". "(-| )?" looks for
either a hyphen or a space in the next position, or nothing at all.
"\(?" looks for zero or one left parenthesis; the backslash
prevents the interpretation of "(" as the grouping metacharacter.
"[0-9]+" looks for one or more digits; note that we took the lazy way
out and specified one or more digits rather than exactly three. In
the next position, we are looking for an optional right parenthesis,
and again, either a space or a hyphen, or nothing at all. Then we
look for one or more digits "[0-9]+" followed by a hyphen followed by
one or more digits "[0-9]+".
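One consequence of the lazy approach is worth checking: because the expression is unanchored and every leading part is optional, a seven-digit local number such as 696-0987 also satisfies it. The sketch below counts matches for the six forms listed above plus that local number, first with the expression as given and then with a stricter variant. The stricter variant is an assumption for illustration, not the rule from the text; it anchors the match and spells out the digit counts with repeated bracket expressions, which should work even in awk versions without interval expressions.

```shell
# The six long-distance forms from the text, plus one local number as a test case.
numbers='707-724-0000
(707) 724-0000
(707)724-0000
1-707-724-0000
1 707-724-0000
1(707)724-0000
696-0987'

# The expression from the text, applied to each whole line.
loose=$(printf '%s\n' "$numbers" |
    awk '/1?(-| )?\(?[0-9]+\)?( |-)?[0-9]+-[0-9]+/ { n++ } END { print n + 0 }')

# A stricter variant (an assumption): anchored, with exact digit counts.
strict=$(printf '%s\n' "$numbers" |
    awk '/^(1[- ]?)?\(?[0-9][0-9][0-9]\)?[- ]?[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]$/ { n++ }
         END { print n + 0 }')

printf 'loose: %s, strict: %s\n' "$loose" "$strict"
```

The loose expression should count all seven lines, while the stricter one should count only the six long-distance forms.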