This article also covers nawk and gawk (33.12).
With the exception of array subscripts, values in [brackets] are optional; don't type the [ or ].
awk can be invoked in two ways:

    awk [options] 'script' [var=value] [file(s)]
    awk [options] -f scriptfile [var=value] [file(s)]
You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. In most versions, the -f option can be used multiple times.
The variable var can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the value is available only after a line of input is read (i.e., after the BEGIN procedure has run).
awk operates on one or more file(s). If none are specified (or if - is specified), awk reads from the standard input (13.1).
The other recognized options are:

- -Fc
  Set the field separator to character c. This is the same as setting the system variable FS. nawk allows c to be a regular expression (26.4).
  Each record (by default, one input line) is divided into fields by white space (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2, ... $n. $0 refers to the entire record. For example, to print the first three (colon-separated) fields on separate lines:

      % awk -F: '{print $1; print $2; print $3}' /etc/passwd
- -v var=value
  Assign a value to variable var. This allows assignment before the script begins execution. (Available in nawk only.)
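The timing difference between -v and a var=value operand is easy to see directly. This is a minimal sketch written as shell functions; the function names are invented for illustration, and it assumes a POSIX-compatible awk (nawk, gawk, or similar) is installed as awk.

```shell
# -v performs the assignment before the BEGIN procedure runs.
with_v() {
    awk -v n="$1" 'BEGIN { print "BEGIN sees [" n "]" }'
}

# A var=value operand is processed only when awk reaches it in the file
# list. A program with only a BEGIN procedure never reads input, so the
# assignment never happens.
with_operand() {
    awk 'BEGIN { print "BEGIN sees [" n "]" }' n="$1"
}

# Here the operand comes before "-" (standard input), so the main
# procedure does see the value.
operand_then_stdin() {
    printf 'line\n' | awk '{ print "main sees [" n "]" }' n="$1" -
}

with_v 5              # BEGIN sees [5]
with_operand 5        # BEGIN sees []
operand_then_stdin 5  # main sees [5]
```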
awk scripts consist of patterns and procedures:

    pattern { procedure }

Both are optional. If pattern is missing, { procedure } is applied to all records. If { procedure } is missing, the matched record is written to the standard output.
pattern can be any of the following:

    /regular expression/
    relational expression
    pattern-matching expression
    BEGIN
    END
- Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later under the section "awk System Variables."
- Regular expressions use the extended set of metacharacters as described in article 26.4. In addition, when a regular expression is matched against a field (e.g., $2 ~ /^[0-9]/), ^ and $ refer to the beginning and end of that field rather than the beginning and end of the record (line).
- Relational expressions use the relational operators listed under the section "Operators" later in this article. Comparisons can be either string or numeric. For example, $2 > $1 selects records for which the second field is greater than the first.
- Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this article.
- The BEGIN pattern lets you specify procedures that will take place before the first input record is processed. (Generally, you set global variables here.)
- The END pattern lets you specify procedures that will take place after the last input record is read.
Except for BEGIN and END, patterns can be combined with the Boolean operators || (OR), && (AND), and ! (NOT). A range of lines can also be specified using comma-separated patterns:

    pattern,pattern
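Both forms can be tried on small inputs. A sketch, assuming a POSIX-compatible awk; the function names and the start/stop marker lines are made up for the example. A range runs from a line matching the first pattern through the next line matching the second, inclusive.

```shell
# Print each block of lines from a line that is exactly "start" through
# the next line that is exactly "stop" (both lines included).
print_ranges() {
    awk '/^start$/, /^stop$/'
}

# Combine patterns with && and !: print non-comment lines that have
# more than one field.
busy_lines() {
    awk 'NF > 1 && !/^#/'
}

printf 'a\nstart\nb\nstop\nc\n' | print_ranges   # start, b, stop (one per line)
printf '# x y\na b\nc\n' | busy_lines           # a b
```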
procedure can consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons (;), and contained within curly braces ({}). Commands fall into four groups:
Simple pattern-procedure examples:

- Print the first field of each line:
      { print $1 }
- Print all lines that contain pattern:
      /pattern/
- Print the first field of lines that contain pattern:
      /pattern/ { print $1 }
- Print records containing more than two fields:
      NF > 2
- Interpret input records as a group of lines up to a blank line:
      BEGIN { FS = "\n"; RS = "" }
      {
          ...process records...
      }
- Print fields 2 and 3 in switched order, but only on lines whose first field contains the string URGENT:
      $1 ~ /URGENT/ { print $3, $2 }
- Count the lines that contain pattern and print the total (x + 0 prints 0 rather than an empty line when nothing matches):
      /pattern/ { ++x }
      END { print x + 0 }
- Add the numbers in the second column and print the total:
      { total += $2 }
      END { print "column total is", total }
- Print lines that contain fewer than 20 characters:
      length($0) < 20
- Print each line that begins with Name: and that contains exactly seven fields:
      NF == 7 && /^Name:/
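The blank-line example leaves "...process records..." open. One way to fill it in, sketched with a hypothetical function name and made-up address data, prints the first line of each record along with how many lines the record has. It assumes an awk that supports paragraph mode (RS = ""), as nawk and gawk do.

```shell
# RS = "" turns on paragraph mode: blank lines separate records.
# FS = "\n" makes each line of a record one field.
summarize_records() {
    awk 'BEGIN { FS = "\n"; RS = "" }
         { print NR ": " $1 " (" NF " lines)" }'
}

printf 'Alice\n555-1234\n\nBob\n555-9876\nBoston\n' | summarize_records
# 1: Alice (2 lines)
# 2: Bob (3 lines)
```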
nawk supports all awk variables. gawk supports both nawk and awk variables.
The table below lists the operators, in order of increasing precedence, that are available in awk:
Variables can be assigned a value with an equal sign (=). For example:

    FS = ","

Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.
Arrays can be created with the split function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1], ... array[n]) or with names. For example, to count the number of lines that contain a pattern, you could use the following script:

    /pattern/ { array["pattern"]++ }
    END { print array["pattern"] }
awk commands may be classified as follows:

*Not in original awk
The following alphabetical list of statements and functions includes all that are available in awk, nawk, or gawk. Unless otherwise mentioned, the statement or function is found in all versions. New statements and functions introduced with nawk are also found in gawk.
- atan2
  atan2(y, x)
  Return the arctangent of y/x in radians. (nawk)
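Because atan2 takes the signs of both arguments into account, atan2(0, -1) is a convenient way to get pi. A small check, assuming nawk, gawk, or another POSIX awk (the function name is hypothetical); print formats the result with the default OFMT of %.6g.

```shell
show_pi() {
    # atan2(0, -1) is pi; 4 * atan2(1, 1) is also pi
    awk 'BEGIN { pi = atan2(0, -1); print pi, 4 * atan2(1, 1) }'
}

show_pi   # 3.14159 3.14159
```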
- break
  Exit from a while, for, or do loop.
- close
  close(filename-expr)
  close(command-expr)
  Some implementations of awk allow only ten files and one pipe to be open at the same time; modern versions allow more than one open pipe. For this reason, nawk provides a close statement that closes a file or a pipe. close takes as its argument the same expression that opened the file or pipe. (nawk)
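Closing a pipe also matters when the same command must be read more than once: until it is closed, awk keeps reading from the original pipe, which is already at end of file. A sketch, assuming a POSIX awk; the function name and the echo command are illustrative only.

```shell
# Rerun a command by closing its pipe before reading from it again.
reread() {
    awk 'BEGIN {
        cmd = "echo hello"
        cmd | getline first
        close(cmd)                    # without this, the next getline would return 0 (end of file)
        ok = (cmd | getline second)   # runs the command again; 1 means a line was read
        print first, second, ok
    }'
}

reread   # hello hello 1
```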
- continue
  Begin the next iteration of a while, for, or do loop immediately.
- cos
  cos(x)
  Return the cosine of x (in radians). (nawk)
- delete
  delete array[element]
  Delete element of array. (nawk)
- do
  do body while (expr)
  Looping statement. Execute the statements in body, then evaluate expr. If expr is true, execute body again. More than one command must be put inside braces ({}). (nawk)
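Since the test comes after the body, a do loop always runs at least once. A sketch that prints powers of 2 below a limit, assuming nawk or gawk; the function name is made up.

```shell
# Print powers of 2 below a limit, one per line. Because the condition is
# tested after the body, a limit of 1 still prints 1.
powers_below() {
    awk -v limit="$1" 'BEGIN {
        i = 1
        do {
            print i
            i *= 2
        } while (i < limit)
    }'
}

powers_below 10   # 1, 2, 4, 8 (one per line)
powers_below 1    # 1
```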
- exit
  exit [expr]
  Stop executing the remaining instructions and read no more input; the END procedure, if any, is still executed. The expr, if any, becomes awk's exit status (44.7).
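A sketch of exit with a status, assuming a POSIX awk; the function name and the "stop" marker are hypothetical. The END procedure still runs after exit, and the status survives it.

```shell
# Copy lines until one whose first field is "stop". END still runs, and
# the value given to exit becomes awk's exit status.
copy_until_stop() {
    awk '$1 == "stop" { exit 3 }
         { print }
         END { print "done" }'
}
```

For example, printf 'a\nstop\nb\n' | copy_until_stop prints a and then done, and the pipeline's exit status is 3; the line b is never read.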
- exp
  exp(arg)
  Return the natural exponential of arg (e raised to the power arg).
- for
  for ([init-expr]; [test-expr]; [incr-expr]) command
  C-language-style looping construct. Typically, init-expr assigns the initial value of a counter variable. test-expr is a relational expression that is evaluated each time before executing the command. When test-expr is false, the loop is exited. incr-expr is used to increment the counter variable after each pass. A series of commands must be put within braces ({}). Example:

      for (i = 1; i <= 10; i++)
          printf "Element %d is %s.\n", i, array[i]
- for
  for (item in array) command
  For each item in an associative array, do command. More than one command must be put inside braces ({}). Refer to each element of the array as array[item]. The order in which the items are visited is unspecified.
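A common use of for (item in array) is reporting counts collected with named subscripts. A minimal sketch, assuming a POSIX awk; the function name and sample data are invented. Because the loop order is unspecified, the output is piped through sort.

```shell
# Count the lines for each distinct value of the first field, then print
# each value with its count, sorted.
count_by_first_field() {
    awk '{ count[$1]++ }
         END { for (k in count) print k, count[k] }' | sort
}

printf 'apple 1\npear 2\napple 3\n' | count_by_first_field
# apple 2
# pear 1
```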
- getline
  getline [var] [< file]
  or
  command | getline [var]
  Read the next line of input. Original awk does not support the syntax to open multiple input streams. The first form reads input from file (or, if < file is omitted, from the current input), and the second form reads the standard output of a UNIX command. Both forms read one line at a time, and each time the statement is executed it gets the next line of input.
  If var is not specified, the line of input is assigned to $0 and parsed into fields, setting NF. If var is specified, the line is assigned to var instead, and $0 (the current line) does not change. Reading from the current input also increments NR and FNR; command | getline increments NR only; getline < file changes neither.
  getline is actually a function: it returns 1 if it reads a record successfully, 0 if end-of-file is encountered, and -1 if it is otherwise unsuccessful (for example, if file cannot be opened). (nawk)
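Checking getline's return value is what makes it safe to use near the end of input. As a sketch, the function below (a made-up name) joins input lines in pairs; it assumes nawk, gawk, or another POSIX awk.

```shell
# Join input lines in pairs. "getline nxt" reads the following line into
# nxt without touching $0, and returns 1 on success or 0 at end of file.
join_pairs() {
    awk '{
        if ((getline nxt) > 0)
            print $0 "," nxt
        else
            print $0    # odd line left over at end of input
    }'
}

printf 'a\nb\nc\n' | join_pairs
# a,b
# c
```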
- gsub
  gsub(r, s [, t])
  Globally substitute s for each match of the regular expression r in the string t. Return the number of substitutions. If t is not supplied, it defaults to $0. (nawk)
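A quick comparison of gsub with sub (which replaces only the first match), as a sketch assuming nawk or gawk; the function name and sample strings are illustrative.

```shell
show_subs() {
    awk 'BEGIN {
        s = "a-b-c"
        n = gsub(/-/, "+", s)   # replace every match, return the count
        print n, s              # 2 a+b+c
        t = "a-b-c"
        sub(/-/, "+", t)        # replace only the first match
        print t                 # a+b-c
    }'
}

show_subs
```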
- if
  if (condition) command [else command]
  If condition is true, do command(s); otherwise do command(s) in the else clause (if any). condition can be an expression that uses any of the relational operators <, <=, ==, !=, >=, or >, as well as the pattern-matching operators ~ or !~ (e.g., if ($1 ~ /[Aa].*[Zz]/)). A series of commands must be put within braces ({}).
- index
  index(str, substr)
  Return the position of the first occurrence of substr in string str, or 0 if it is not found.
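Positions are counted from 1, so 0 can safely mean "not found". A one-line check, assuming a POSIX awk (the function name is hypothetical):

```shell
show_index() {
    # "nawk" starts at character 9 of "awk and nawk"; "z" does not occur in "awk"
    awk 'BEGIN { print index("awk and nawk", "nawk"), index("awk", "z") }'
}

show_index   # 9 0
```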
- int
  int(arg)
  Return the integer value of arg, discarding any fractional part (for example, int(-3.7) is -3).
- length
  length(arg)
  Return the length of arg. If arg is omitted, return the length of $0.
- log
  log(arg)
  Return the natural logarithm of arg.
- match
  match(s, r)
  Function that matches the pattern, specified by the regular expression r, in the string s and returns either the position in s where the match begins or 0 if no occurrences are found. Sets RSTART to the starting position of the match and RLENGTH to its length; when there is no match, RSTART is 0 and RLENGTH is -1. (nawk)
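Together with substr, RSTART and RLENGTH let you extract exactly what matched. A sketch, assuming nawk or gawk; the function name and inputs are made up.

```shell
# Report where the first run of digits starts, how long it is, and what it is.
first_number() {
    awk '{
        if (match($0, /[0-9]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print "none", RSTART, RLENGTH
    }'
}

printf 'foo123bar\nabc\n' | first_number
# 4 3 123
# none 0 -1
```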
- next
  Read the next input line and start a new cycle through the pattern/procedure statements.
- print
  print [args] [destination]
  Print args on the output, followed by a newline. args is usually one or more fields, but may also be one or more of the predefined variables or arbitrary expressions. If no args are given, print prints $0 (the current input line). Literal strings must be quoted. Fields are printed in the order they are listed. If separated by commas (,) in the argument list, they are separated in the output by the value of OFS (a single space by default). If separated by spaces, they are concatenated in the output. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output.
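The comma-versus-space rule is easiest to see with a non-default OFS. A sketch, assuming a POSIX awk; the function name is hypothetical.

```shell
show_print() {
    awk 'BEGIN {
        OFS = "-"
        print "a", "b", "c"   # commas: OFS between items, prints a-b-c
        print "a" "b" "c"     # no commas: concatenation, prints abc
    }'
}

show_print
```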
- printf
  printf format [, expression(s)] [destination]
  Formatted print statement. Fields or variables can be formatted according to instructions in the format argument. The number of expressions must match the number of format specifications in format. format follows the conventions of the C-language printf statement. Here are a few of the most common formats:
    - %s
      A string.
    - %d
      A decimal integer.
    - %n.mf
      A floating-point number, where n is the minimum field width (counting the decimal point) and m is the number of digits after the decimal point.
    - %[-]nc
      n specifies the minimum field width for format type c, while - left-justifies the value in the field; otherwise the value is right-justified.
  format can also contain embedded escape sequences: \n (newline) and \t (tab) are the most common. destination is a UNIX redirection or pipe expression (e.g., > file) that redirects the default standard output.
  Example: using the script

      {printf "The sum on line %s is %d.\n", NR, $1+$2}

  the input line

      5 5

  produces this output, followed by a newline:

      The sum on line 1 is 10.
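Field widths are easiest to read with the output bracketed. A sketch, assuming a POSIX awk (the function name is made up); note that the 6 in %6.2f is the whole field width, not a digit count.

```shell
# %-6s left-justifies "ab" in 6 columns; %6.2f right-justifies 3.14 in
# 6 columns with 2 digits after the decimal point.
show_printf() {
    awk 'BEGIN { printf "[%-6s][%6.2f][%d]\n", "ab", 3.14159, 42 }'
}

show_printf   # [ab    ][  3.14][42]
```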
- rand
  rand()
  Generate a random number r such that 0 <= r < 1. This function returns the same series of numbers each time the script is executed, unless the random number generator is seeded using the srand() function. (nawk)
- return
  return [expr]
  Used in user-defined functions to exit the function, returning the value of expression expr, if any. (nawk)
- sin
  sin(x)
  Return the sine of x (in radians). (nawk)
- split
  split(string, array [, sep])
  Split string into elements of array: array[1], ... array[n]. string is split at each occurrence of separator sep. (In nawk, the separator may be a regular expression.) If sep is not specified, FS is used. The number of array elements created is returned.
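A sketch of both ways of calling split, assuming a POSIX awk; the function name and strings are illustrative. With the default FS, runs of blanks separate elements and leading or trailing blanks are ignored.

```shell
show_split() {
    awk 'BEGIN {
        n = split("red:green:blue", color, ":")
        print n, color[1], color[3]            # 3 red blue
        m = split("  one two  three ", word)   # default FS: runs of blanks
        print m, word[2]                       # 3 two
    }'
}

show_split
```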
- sprintf
  sprintf(format [, expression(s)])
  Return the value of expression(s) formatted according to format (see printf). Data is formatted but not printed.
- sqrt
  sqrt(arg)
  Return the square root of arg.
- srand
  srand([expr])
  Use expr to set a new seed for the random number generator. If expr is omitted, the time of day is used. Returns the old seed. (nawk)
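Seeding with the same value twice reproduces the same sequence, which is handy when testing scripts that use rand. A sketch, assuming nawk or gawk; the function name and the seed 7 are arbitrary choices for illustration.

```shell
show_rand() {
    awk 'BEGIN {
        srand(7); a = rand()
        srand(7); b = rand()
        same = (a == b)               # same seed, same sequence
        inrange = (a >= 0 && a < 1)
        print same, inrange           # 1 1
        die = int(rand() * 6) + 1     # a whole number from 1 to 6
        ok = (die >= 1 && die <= 6)
        print ok                      # 1
    }'
}

show_rand
```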
- sub
  sub(r, s [, t])
  Substitute s for the first match of the regular expression r in the string t. Return 1 if successful, 0 otherwise. If t is not supplied, it defaults to $0. (nawk)
- substr
  substr(string, m [, n])
  Return the substring of string that begins at character position m and is n characters long. If n is omitted, include all characters to the end of string.
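Character positions start at 1. A short check, assuming a POSIX awk (the function name is hypothetical):

```shell
show_substr() {
    awk 'BEGIN {
        s = "hello, world"
        print substr(s, 8)      # world (position 8 to the end)
        print substr(s, 1, 5)   # hello
    }'
}

show_substr
```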
- system
  system(command)
  Function that executes the specified UNIX command and returns its status (44.7). The status of the command that is executed typically indicates its success (0) or failure (non-zero). The output of the command is not available for processing within the nawk script; use command | getline to read the output of the command into the script. (nawk)
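Testing system's return value against 0 is the usual way to act on success or failure. A sketch, assuming nawk or gawk; the function name is invented, and true and false are just convenient commands that always succeed or fail.

```shell
# Run a command and report whether it succeeded, based on system's return value.
try_command() {
    awk -v cmd="$1" 'BEGIN {
        if (system(cmd) == 0)
            print "succeeded"
        else
            print "failed"
    }'
}

try_command true    # succeeded
try_command false   # failed
```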
- tolower
  tolower(str)
  Translate all uppercase characters in str to lowercase and return the new string. (nawk)
- toupper
  toupper(str)
  Translate all lowercase characters in str to uppercase and return the new string. (nawk)
- while
  while (condition) command
  Do command while condition is true (see if for a description of allowable conditions). A series of commands must be put within braces ({}).
- DG, from O'Reilly & Associates' UNIX in a Nutshell (SVR4/Solaris)