13.3. Patterns and Procedures
gawk scripts consist of patterns and procedures:
pattern {procedure}
Both are optional. If pattern is missing, {procedure} is applied to every record. If {procedure} is missing, each record that matches pattern is printed. By default, each line of input is a record, but you can specify a different record separator through the RS variable.
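Both defaults, and the effect of RS, are easy to see by feeding a few lines to awk from the shell. This is a minimal sketch. It assumes a POSIX shell and an awk on the PATH; it uses only POSIX awk features, so gawk behaves the same way. The sample input and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample input: three records, one per line.
input='apple 3
banana 7
cherry 12'

# Pattern with no procedure: each matching record is printed unchanged.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 5')

# Procedure with no pattern: the action runs for every record.
procedure_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Setting RS in BEGIN: records are now separated by ";" instead of newlines.
custom_rs=$(printf 'a;b;c' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$procedure_only" "$custom_rs"
```

The first command prints only the banana and cherry lines, the second prints the three fruit names, and the third numbers the three ";"-separated records.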
13.3.1. Patterns
A pattern can be any of the following:
/regular expression/
relational expression
pattern-matching expression
pattern,pattern
BEGIN
END
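The script below is a sketch that uses each kind of pattern from the list once, so you can see which rules fire for which records. It assumes a POSIX shell and awk; the log lines are invented for the example.

```shell
#!/bin/sh
# Hypothetical log lines used as input.
log='start
INFO boot 10
ERR disk 3
INFO net 25
stop
ERR cpu 40'

out=$(printf '%s\n' "$log" | awk '
BEGIN          { print "BEGIN ran" }        # runs before any input is read
/^ERR/         { print "regex:", $2 }       # /regular expression/
$3 > 20        { print "relational:", $2 }  # relational expression
$2 ~ /^n/      { print "match:", $2 }       # pattern-matching expression
/start/,/stop/ { n++ }                      # range: counts records from "start" through "stop"
END            { print "range size:", n; print "END ran" }
')
printf '%s\n' "$out"
```

A single record can trigger several rules: the "INFO net 25" line satisfies both the relational and the pattern-matching rule. The range covers the five records from "start" through "stop", so the final "ERR cpu 40" line is not counted.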
Some rules regarding patterns include:
Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later under "gawk System Variables."
Regular expressions use the extended set of metacharacters and are described in Chapter 9, "Pattern Matching." In addition, ^ and $ match at the beginning and end of whatever string the expression is applied to, so when a regular expression is matched against a single field (for example, $2 ~ /^a/), they refer to the beginning and end of that field rather than of the whole record.
Relational expressions use the relational operators listed under "Operators" later in this chapter. Comparisons can be either string or numeric: they are numeric when both operands are numbers (including fields that look like numbers) and string otherwise. For example, $2 > $1 selects lines for which the second field is greater than the first.
Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See "Operators" later in this chapter.
The BEGIN pattern lets you specify procedures that take place before the first input record is processed. (Generally, you set global variables here.)
The END pattern lets you specify procedures that take place after the last input record is read.
If there are multiple BEGIN or END patterns, their associated actions are taken in the order in which they appear in the script.
pattern,pattern specifies a range of records, from a record that matches the first pattern through the next record that matches the second. This syntax cannot include BEGIN or END as a pattern.
Except for BEGIN and END, patterns can be combined with the
Boolean operators || (OR), && (AND), and ! (NOT).
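Several of these rules can be checked with a small script: two BEGIN rules run in script order, ^ and $ anchor to the field being matched, $2 > $1 compares numerically only when both fields look like numbers, and patterns combine with && and !. This is a sketch assuming a POSIX shell and awk, with made-up input.

```shell
#!/bin/sh
# Hypothetical two-field input lines.
data='10 9
abc abd
2 10
xa ax'

out=$(printf '%s\n' "$data" | awk '
BEGIN { print "first BEGIN" }
BEGIN { print "second BEGIN" }                    # runs after the first, in script order
$1 ~ /^a/ && $2 ~ /d$/ { print "anchored:", $0 }  # ^ and $ apply to the field, not the record
$2 > $1 { print "greater:", $0 }                  # numeric if both fields look numeric, else string
!/x/ && NR > 2 { print "not x:", $0 }             # Boolean NOT and AND
')
printf '%s\n' "$out"
```

The first line, "10 9", is not selected by $2 > $1 because 9 < 10 numerically; a string comparison would have selected it, since "9" sorts after "10". The line "2 10" is selected for the same reason in reverse.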
In addition to the other regular-expression operators, GNU awk supports POSIX character lists, which are useful for matching non-ASCII characters in languages other than English. These lists are recognized only inside bracket expressions, so the list itself must be enclosed in a second pair of brackets. A typical use is [[:lower:]], which in an English locale matches the same characters as [a-z].
See Chapter 9, "Pattern Matching" for a complete list of POSIX character lists.
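Here is a short sketch of character lists used inside bracket expressions. It assumes a POSIX shell and an awk whose regular expressions support POSIX character classes (gawk does); the words are sample data. Only ASCII input is used, because which non-ASCII letters [[:lower:]] matches depends on the current locale.

```shell
#!/bin/sh
# Hypothetical word list.
words='hello
Hello
HELLO
hello2'

# [[:lower:]] and [[:digit:]] are only valid inside a bracket expression,
# hence the doubled brackets.
lower_only=$(printf '%s\n' "$words" | awk '/^[[:lower:]]+$/')
has_digit=$(printf '%s\n' "$words" | awk '/[[:digit:]]/')

printf '%s\n---\n%s\n' "$lower_only" "$has_digit"
```

Only "hello" consists entirely of lowercase letters, and only "hello2" contains a digit.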
13.3.2. Procedures
Procedures consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons and contained within curly braces. Commands fall into four groups:
Variable or array assignments
Printing commands
Built-in functions
Control-flow commands
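The following sketch shows a procedure that mixes variable assignments, a built-in function, an if/else, and print, with statements separated by both semicolons and newlines. It assumes a POSIX shell and awk; the name/score data is invented.

```shell
#!/bin/sh
# Hypothetical name/score pairs.
scores='ann 72
bob 45
cy 90'

out=$(printf '%s\n' "$scores" | awk '
{
    total += $2; count++           # two assignments separated by a semicolon
    name = toupper($1)             # built-in function
    if ($2 >= 50) status = "pass"  # control flow
    else status = "fail"
    print name, status             # printing command
}
END { printf "average %.1f\n", total / count }
')
printf '%s\n' "$out"
```

Each record produces one output line, such as "BOB fail". The END procedure then prints the average of the three scores, which is 69.0.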
13.3.3. Simple Pattern-Procedure Examples
Print first field of each line (no pattern specified):
{ print $1 }
Print all lines that contain "Linux":
/Linux/
Print first field of lines that contain "Linux":
/Linux/{ print $1 }
Print records containing more than two fields:
NF > 2
Interpret each group of lines up to a blank line as a single input record:
BEGIN { FS = "\n"; RS = "" }
Print fields 2 and 3 in switched order but only on lines whose first
field matches the string "URGENT":
$1 ~ /URGENT/ { print $3, $2 }
Count and print the number of lines that contain "ERR" (x + 0 prints 0 rather than an empty line when there are none):
/ERR/ { ++x }; END { print x + 0 }
Add numbers in second column and print total:
{ total += $2 }; END { print "column total is", total }
Print lines that contain fewer than 20 characters:
length() < 20
Print each line that begins with "Name:" and that contains exactly
seven fields:
NF == 7 && /^Name:/
Reverse the order of fields on each line:
{ for (i = NF; i > 1; i--) printf "%s ", $i; print $1 }
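The paragraph-mode example above only sets FS and RS in a BEGIN rule; a main rule is still needed to do something with each record. A minimal complete version, assuming a POSIX shell and awk and using made-up addresses, might look like this:

```shell
#!/bin/sh
# Two multi-line records separated by a blank line (hypothetical data).
addresses='Jane Doe
12 Main St
Springfield

John Roe
9 Elm Ave
Shelbyville'

# RS = "" turns on paragraph mode; FS = "\n" makes each line a field.
out=$(printf '%s\n' "$addresses" | awk '
BEGIN { FS = "\n"; RS = "" }
{ print NR ": " $1 " (" NF " lines)" }
')
printf '%s\n' "$out"
```

Each blank-line-separated block becomes one record whose fields are its lines, so $1 is the name line and NF is the number of lines in the block.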
Copyright © 2001 O'Reilly & Associates. All rights reserved.