13.2. Patterns and Procedures
gawk scripts consist of patterns and
procedures:
pattern {procedure}
Both are optional. If pattern is missing,
{procedure} is applied to all records. If {procedure} is missing, the matched record is printed.
By default, each line of input is a record, but you can specify a
different record separator through the RS variable.
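As a minimal sketch of these defaults, the commands below run three tiny programs on made-up input. The sample data and the shell variable names are illustrative only, and awk is assumed to resolve to gawk or another POSIX-compatible awk.

```shell
# Pattern only: the missing procedure defaults to printing the matched record.
pattern_only=$(printf 'red\ngreen\nblue\n' | awk '/re/')

# Procedure only: the missing pattern means every record is processed.
procedure_only=$(printf 'red\ngreen\nblue\n' | awk '{ print NR, $0 }')

# Custom record separator: records are split on ";" instead of newline.
# The input has no trailing newline, so the last record is exactly "gamma".
custom_rs=$(printf 'alpha;beta;gamma' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$pattern_only" "$procedure_only" "$custom_rs"
```

The first command prints only the records that match, the second numbers every record, and the third treats each semicolon-delimited chunk as its own record.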
13.2.1. Patterns
A pattern can be any of the following:
/regular expression/
relational expression
pattern-matching expression
pattern,pattern
BEGIN
END
Some rules regarding patterns include:
-
Expressions can be composed of quoted strings, numbers, operators,
functions, defined variables, or any of the predefined variables
described later under "gawk System
Variables."
-
Regular expressions use the extended set of metacharacters and are
described in Chapter 9.
-
When a regular expression is matched against a field (for example,
$1 ~ /^abc$/), ^ and $ refer to the beginning and end of that field,
respectively, rather than the beginning and end of the record.
-
Relational expressions use the relational operators listed under
"Operators" later in this chapter.
Comparisons can be either string or numeric. For example, $2 > $1 selects lines for which the second
field is greater than the first.
-
Pattern-matching expressions use the operators ~ (match) and !~
(don't match). See "Operators" later in this chapter.
-
The BEGIN pattern lets you specify procedures
that take place before the first input record is processed.
(Generally, you set global variables here.)
-
The END pattern lets you specify procedures
that take place after the last input record is read.
-
If there are multiple BEGIN or
END patterns, their associated
actions are taken in the order in which they appear in the script.
-
pattern,pattern specifies a
range of lines. This syntax cannot include BEGIN or END
as a pattern.
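A few of the rules above can be sketched on made-up input as follows. The data and the shell variable names are hypothetical, and awk is assumed to be gawk or another POSIX-compatible awk.

```shell
data='start here
abc xabc
xabc abc
stop now
abc tail'

# ^ and $ anchor to the field when the regex is matched against a field:
# only the line whose second field is exactly "abc" is selected.
field_match=$(printf '%s\n' "$data" | awk '$2 ~ /^abc$/')

# A range pattern selects from a line matching the first pattern
# through the next line matching the second pattern.
range_match=$(printf '%s\n' "$data" | awk '/^start/,/^stop/')

# Multiple BEGIN rules run in script order before any input is read;
# END runs after the (here empty) input is exhausted.
begin_order=$(awk 'BEGIN { print "first" } BEGIN { print "second" } END { print "last" }' /dev/null)

printf '%s\n' "$field_match" "$range_match" "$begin_order"
```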
Except for BEGIN and END, patterns can be combined with the
Boolean operators || (OR), && (AND), and ! (NOT).
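The following sketch shows one pattern for each Boolean operator. The three-column staff list (name, department, amount) is invented for illustration, and awk is assumed to be gawk or another POSIX-compatible awk.

```shell
staff='ann sales 300
bob ops 150
cy sales 90
dee ops 400'

# AND: sales staff whose third field is greater than 100.
and_match=$(printf '%s\n' "$staff" | awk '$2 == "sales" && $3 > 100 { print $1 }')

# OR: the record for bob, or any record whose third field is at least 400.
or_match=$(printf '%s\n' "$staff" | awk '$1 == "bob" || $3 >= 400 { print $1 }')

# NOT: every record whose department does not match "ops".
not_match=$(printf '%s\n' "$staff" | awk '!($2 ~ /ops/) { print $1 }')

printf '%s\n' "$and_match" "$or_match" "$not_match"
```

The NOT example could equally be written with the !~ operator; the parenthesized form is used here only to show ! applied to a whole pattern.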
In addition to other regular-expression operators, GNU gawk supports POSIX character lists, which are
useful for matching non-ASCII characters in languages other than
English. These lists are recognized only within [ ] ranges. A typical use is [[:lower:]], which in English is the same as
[a-z]. See Chapter 9 for a complete list of POSIX character lists.
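As a small illustration, the sketch below applies two character lists to a made-up word list. It assumes an awk that supports POSIX character lists (gawk does; some older awk implementations do not).

```shell
words='alpha
Beta
gamma7
DELTA'

# Lines made up entirely of lowercase letters.
lower_only=$(printf '%s\n' "$words" | awk '/^[[:lower:]]+$/')

# Lines containing at least one digit.
has_digit=$(printf '%s\n' "$words" | awk '/[[:digit:]]/')

printf '%s\n' "$lower_only" "$has_digit"
```

Note the doubled brackets: the outer pair is the [ ] range and the inner [:lower:] is the character list.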
13.2.2. Procedures
Procedures consist of one or more commands, functions, or variable assignments,
separated by newlines or semicolons and contained within curly
braces. Commands fall into four groups:
-
Variable or array assignments
-
Printing commands
-
Built-in functions
-
Control-flow commands
13.2.3. Simple Pattern/Procedure Examples
-
Print first field of each line (no pattern specified):
{ print $1 }
-
Print all lines that contain
"Linux":
/Linux/
-
Print first field of lines that contain
"Linux":
/Linux/{ print $1 }
-
Print records containing more than two fields:
NF > 2
-
Interpret each group of lines up to a blank line as a single input
record:
BEGIN { FS = "\n"; RS = "" }
-
Print fields 2 and 3 in switched order, but only on lines whose first
field matches the string "URGENT":
$1 ~ /URGENT/ { print $3, $2 }
-
Count and print the number of lines that contain
"ERR" (adding 0 makes the result print as 0 when nothing matches):
/ERR/ { ++x }; END { print x + 0 }
-
Add numbers in second column and print total:
{total += $2 }; END { print "column total is", total}
-
Print lines that contain fewer than 20 characters:
length($0) < 20
-
Print each line that begins with
"Name:" and that contains exactly
seven fields:
NF == 7 && /^Name:/
-
Print the fields of each record in reverse order, one field per line:
{ for (i = NF; i >= 1; i--) print $i }
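To see a few of these one-liners in action, the sketch below runs them from the shell on invented input. The sample records and shell variable names are hypothetical, and awk is assumed to be gawk or another POSIX-compatible awk.

```shell
# Blank-line-separated records; each line of a record becomes one field.
cards='Ann Lee
12 Oak St

Bob Ray
9 Elm Rd'
names=$(printf '%s\n' "$cards" | awk 'BEGIN { FS = "\n"; RS = "" } { print $1 }')

log='URGENT disk full
ERR net 5
info ERR 7'

# Fields 2 and 3 in switched order, only where field 1 matches URGENT.
urgent=$(printf '%s\n' "$log" | awk '$1 ~ /URGENT/ { print $3, $2 }')

# Number of lines that contain ERR (x + 0 prints 0 if there are none).
err_count=$(printf '%s\n' "$log" | awk '/ERR/ { ++x } END { print x + 0 }')

# Fields of one record, last to first, one per line.
reversed=$(printf 'a b c\n' | awk '{ for (i = NF; i >= 1; i--) print $i }')

printf '%s\n' "$names" "$urgent" "$err_count" "$reversed"
```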
Copyright © 2003 O'Reilly & Associates. All rights reserved.