sed & awk

B.2. Language Summary for awk

This section summarizes how awk processes input records and describes the various syntactic elements that make up an awk program.

B.2.2. Format of a Script

An awk script is a set of pattern-matching rules and actions:

pattern { action }

An action is one or more statements that will be performed on those input lines that match the pattern. If no pattern is specified, the action is performed for every input line. The following example uses the print statement to print each line in the input file:

{ print }

If only a pattern is specified, then the default action consists of the print statement, as shown above.
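The three rule forms described above can be tried from a shell. This is a minimal sketch: the sample input and the shell variable names (input, pattern_only, action_only, both) are invented for illustration.

```shell
# Sample input (made up for this illustration).
input='apple pie
banana split
apple tart'

# Pattern only: the default action prints each matching line.
pattern_only=$(printf '%s\n' "$input" | awk '/apple/')

# Action only: the action runs for every input line.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Pattern and action: the action runs only for matching lines.
both=$(printf '%s\n' "$input" | awk '/apple/ { print $2 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints the two lines containing "apple", the second prints the first field of every line, and the third prints the second field of the matching lines only.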

Function definitions can also appear:

function name (parameter list) { statements }

This syntax defines the function name, making available the list of parameters for processing in the body of the function. Variables specified in the parameter list are treated as local variables within the function. All other variables are global and can be accessed outside the function. When calling a user-defined function, no space is permitted between the name of the function and the opening parenthesis. Spaces are allowed in the function's definition. User-defined functions are described in Chapter 9, "Functions".
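As a sketch of how parameter-list variables behave as locals, the following defines a hypothetical function sum_fields (the name and the sample input are assumptions made for this example). Listing extra names after the real parameters is a common awk convention for getting local variables; the call passes only one argument.

```shell
sums=$(printf '1 2 3\n10 20\n' | awk '
# n is the real parameter; i and total are extra parameters used as locals.
function sum_fields(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += $i
    return total
}
BEGIN { total = "unchanged" }   # a global variable with the same name
{ print sum_fields(NF) }        # no space between the name and "("
END { print total }             # the global was not modified by the function
')
printf '%s\n' "$sums"
```

Each call starts with a fresh, empty total, so the output is 6, then 30, and the global total still holds "unchanged" at the end.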

B.2.4. Regular Expressions

Table B.1 summarizes the regular expressions as described in Chapter 3, "Understanding Regular Expression Syntax". The metacharacters are listed in order of precedence.

Table B.1. Regular Expression Metacharacters

Special Characters    Usage
c

Matches any literal character c that is not a metacharacter.

\

Escapes any metacharacter that follows, including itself.

^

Anchors following regular expression to the beginning of string.

$

Anchors preceding regular expression to the end of string.

.

Matches any single character, including newline.

[...]

Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as the first character inside brackets reverses the match to all characters except those listed in the class. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in a class is a member of the class. All other metacharacters lose their meaning when specified as members of a class, except \, which can be used to escape ], even if it is not first.

r1|r2

Alternation. Matches either of the two regular expressions, r1 or r2.

(r1)(r2)

Used for concatenating regular expressions.

r*

Matches zero or more occurrences of the regular expression that immediately precedes it.

r+

Matches one or more occurrences of the preceding regular expression.

r?

Matches 0 or 1 occurrences of the preceding regular expression.

(r)

Used for grouping regular expressions.
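Several of the metacharacters in Table B.1 can be exercised against a small word list. This is only an illustrative sketch; the word list and the shell variable names are invented for the example.

```shell
# Sample words (made up for this illustration).
words='color
colour
cat
cot
caat
c.t'

optional=$(printf '%s\n' "$words" | awk '/^colou?r$/')     # ? : zero or one "u"
one_or_more=$(printf '%s\n' "$words" | awk '/^ca+t$/')     # + : one or more "a"
char_class=$(printf '%s\n' "$words" | awk '/^c[ao]t$/')    # [...] : "a" or "o"
negated=$(printf '%s\n' "$words" | awk '/^c[^ao]t$/')      # [^...] : anything else
any_char=$(printf '%s\n' "$words" | awk '/^c.t$/')         # . : any one character
literal_dot=$(printf '%s\n' "$words" | awk '/^c\.t$/')     # \ : escapes the dot
either=$(printf '%s\n' "$words" | awk '/^(color|cot)$/')   # | inside a group

printf '%s\n' "$optional" "$one_or_more" "$either"
```

The ^ and $ anchors keep each pattern from matching in the middle of a longer word, which makes the effect of each metacharacter easier to see.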

Regular expressions can also make use of the escape sequences for accessing special characters, as defined in Section B.2.5.2 later in this appendix.

Note that ^ and $ work on strings; they do not match against newlines embedded in a record or string.
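The behavior of the anchors with an embedded newline can be checked with a short test. This is a sketch; the helper function report and the test string are hypothetical names chosen for the example.

```shell
anchors=$(awk '
# Hypothetical helper: prints whether a match succeeded.
function report(label, matched) {
    print label ": " (matched ? "match" : "no match")
}
BEGIN {
    s = "one\ntwo"               # a string with an embedded newline
    report("^one", s ~ /^one/)   # start of the string
    report("^two", s ~ /^two/)   # just after the newline: not a string start
    report("one$", s ~ /one$/)   # just before the newline: not a string end
    report("two$", s ~ /two$/)   # end of the string
}')
printf '%s\n' "$anchors"
```

Only the first and last tests report a match, because ^ and $ anchor to the start and end of the whole string rather than to each line within it.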

Within a pair of brackets, POSIX allows special notations for matching non-English characters. They are described in Table B.2.

Table B.2. POSIX Character List Facilities

Notation Facility
[.symbol.]

Collating symbols. A collating symbol is a multi-character sequence that should be treated as a unit.

[=equiv=]

Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as "e" and "è".

[:class:]

Character classes. Character class keywords describe different classes of characters such as alphabetic characters, control characters, and so on.

[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters
[:punct:] Punctuation characters
[:space:] Whitespace characters
[:upper:] Uppercase characters
[:xdigit:] Hexadecimal digits

Note that these facilities (as of this writing) are still not widely implemented.
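Where an awk does support the character classes, they are written inside a bracket expression, so the brackets appear doubled. The sketch below assumes an awk with POSIX character class support (for example gawk); on an awk without it, the patterns will not behave as shown. The sample input is invented.

```shell
posix_classes=$(printf '%s\n' 'abc' '123' 'a1b2' 'HELLO' | awk '
/^[[:digit:]]+$/ { print $0 ": digits only" }
/^[[:alpha:]]+$/ { print $0 ": letters only" }
/^[[:upper:]]+$/ { print $0 ": uppercase only" }
# Alphanumeric throughout, with at least one digit and one letter.
/^[[:alnum:]]+$/ && /[[:digit:]]/ && /[[:alpha:]]/ { print $0 ": mixed" }
')
printf '%s\n' "$posix_classes"
```

Note that "HELLO" is reported twice, since it satisfies both the [:alpha:] and the [:upper:] rule.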

B.2.5. Expressions

An expression can be made up of constants, variables, operators, and functions. A constant is a string (any sequence of characters) or a numeric value. A variable is a name that refers to a value; using the name in an expression retrieves the numeric or string value currently stored under it.
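A small sketch of the four ingredients working together follows; the variable names count, label, and total are made up for the example, and length is the built-in string-length function.

```shell
expr_demo=$(awk 'BEGIN {
    count = 3                     # numeric constant stored in a variable
    label = "items"               # string constant stored in a variable
    total = count * 2 + 1         # operators combine values: 3 * 2 + 1
    print total, label            # variables retrieve the stored values
    print length(label) + count   # function result used in arithmetic: 5 + 3
}')
printf '%s\n' "$expr_demo"
```

The first print statement outputs "7 items" and the second outputs 8.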




Copyright © 2003 O'Reilly & Associates. All rights reserved.