Language Summary for awk (sed & awk, Second Edition)

This section summarizes how awk processes input records and describes the various syntactic elements that make up an awk program.

B.2.4. Regular Expressions

Table B.1. Regular Expression Metacharacters

Special Characters	Usage
c	Matches any literal character c that is not a metacharacter.
\	Escapes any metacharacter that follows, including itself.
^	Anchors following regular expression to the beginning of string.
$	Anchors preceding regular expression to the end of string.
.	Matches any single character, including newline.
[...]	Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as the first character inside brackets reverses the match to all characters except those listed in the class. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in a class is a member of the class. All other metacharacters lose their meaning when specified as members of a class, except \, which can be used to escape ], even if it is not first.
r1|r2	Between two regular expressions, r1 and r2, it allows either of the regular expressions to be matched.
(r1)(r2)	Used for concatenating regular expressions.
r*	Matches any number (including zero) of the regular expression that immediately precedes it.
r+	Matches one or more occurrences of the preceding regular expression.
r?	Matches 0 or 1 occurrences of the preceding regular expression.
(r)	Used for grouping regular expressions.
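As a rough illustration of the metacharacters above, the following shell sketch runs awk over a handful of made-up input lines. The sample words and the shell variable names are assumptions chosen for the example, not part of the original text.

```shell
# Hypothetical sample input; each awk call prints the lines that match.
input=$(printf '%s\n' 'cat' 'cart' 'scat' 'ct' 'c.t' 'dog')

# ^ and $ anchor the match; . matches any single character.
anchored=$(printf '%s\n' "$input" | awk '/^c.t$/')

# \. matches a literal dot because the backslash escapes the metacharacter.
literal_dot=$(printf '%s\n' "$input" | awk '/c\.t/')

# ? makes the preceding character optional; | selects between alternatives;
# the parentheses group the alternatives so both anchors apply to each.
optional=$(printf '%s\n' "$input" | awk '/^(ca?r?t|dog)$/')

printf '%s\n' "$anchored" "$literal_dot" "$optional"
```

The first pattern selects cat and c.t, the second only c.t, and the third cat, cart, ct, and dog.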

Table B.2. POSIX Character List Facilities

Notation	Facility
[.symbol.]	Collating symbols. A collating symbol is a multi-character sequence that should be treated as a unit.
[=equiv=]	Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as "e" and "è".
[:class:]	Character classes. Character class keywords describe different classes of characters such as alphabetic characters, control characters, and so on.

Class	Matching Characters
[:alnum:]	Alphanumeric characters
[:alpha:]	Alphabetic characters
[:blank:]	Space and tab characters
[:cntrl:]	Control characters
[:digit:]	Numeric characters
[:graph:]	Printable and visible (non-space) characters
[:lower:]	Lowercase characters
[:print:]	Printable characters
[:punct:]	Punctuation characters
[:space:]	Whitespace characters
[:upper:]	Uppercase characters
[:xdigit:]	Hexadecimal digits
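The class keywords are only meaningful inside a bracket expression, so in practice they appear with doubled brackets, as in [[:digit:]]. The sketch below uses invented sample lines to show three of them; it assumes an awk that implements the POSIX facilities, which some older implementations may not.

```shell
# Hypothetical sample lines; the class keyword sits inside the outer brackets.
lines=$(printf '%s\n' 'abc123' 'HELLO' 'x y' '42')

# Lines made up entirely of digits.
digits_only=$(printf '%s\n' "$lines" | awk '/^[[:digit:]]+$/')

# Lines containing at least one uppercase letter.
has_upper=$(printf '%s\n' "$lines" | awk '/[[:upper:]]/')

# Lines containing a space or a tab.
has_blank=$(printf '%s\n' "$lines" | awk '/[[:blank:]]/')

printf '%s\n' "$digits_only" "$has_upper" "$has_blank"
```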

B.2.5. Expressions

An expression can be made up of constants, variables, operators, and functions. A constant is a string (any sequence of characters) or a numeric value. A variable is a symbol that references a value. You can think of it as a name that stands for a particular numeric or string value.

B.2.5.1. Constants

There are two types of constants: string and numeric. A string constant must be enclosed in double quotes, while a numeric constant is written without quotes.

B.2.5.2. Escape sequences

The escape sequences described in Table B.3 can be used in strings and regular expressions.

Table B.3. Escape Sequences

Sequence	Description
\a	Alert character, usually ASCII BEL character
\b	Backspace
\f	Formfeed
\n	Newline
\r	Carriage return
\t	Horizontal tab
\v	Vertical tab
\ddd	Character represented as 1 to 3 digit octal value
\xhex	Character represented as hexadecimal value[91]
\c	Any literal character c (e.g., \" for ")[92]

[91] POSIX does not provide "\x", but it is commonly available.

[92] Like ANSI C, POSIX leaves it purposely undefined what you get when you put a backslash before any character not listed in the table. In most awks, you just get that character.
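A minimal sketch of a few of these escape sequences inside an awk string constant follows. The text being printed is made up, and the octal example assumes an ASCII-based character set, where \101 is the letter A. The sketch avoids \x because, as noted above, POSIX does not provide it.

```shell
# \t is a tab, \n a newline, \101 an octal character code,
# and \" a literal double quote inside the string constant.
out=$(awk 'BEGIN { printf "name\tvalue\n\101\"quoted\"\n" }')
printf '%s\n' "$out"
```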

B.2.5.3. Variables

There are three kinds of variables: user-defined, built-in, and fields. By convention, the names of built-in or system variables consist of all capital letters.

The name of a variable cannot start with a digit. Otherwise, it consists of letters, digits, and underscores. Case is significant in variable names.

A variable does not need to be declared or initialized. A variable can contain either a string or numeric value. An uninitialized variable has the empty string ("") as its string value and 0 as its numeric value. Awk attempts to decide whether a value should be processed as a string or a number depending upon the operation.
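The following sketch illustrates the two points above: an uninitialized variable behaves as both the empty string and 0, and the operator determines whether a value is treated as a string or a number. The variable names are arbitrary.

```shell
out=$(awk 'BEGIN {
    # x is never assigned: its string value is "" and its numeric value is 0.
    print "[" x "]", x + 0

    # The operator decides how the operands are interpreted.
    y = "3"
    print y + 4      # arithmetic treats y as the number 3
    print y 4        # concatenation treats both operands as strings
}')
printf '%s\n' "$out"
```

The three output lines are "[] 0", "7", and "34".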

The assignment of a variable has the form:

var = expr

This assigns the value of the expression to var. For example, the following statement assigns the value 1 to the variable x:

x = 1

The name of the variable is used to reference the value:

{ print x }

prints the value of the variable x. In this case, it would be 1.

See Section B.2.5.5 below for information on built-in variables.

A field variable is referenced using $n, where n is a number from 0 to NF that selects the field by position; $0 refers to the entire input record. The value of n can be supplied by a variable, as in $NF for the last field, or by a constant, as in $1 for the first field.
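Field references can be sketched as follows, using a single made-up input line with three whitespace-separated fields. The variable n is hypothetical and only serves to show a field number held in a variable.

```shell
out=$(printf 'alpha beta gamma\n' | awk '{
    print $1          # first field
    print $NF         # last field; NF is 3 for this record
    n = 2; print $n   # field number supplied by a variable
    print $0          # the whole input record
}')
printf '%s\n' "$out"
```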

B.2.5.4. Arrays

An array is a variable that can be used to store a set of values. The following statement assigns a value to an element of an array:

array[index] = value

In awk, all arrays are associative arrays. What makes an associative array unique is that its index can be a string or a number.

An associative array makes an "association" between the indices and the elements of an array. For each element of the array, a pair of values is maintained: the index of the element and the value of the element. Unlike in a conventional array, the elements are not stored in any particular order.

You can use the special for loop to read all the elements of an associative array.

for (item in array)

The index of the array is available as item, while the value of an element of the array can be referenced as array[item].

You can use the operator in to test that an element exists by testing to see if its index exists.

if (index in array)

tests that array[index] exists, but you cannot use it to test the value of the element referenced by array[index].

You can also delete an individual element of the array using the delete statement, written as delete array[index].
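The array operations described in this section can be sketched together as below. The array name count and its indices are invented for the example. Because the for loop visits elements in no guaranteed order, the sketch only totals the values rather than printing them one by one.

```shell
out=$(awk 'BEGIN {
    # Indices may be strings or numbers; these names are made up.
    count["apple"] = 3
    count["pear"] = 5
    count[7] = 1

    # Visit every index of the array and total the values.
    for (item in count)
        total += count[item]
    print total

    # The in operator tests for the index: 1 if present, 0 if not.
    print ("pear" in count)
    delete count["pear"]
    print ("pear" in count)
}')
printf '%s\n' "$out"
```

This prints 9, then 1, then 0, since the element indexed by "pear" no longer exists after the delete statement.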

B.2.5.5. System variables

Awk defines a number of special variables that can be referenced or reset inside a program, as shown in Table B.4 (defaults are listed in parentheses).

Table B.4. Awk System Variables

Variable	Description
ARGC	Number of arguments on command line
ARGV	An array containing the command-line arguments
CONVFMT	String conversion format for numbers (%.6g). (POSIX)
ENVIRON	An associative array of environment variables
FILENAME	Current filename
FNR	Like NR, but relative to the current file
FS	Field separator (a blank)
NF	Number of fields in current record
NR	Number of the current record
OFMT	Output format for numbers (%.6g)
OFS	Output field separator (a blank)
ORS	Output record separator (a newline)
RLENGTH	Length of the string matched by match() function
RS	Record separator (a newline)
RSTART	First position in the string matched by match() function
SUBSEP	Separator character for array subscripts (\034)
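As a small sketch of how several of these variables interact, the following resets FS and OFS in a BEGIN block and then reports NR and NF for each record. The colon-separated input lines are made up for the example.

```shell
out=$(printf 'root:x:0\nalice:x:1000\n' | awk '
BEGIN { FS = ":"; OFS = "-" }    # split input on colons, join output with hyphens
{ print NR, NF, $1 }             # record number, field count, first field
END { print "records", NR }      # NR holds the total once input is exhausted
')
printf '%s\n' "$out"
```

Because the print statements separate their arguments with commas, each output line is joined with the value of OFS.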

B.2.5.6. Operators

Table B.5 lists the operators available in awk in order of precedence, from lowest to highest.

Table B.5. Operators

Operators	Description
= += -= *= /= %= ^= **=	Assignment
?:	C conditional expression
||	Logical OR
&&	Logical AND
~ !~	Match regular expression and negation
< <= > >= != ==	Relational operators
(blank)	Concatenation
+ -	Addition, subtraction
* / %	Multiplication, division, and modulus
+ - !	Unary plus and minus, and logical negation
^ **	Exponentiation
++ --	Increment and decrement, either prefix or postfix
$	Field reference

NOTE: While "**" and "**=" are common extensions, they are not part of POSIX awk.
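A few consequences of the precedence order in Table B.5 can be checked with the sketch below. The expressions are arbitrary examples. The sketch uses ^ rather than ** to stay within POSIX awk, and it relies on exponentiation grouping right to left, which the table itself does not state.

```shell
out=$(awk 'BEGIN {
    print 2 + 3 * 4        # multiplication binds tighter than addition
    print 1 " " 2 + 3      # addition binds tighter than concatenation
    print 2 ^ 3 ^ 2        # exponentiation groups right to left: 2 ^ 9
    print -2 ^ 2           # exponentiation binds tighter than unary minus
    x = 5; x += 2; print x # assignment operators have the lowest precedence
}')
printf '%s\n' "$out"
```

When in doubt about how an expression groups, adding parentheses makes the intent explicit.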

