

sed & awk

7.4. Pattern Matching

The "Hello, world" program does not demonstrate the power of pattern-matching rules. In this section, we look at a number of small, even trivial examples that nonetheless demonstrate this central feature of awk scripts.

When awk reads an input line, it tests that line against the pattern of each rule in the script. An action is executed only for lines that match its pattern. If a rule has a pattern but no action, each line that matches the pattern is printed (executing the print statement is the default action). Consider the following script:

/^$/ { print "This is a blank line." }

This script reads: if the input line is blank, then print "This is a blank line." The pattern is a regular expression that identifies a blank line: ^ matches the beginning of the line and $ matches the end, so ^$ matches only a line with nothing between them. The action, like most of those we've seen so far, contains a single print statement.

If we place this script in a file named awkscr and use an input file named test that contains three blank lines, then the following command executes the script:

$ awk -f awkscr test
This is a blank line.
This is a blank line.
This is a blank line.

(From this point on, we'll assume that our scripts are placed in a separate file and invoked using the -f command-line option.) The result tells us that there are three blank lines in test. This script ignores lines that are not blank.
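The blank-line run can also be reproduced without typing input interactively. The shell sketch below is a minimal illustration, assuming a POSIX shell with mktemp and any POSIX awk; the contents of test are made up so that the file holds exactly three empty lines. Its last two commands show the default action described earlier: a rule with a pattern but no action prints each matching line, so /[0-9]+/ on its own behaves like /[0-9]+/ { print }.

```shell
#!/bin/sh
# Sketch: recreate awkscr and test in a scratch directory (assumed setup).
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# The one-rule script from the text, saved in a file named awkscr.
cat > awkscr <<'EOF'
/^$/ { print "This is a blank line." }
EOF

# Hypothetical input: three empty lines mixed with non-empty ones.
printf 'apple\n\n42\n\n\nbanana\n' > test

# Run the script from the file with -f; prints the message three times.
awk -f awkscr test

# Default action: a pattern with no action prints the matching line ("42").
awk '/[0-9]+/' test

# Equivalent rule with the print statement written out explicitly.
awk '/[0-9]+/ { print }' test
```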

Let's add several new rules to the script. The script will now analyze each input line and classify it as an integer, a string, or a blank line.

# test for integer, string or empty line.
/[0-9]+/    { print "That is an integer" }
/[A-Za-z]+/ { print "This is a string" }
/^$/        { print "This is a blank line." }

The general idea is that for each of these patterns a line of input matches, the associated print statement is executed. The + metacharacter belongs to the extended set of regular expression metacharacters and means "one or more." Therefore, a line containing a sequence of one or more digits will be considered an integer. Because the first two patterns are not anchored with ^ and $, that sequence can appear anywhere in the line. Here's a sample run, taking input from standard input:

$ awk -f awkscr
4
That is an integer
t
This is a string
4T
That is an integer
This is a string
RETURN
This is a blank line.
44
That is an integer
CTRL-D
$

Note that the input "4T" was identified as both an integer and a string: awk tests every rule against every input line, so a line can match more than one rule. You can write a stricter rule set to prevent a line from matching more than one rule. You can also write actions that cause awk to skip other parts of the script.
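Two ways of keeping "4T" from matching more than one rule are sketched below as a shell script that feeds awk the same input as the sample run. The anchored patterns and the use of awk's next statement (which stops processing the current line and moves on to the next input line) are one possible approach, not the only one; the function names classify_strict and classify_first are hypothetical.

```shell
#!/bin/sh
# Same input as the sample run: 4, t, 4T, an empty line, 44.
input='4
t
4T

44'

# Stricter rules: ^ and $ anchor each pattern to the whole line,
# so "4T" matches none of them and produces no output.
classify_strict() {
    awk '
/^[0-9]+$/    { print "That is an integer" }
/^[A-Za-z]+$/ { print "This is a string" }
/^$/          { print "This is a blank line." }
'
}

# Original unanchored rules, but next skips the remaining rules
# for the current line, so "4T" is reported only as an integer.
classify_first() {
    awk '
/[0-9]+/    { print "That is an integer"; next }
/[A-Za-z]+/ { print "This is a string"; next }
/^$/        { print "This is a blank line." }
'
}

printf '%s\n' "$input" | classify_strict
printf '%s\n' "$input" | classify_first
```

With classify_strict, "4T" produces no output at all; a final catch-all rule could report such lines if that matters. With classify_first, the order of the rules decides which message a line like "4T" receives.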

We will be exploring the use of pattern-matching rules throughout this chapter.




Copyright © 2003 O'Reilly & Associates. All rights reserved.