Using Metacharacters in Regular Expressions (Unix Power Tools, 3rd Edition)

32.4. Using Metacharacters in Regular Expressions

Summary Box

There are three important parts to a regular expression:

Anchors

Specify the position of the pattern in relation to a line of text.

Character sets

Match any one of a set of characters at a single position.

Modifiers

Specify how many times the previous character set is repeated.

The following regular expression demonstrates all three parts:

^#*

The caret (^) is an anchor that indicates the beginning of the line. The hash mark is a simple character set that matches the single character #. The asterisk (*) is a modifier. In a regular expression, it specifies that the previous character set can appear any number of times, including zero. As you will see shortly, this is a useless regular expression (except for demonstrating the syntax!).
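To see why that pattern is useless, here is a minimal sketch that tries it on a few lines. It uses Python's re module as a stand-in for grep or sed (an assumption of this example, not something the article uses); for a pattern this simple the behavior should be the same. The sample lines and variable names are made up for illustration.

```python
import re

# The pattern from the text: anchor (^), character set (#), modifier (*).
pattern = re.compile(r"^#*")

# Hypothetical sample lines, not taken from the article.
lines = ["## heading", "# comment", "plain text", ""]

# Every line matches, because * also allows zero repetitions of '#'.
matched = [line for line in lines if pattern.search(line)]

# How many characters the pattern actually consumed on each line.
lengths = [len(pattern.search(line).group(0)) for line in lines]

# For contrast: one literal '#' followed by zero or more '#'
# only matches lines that really start with a hash mark.
at_least_one = [line for line in lines if re.search(r"^##*", line)]

print(matched)
print(lengths)
print(at_least_one)
```

Because the pattern is satisfied by zero hash marks, it cannot tell a comment line from any other line, even an empty one. Writing the character set once before the starred copy, as in ^##*, is one way to require at least one occurrence.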

There are two main types of regular expressions: simple (also known as basic) regular expressions and extended regular expressions. (As we'll see in the next dozen articles, the boundaries between the two types have become blurred as regular expressions have evolved.) Traditionally, a few utilities, such as awk and egrep, use extended regular expressions, and most others use simple regular expressions. From now on, if I talk about a "regular expression" (without specifying simple or extended), I am describing a feature common to both types. With modern tools, though, you'll find that extended regular expressions are the rule rather than the exception; which type you get depends on who wrote your version of the tool, when they wrote it, and whether supporting extended regular expressions seemed worth the effort.

[The situation is complicated by the fact that simple regular expressions have evolved over time, so there are versions of "simple regular expressions" that support extensions missing from extended regular expressions! Bruce explains the incompatibility at the end of Section 32.15. -- TOR]

The next eleven articles cover metacharacters and regular expressions:

The anchor characters ^ and $ (Section 32.5)

Matching a character with a character set (Section 32.6)

Match any character with . (dot) (Section 32.7)

Specifying a range of characters with [...] (Section 32.8)

Exceptions in a character set (Section 32.9)

Repeating character sets with * (Section 32.10)

Matching a specific number of sets with \{ and \} (Section 32.11)

Matching words with \< and \> (Section 32.12)

Remembering patterns with \(, \), and \1 (Section 32.13)

Potential problems (Section 32.14)

Extended regular expressions (Section 32.15)

-- BB