There are three important parts to a regular expression:
- Anchors: Specify the position of the pattern in relation to a line of text.
- Character sets: Match one or more characters in a single position.
- Modifiers: Specify how many times the previous character set is repeated.
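Here is a minimal sketch of each part on its own. It assumes Python's re module as a convenient stand-in for the Unix utilities (its syntax is not identical to theirs), and the sample line is made up for illustration:

```python
import re

# A made-up sample line, used only for illustration.
line = "error: disk full, error code 28"

# Anchor: ^ ties the pattern to the beginning of the line.
anchored = re.findall(r"^error", line)     # only the "error" at the start
unanchored = re.findall(r"error", line)    # both occurrences

# Character set: [0-9] matches any one digit in a single position.
digits = re.findall(r"[0-9]", line)

# Modifier: * says the previous character set may repeat zero or more times.
runs = re.findall(r"[0-9][0-9]*", line)

print(anchored, unanchored, digits, runs)
```

The anchored pattern finds one match while the unanchored one finds two; the character set alone picks out single digits, and adding the modifier gathers them into one run.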
The following regular expression demonstrates all three parts:
^#*
The caret (^) is an anchor that indicates the beginning of the line.
The hash mark is a simple character set that matches the single
character #. The asterisk (*) is a modifier. In a regular expression,
it specifies that the previous character set can appear any number of
times, including zero. Because zero occurrences count as a match, this
pattern matches every line, so, as you will see shortly, it is a
useless regular expression (except for demonstrating the syntax!).
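To see why ^#* is useless as a filter, here is a small sketch. It assumes Python's re module rather than grep or sed, and the sample lines are invented. The asterisk allows zero hash marks, so the pattern succeeds on every line, even an empty one:

```python
import re

# The pattern from the text: start of line, then zero or more "#" characters.
pattern = re.compile(r"^#*")

# Invented sample lines, including one with no "#" and one that is empty.
lines = ["# a comment", "## two hashes", "no hash at all", ""]

# search() never returns None here: "zero or more" is satisfied by an
# empty match at the beginning of any line.
matched_text = [pattern.search(line).group(0) for line in lines]

for line, found in zip(lines, matched_text):
    print(repr(line), "->", repr(found))
```

The lines without a leading # still match; the matched text is simply empty. A pattern that can never fail cannot select anything.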
There are two main types of regular expressions:
simple (also known as
basic) regular expressions and
extended regular expressions. (As
we'll see in the next dozen articles, the boundaries
between the two types have become blurred as regular expressions have
evolved.) A few utilities, like awk and egrep, use extended regular
expressions. Most use simple regular expressions.
From now on, if I talk about a "regular
expression" (without specifying simple or extended),
I am describing a feature common to both types. For the most part,
though, when using modern tools, you'll find that
extended regular expressions are the rule rather than the exception;
it all depends on who wrote the version of the tool
you're using and when, and whether it made sense to
worry about supporting extended regular expressions.
[The situation is complicated by the fact that simple regular
expressions have evolved over time, so there are versions of
"simple regular expressions" that
support extensions missing from extended regular expressions! Bruce
explains the incompatibility at the end of Section 32.15. -- TOR]
The next eleven articles cover metacharacters and regular expressions.
-- BB