32.21. Pattern Matching Quick Reference with Examples
Section 32.4
gives an introduction to regular
expressions. This article is intended for those of you who need just
a quick listing of regular expression syntax as a refresher from time
to time. It also includes some simple examples. The characters in
Table 32-7 have special meaning only in
search patterns.
Table 32-7. Special characters in search patterns
Pattern
|
What does it match?
|
.
|
Match any single character except newline.
|
*
|
Match any number (including none) of the single character that
immediately precedes it. The preceding character can also be a regular
expression. For example, since . (dot) means any character,
.* means "match any number of any
character."
|
^
|
Match the following regular expression at the beginning of the line.
|
$
|
Match the preceding regular expression at the end of the line.
|
[ ]
|
Match any one of the enclosed characters.
|
|
A hyphen (-) indicates a range of consecutive
characters. A caret (^) as the first character in
the brackets reverses the sense: it matches any one character
not in the list. A hyphen or a right square
bracket (]) as the first character is treated as a
member of the list. All other metacharacters are treated as members
of the list.
|
\{n,m\}
|
Match a range of occurrences of the single character that immediately
precedes it. The preceding character can also be a regular
expression. \{n\} will match exactly
n occurrences,
\{n,\} will match at least
n occurrences, and
\{n,m\} will
match any number of occurrences between n
and m.
|
\
|
Turn off the special meaning of the character that follows (except
for \{ and \(, etc., where it turns on the special meaning of the
character that follows).
|
\( \)
|
Save the pattern enclosed between \( and \) into a special holding
space. Up to nine patterns can be saved on a single line. They can be
"replayed" in substitutions by the
escape sequences \1 to \9.
|
\< \>
|
Match characters at beginning (\<) or end
(\>) of a word.
|
+
|
Match one or more instances of preceding regular expression.
|
?
|
Match zero or one instances of preceding regular expression.
|
|
|
Match the regular expression specified before or after.
|
( )
|
Apply a match to the enclosed group of regular expressions.
|
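As a quick check of the table above, here is a minimal sketch using grep. It assumes a POSIX-style grep that accepts -c (count matching lines) and -E (extended syntax); the sample data and the variable names are made up for illustration.

```shell
# Sample input: one item per line (hypothetical data for illustration).
lines=$(printf '%s\n' 'bag' 'bags' 'big bag' 'aaa' 'a.b')

# ^ and $ anchor the match to the start and end of the line.
anchored=$(printf '%s\n' "$lines" | grep -c '^bag$')

# \{n\} is a basic-syntax interval: exactly three a's on the line here.
interval=$(printf '%s\n' "$lines" | grep -c '^a\{3\}$')

# A backslash turns off the special meaning of the dot.
literal_dot=$(printf '%s\n' "$lines" | grep -c 'a\.b')

# grep -E selects extended syntax, where ?, | and ( ) are special.
extended=$(printf '%s\n' "$lines" | grep -E -c '^(bag|big)s?')

echo "$anchored $interval $literal_dot $extended"
```

The first three counts come from basic patterns; only the last one needs the extended operators from the bottom rows of the table.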
The characters in Table 32-8 have special meaning
only in replacement patterns.
Table 32-8. Special characters in replacement patterns
Pattern
|
What does it do?
|
\
|
Turn off the special meaning of the character that follows.
|
\n
|
Restore the nth pattern previously saved
by \( and \).
n is a number from 1 to 9, with 1 starting
on the left.
|
&
|
Reuse the string that matched the search pattern as part of the
replacement pattern.
|
\u
|
Convert first character of replacement pattern to uppercase.
|
\U
|
Convert replacement pattern to uppercase.
|
\l
|
Convert first character of replacement pattern to lowercase.
|
\L
|
Convert replacement pattern to lowercase.
|
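The replacement characters can be tried out with sed. This is only a sketch with made-up input strings and variable names. It sticks to \, \n, and &, because the case-conversion escapes come from ex and vi and may not work in every sed (some versions accept them as an extension, so check your manual page).

```shell
# & reuses the whole matched string in the replacement.
wrapped=$(echo 'hello' | sed 's/.*/( & )/')

# \( \) saves pieces of the match; \1 and \2 replay them, numbered from the left.
swapped=$(echo 'key=value' | sed 's/\(.*\)=\(.*\)/\2=\1/')

# A backslash before & turns off its special meaning, leaving a literal ampersand.
literal=$(echo 'R and D' | sed 's/ and / \& /')

echo "$wrapped"
echo "$swapped"
echo "$literal"
```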
Note that many programs, especially
perl,
awk, and sed, implement their
own programming languages and often have much more extensive support
for regular expressions. As such, their manual pages are the best
place to look when you wish to confirm which expressions are
supported or whether the program supports more than simple regular
expressions. On many systems, notably those with a large complement
of GNU tools, the regular expression support is astonishing, and many
generations of tools may be implemented by one program (as with
grep, which also emulates the later
egrep in the same program, with widely varying
support for expression formats based on how the program is invoked).
Don't make the mistake of thinking that all of these
patterns will work everywhere in every program with regex support, or
of thinking that this is all there is.
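To see how much one pattern can change with the way the program is invoked, here is a small sketch. It assumes a POSIX-style grep where -E selects the extended (egrep) syntax; the input lines and variable names are made up for illustration.

```shell
# Two hypothetical input lines.
input=$(printf '%s\n' 'aab' 'a+b')

# Basic syntax (plain grep): + is an ordinary character,
# so the pattern matches only the literal text a+b.
basic=$(printf '%s\n' "$input" | grep 'a+b')

# Extended syntax (grep -E): + means "one or more of the preceding",
# so the same pattern now means one or more a's followed by b.
extended=$(printf '%s\n' "$input" | grep -E 'a+b')

echo "basic: $basic"
echo "extended: $extended"
```

The same four characters select a different line in each mode, which is why it pays to check the manual page for the tool in front of you.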
32.21.1. Examples of Searching
When used with grep
or egrep, regular expressions are surrounded
by quotes.
(If the pattern contains a $, you must use single
quotes from the shell; e.g.,
'pattern'.)
When used with ed, ex,
sed, and awk, regular
expressions are usually surrounded by / (although
any delimiter works). Table 32-9 has some example
patterns.
Table 32-9. Search pattern examples
Pattern
|
What does it match?
|
bag
|
The string bag.
|
^bag
|
bag at beginning of line.
|
bag$
|
bag at end of line.
|
^bag$
|
bag as the only text on the line.
|
[Bb]ag
|
Bag or bag.
|
b[aeiou]g
|
Second letter is a vowel.
|
b[^aeiou]g
|
Second letter is a consonant (or uppercase or symbol).
|
b.g
|
Second letter is any character.
|
^...$
|
Any line containing exactly three characters.
|
^\.
|
Any line that begins with a . (dot).
|
^\.[a-z][a-z]
|
Same, followed by two lowercase letters (e.g.,
troff requests).
|
^\.[a-z]\{2\}
|
Same as previous, grep or sed
only.
|
^[^.]
|
Any line that doesn't begin with a . (dot).
|
bugs*
|
bug, bugs,
bugss, etc.
|
"word"
|
A word in quotes.
|
"*word"*
|
A word, with or without quotes.
|
[A-Z][A-Z]*
|
One or more uppercase letters.
|
[A-Z]+
|
Same, extended regular expression format.
|
[A-Z].*
|
An uppercase letter, followed by zero or more characters.
|
[A-Z]*
|
Zero or more uppercase letters.
|
[a-zA-Z]
|
Any letter.
|
[^0-9A-Za-z]
|
Any symbol (not a letter or a number).
|
[567]
|
One of the numbers 5, 6, or
7.
|
Extended regular expression patterns:
|
five|six|seven
|
One of the words five, six, or
seven.
|
80[23]?86
|
One of the numbers 8086, 80286,
or 80386.
|
compan(y|ies)
|
One of the words company or
companies.
|
\<the
|
Words like theater or the.
|
the\>
|
Words like breathe or the.
|
\<the\>
|
The word the.
|
0\{5,\}
|
Five or more zeros in a row.
|
[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}
|
U.S. Social Security number
(nnn-nn-nnnn).
|
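A few of the patterns from the table can be tried against sample text. This is a sketch, assuming a POSIX-style grep with -c and -E; the sample lines and variable names are invented for illustration. The word-boundary patterns (\< and \>) are left out because not every grep supports them.

```shell
# Hypothetical sample text, one candidate per line.
sample=$(printf '%s\n' '.sp 2' 'Bag' 'bug' 'bugss' '80286' '80486' \
    '123-45-6789' '12-345-6789')

# ^\.[a-z]\{2\}: a dot at the start of the line, then two lowercase letters.
requests=$(printf '%s\n' "$sample" | grep '^\.[a-z]\{2\}')

# bugs*: "bug" followed by zero or more s characters (counted with -c).
bugs=$(printf '%s\n' "$sample" | grep -c 'bugs*')

# 80[23]?86 needs the extended syntax for the ? operator.
chips=$(printf '%s\n' "$sample" | grep -E '80[23]?86')

# Three digits, two digits, four digits, separated by hyphens.
ssn=$(printf '%s\n' "$sample" | grep '[0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}')

echo "$requests"
echo "$bugs"
echo "$chips"
echo "$ssn"
```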
32.21.2. Examples of Searching and Replacing
Table 32-10 shows the
metacharacters available to
sed or ex.
(ex commands begin with a colon.) A space is
marked by ␣; a TAB is marked by
tab.
Table 32-10. Search and replace commands
Command
|
Result
|
s/.*/( & )/
|
Redo the entire line, but add parentheses.
|
s/.*/mv & &.old/
|
Change a word list into mv commands.
|
/^$/d
|
Delete blank lines.
|
:g/^$/d
|
ex version of previous.
|
/^[␣tab]*$/d
|
Delete blank lines, plus lines containing only spaces or TABs.
|
:g/^[␣tab]*$/d
|
ex version of previous.
|
s/␣␣*/␣/g
|
Turn one or more spaces into one space.
|
:%s/␣␣*/␣/g
|
ex version of previous.
|
:s/[0-9]/Item &:/
|
Turn a number into an item label (on the current line).
|
:s
|
Repeat the substitution on the first occurrence.
|
:&
|
Same.
|
:sg
|
Same, but for all occurrences on the line.
|
:&g
|
Same.
|
:%&g
|
Repeat the substitution globally.
|
:.,$s/Fortran/\U&/g
|
Change word to uppercase, on current line to last line.
|
:%s/.*/\L&/
|
Lowercase entire file.
|
:s/\<./\u&/g
|
Uppercase first letter of each word on current line (useful for
titles).
|
:%s/yes/No/g
|
Globally change a word to No.
|
:%s/Yes/~/g
|
Globally change a different word to No (previous
replacement).
|
s/die or do/do or die/
|
Transpose words.
|
s/\([Dd]ie\) or \([Dd]o\)/\2 or \1/
|
Transpose, using hold buffers to preserve case.
|
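Several of the sed commands from the table can be run from the shell as follows. This is a sketch with invented input and variable names. The space-squeezing command is typed with real spaces: two spaces before the * and one space in the replacement.

```shell
# /^$/d deletes empty lines.
no_blanks=$(printf 'one\n\ntwo\n' | sed '/^$/d')

# s/.*/mv & &.old/ turns each word in a list into an mv command.
renames=$(printf 'a\nb\n' | sed 's/.*/mv & &.old/')

# A space followed by "space star" means one or more spaces;
# each run is replaced by a single space.
squeezed=$(echo 'a   b  c' | sed 's/  */ /g')

# \1 and \2 replay the saved words in the opposite order, keeping their case.
transposed=$(echo 'Die or do' | sed 's/\([Dd]ie\) or \([Dd]o\)/\2 or \1/')

echo "$no_blanks"
echo "$renames"
echo "$squeezed"
echo "$transposed"
```

The ex commands that begin with a colon are typed inside the editor instead, so they are not shown here.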
-- DG
Copyright © 2003 O'Reilly & Associates. All rights reserved.