Chapter 6 Global Replacement
In making global replacements, UNIX editors such as
vi
allow you to
search not just for fixed strings of characters,
but also for variable patterns of words, referred to as
regular
expressions
.
When you specify a literal string of characters, the search
might turn up other occurrences that you didn't want to match.
The problem with searching for words in a file is that a word
can be used in different ways.
Regular expressions
help you conduct a search for words in context.
Note that regular expressions can be used with the
vi
search
commands
/
and
?
as well as in the
ex
:g
and
:s
commands. For the most part, the same regular
expressions work with other UNIX programs such as
grep
,
sed
, and
awk
.
Regular expressions are made up by combining normal characters with a number
of special characters called
metacharacters
.
The metacharacters and their uses are listed below.
-
.
-
Matches any
single
character except a newline (carriage
return).
Remember that spaces are treated as characters.
For example,
p.p
matches character strings such as
pep
,
pip
,
pcp
.
-
*
-
Matches any number (or none) of the single character
that immediately precedes it. For example,
bugs*
will
match
bugs
(one
s
) or
bug
(no
s
's).
The character preceding the * can be one that is
specified by a regular expression.
For example, since . (dot) means any character,
.*
means "match any number of any character."
Here's a specific
example of this. The command
:s/End.*/End/
removes
all characters after
End
(it replaces the remainder of the
line with nothing).
-
^
-
Requires that the following regular expression be found at the beginning of
the line; for example,
^Part
matches
Part
when it occurs at the beginning of a line, and
^...
matches the first three characters of a line.
-
$
-
Requires that the preceding regular expression be found at the end
of the line; for example,
here:$
.
-
\
-
Treats the following special character as an ordinary character.
For example,
\.
matches an actual period instead of "any single
character," and
\*
matches an actual asterisk instead of
"any number of a character." The \ (backslash)
prevents the interpretation of a special character.
This prevention is called "escaping the character."
-
[ ]
-
Matches any
one
of the characters enclosed between the brackets.
For example,
[AB]
matches either
A
or
B
,
and
p[aeiou]t
matches
pat
,
pet
,
pit
,
pot
, or
put
.
A range of consecutive characters can be specified by separating
the first and last characters in the range with a hyphen.
For example,
[A-Z]
will match any uppercase
letter from
A
to
Z
, and
[0-9]
will match any
digit from
0
to
9
.
You can include more than one
range inside brackets, and you can specify a mix of ranges and
separate characters. For example,
[:;A-Za-z()]
will match four different punctuation marks, plus all letters.
Most metacharacters lose their special meaning inside brackets,
so you don't need to escape them if you want to use them as
ordinary characters. Within brackets, the three metacharacters
you still need to escape
are
\
-
]
. (The hyphen (
-
)
acquires meaning as a range specifier; to use an actual hyphen,
you can also place it as the first character inside the
brackets.)
A caret (
^
) has special meaning only when it is the
first character inside the brackets, but in this case the meaning
differs from that of the normal
^
metacharacter.
As the first character within brackets, a
^
reverses their sense: the brackets
will match any one character
not
in the list. For example,
[^a-z]
matches any character that is not a lowercase letter.
-
\( \)
-
Saves the pattern enclosed between
\(
and
\)
into a special holding space or "hold buffer."
Up to nine patterns can be saved in this way on a single line.
For example, the pattern:
\(That\) or \(this\)
saves
That
in hold buffer number 1 and
saves
this
in hold buffer number 2.
The patterns held can be "replayed" in substitutions by the sequences
\1
to
\9
.
For example, to rephrase
That or this
to read
this or That
, you could enter:
:%s/\(That\) or \(this\)/\2 or \1/
-
\< \>
-
Matches characters at the beginning (
\<
) or at the end
(
\>
) of a word.
The end or beginning of a
word is determined either by a punctuation mark or by a space.
For example, the expression
\<ac
will match only words
that begin with
ac
, such as
action
.
The expression
ac\>
will match only words
that end with
ac
, such as
maniac
.
Neither expression will match
react
.
-
~
-
Matches whatever regular expression was used in the
last
search. For example, if you searched for
The
,
you could search for
Then
with
/~n
.
Note that you can use this pattern only in a regular search
(with
/
). It
won't work as the pattern in a substitute command. It does,
however, have a similar
meaning in the replacement portion of a substitute command.
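The metacharacters listed above can be tried outside the editor. The following is a minimal sketch that uses Python's re module as a stand-in, which is an assumption made for illustration: Python's dialect is not vi's. In particular, Python writes both \< and \> as \b and has no equivalent of the ~ metacharacter. The variable names are hypothetical.

```python
import re

# NOTE: Python's re dialect is assumed here for illustration only; it is
# not vi's pattern matcher. Differences are flagged where they matter.

# .  matches any single character (a space counts as a character)
dot_hits = [w for w in ["pep", "pip", "pcp", "p p", "pp"]
            if re.fullmatch(r"p.p", w)]

# *  matches zero or more of the character that precedes it
star_hits = [w for w in ["bug", "bugs", "bugsss", "bu"]
             if re.fullmatch(r"bugs*", w)]

# .* matches any run of characters, as in :s/End.*/End/
truncated = re.sub(r"End.*", "End", "The End of the story")

# ^ and $ anchor the match to the beginning and end of the line
starts_with_part = bool(re.search(r"^Part", "Part One"))
part_mid_line = bool(re.search(r"^Part", "See Part One"))
first_three = re.search(r"^...", "Chapter 6").group()
ends_with_here = bool(re.search(r"here:$", "Enter the value here:"))

# \  escapes a metacharacter so that it matches literally
literal_dots = re.findall(r"\.", "a.b c")

# [ ] matches one character from a set; a leading ^ negates the set
vowel_hits = re.findall(r"p[aeiou]t", "pat pet pit pot put pyt")
non_lower = re.findall(r"[^a-z]", "abC1d")

# \< and \> in vi; Python uses \b for both word boundaries
begins_with_ac = re.findall(r"\bac\w*", "action maniac react")
ends_with_ac = re.findall(r"\w*ac\b", "action maniac react")
```

Each variable holds the result of one pattern, so the behavior of any single metacharacter can be checked by printing the corresponding name.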
When you make global replacements, the regular expressions above
carry their special meaning only within the search portion
(the first part) of the command. For example, when you type this:
:%s/1\. Start/2. Next, start with $100/
note that the replacement string
understands the characters
.
and
$
, without your
having to escape them.
By the same token, let's say you enter:
:%s/[ABC]/[abc]/g
If you're hoping to replace
A
with
a
,
B
with
b
, and
C
with
c
,
you're in for a surprise. Since brackets behave like
ordinary characters in a replacement string, this command
will change every occurrence of
A
,
B
, or
C
to the
five-character string
[abc]
.
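The same literal treatment of the replacement text can be observed in other tools. As a rough illustration, assuming Python's re.sub in place of the ex substitute command (the sample input strings are invented for the sketch):

```python
import re

# Assumption: re.sub stands in for :s here; it is not vi's implementation.

# In the replacement text, . and $ are ordinary characters and need no escaping.
renumbered = re.sub(r"1\. Start", "2. Next, start with $100", "1. Start")

# Brackets in the replacement are ordinary characters too, so every
# A, B, or C becomes the five-character string [abc].
surprise = re.sub(r"[ABC]", "[abc]", "A B C")
```

Only the search side is interpreted as a pattern; the replacement side is copied into the result as typed.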
To solve problems like this,
you need a way to specify variable
replacement strings. Fortunately, there are additional regular
expressions that have special meaning in a
replacement
string.
-
\n
-
Is replaced with the text matched by the
n
th pattern previously saved by \( and \), where
n
is a number from 1 to 9, and previously saved patterns are counted
from the left on the line. See the explanation for
\( and \) in the previous section.
-
\
-
Treats the following special character as an ordinary character.
Backslashes are metacharacters in replacement strings
as well as in search patterns.
To specify a real backslash, type two in a row (\\).
-
&
-
Is replaced with the entire text matched by the search pattern when used
in a replacement string. This is useful when you want to avoid retyping text:
:%s/Yazstremski/&, Carl/
The replacement will say
Yazstremski, Carl
. The
&
can
also replace a variable pattern (as specified by a regular
expression). For example, to surround each line from 1 to 10 with
parentheses, type:
:1,10s/.*/(&)/
The search pattern matches the whole line, and the
&
"replays" the line, surrounded by your text.
-
~
-
Has a meaning similar to the one it has in a search pattern;
the string found is replaced with the replacement
text specified in the last substitute command. This is useful for
repeating an edit. For example, you could say
:s/thier/their/
on
one line and repeat the change on another with
:s/thier/~/
.
The search pattern doesn't need to be the same, though. For
example, you could say
:s/his/their/
on
one line and repeat the replacement on another with
:s/her/~/
.
-
\u
or
\l
-
Causes the next character in the replacement string to be changed to
uppercase or lowercase, respectively. For example, to change
yes, doctor
into
Yes, Doctor
, you could say:
:%s/yes, doctor/\uyes, \udoctor/
This is a pointless example, though, since it's easier
just to type the replacement string with initial caps in the
first place. As with any regular expression,
\u
and
\l
are most useful with a variable string. Take, for
example, the command we used earlier:
:%s/\(That\) or \(this\)/\2 or \1/
The result is
this or That
, but we need to adjust the
cases. We'll use
\u
to uppercase the first letter in
this
(currently saved in hold buffer 2);
we'll use
\l
to lowercase the first letter in
That
(currently saved in hold buffer 1):
:s/\(That\) or \(this\)/\u\2 or \l\1/
The result is
This or that
. (Don't confuse the number one
with the lowercase
l
; the one comes after.)
-
\U
or
\L
-
Similar to
\u
or
\l
, but all following characters are
converted to uppercase or lowercase until the end of the
replacement string or until
\e
or
\E
is reached.
If there is no
\e
or
\E
, all characters of the
replacement text are affected by the
\U
or
\L
.
For example, to uppercase
Fortran
, you could say:
:%s/Fortran/\UFortran/
or, using the
&
character to repeat the search string:
:%s/Fortran/\U&/
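The replacement metacharacters above can be approximated in Python, with some loudly stated assumptions: Python writes groups as ( ) rather than \( \), spells & as \g<0>, and has no \u, \l, \U, or \L escapes, so a function builds the case-adjusted replacement instead. The helper name swap_and_fix_case is hypothetical, and the ~ metacharacter is not modeled.

```python
import re

# Assumption: Python's re.sub stands in for the ex :s command.

# \1 and \2 replay the saved patterns; Python groups are ( ) instead of \( \)
swapped = re.sub(r"(That) or (this)", r"\2 or \1", "That or this")

# & replays the whole match; Python spells it \g<0>
named = re.sub(r"Yazstremski", r"\g<0>, Carl", "Yazstremski")

# count=1 mirrors :s without the g flag (first match on the line only)
wrapped = re.sub(r".*", r"(\g<0>)", "some line", count=1)

# \u and \l have no Python escape, so a function adjusts the first letters
def swap_and_fix_case(match):
    first, second = match.group(1), match.group(2)
    return (second[0].upper() + second[1:] + " or "
            + first[0].lower() + first[1:])

recased = re.sub(r"(That) or (this)", swap_and_fix_case, "That or this")

# \U& uppercases the whole match
shouted = re.sub(r"Fortran", lambda m: m.group(0).upper(), "Fortran code")
```

Passing a function as the replacement is Python's general mechanism for variable replacement text; it plays the role that \u, \l, \U, and \L play in ex.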
All pattern searches are case-sensitive. That is, a search for
the
will
not find
The
. You can get around this by specifying both
uppercase and lowercase in the pattern:
/[Tt]he
You can also instruct
vi
to ignore case by typing
:set ic
.
See
Chapter 7, Advanced Editing
,
for additional details.
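The two approaches to case can be compared side by side. This sketch assumes Python's re module, with the re.IGNORECASE flag playing a role loosely analogous to the ignore-case setting; the sample sentence is invented.

```python
import re

text = "The cat saw the dog"

# A plain search is case-sensitive and misses the capitalized word
case_sensitive = re.findall(r"the", text)

# Listing both cases in brackets finds either form
both_cases = re.findall(r"[Tt]he", text)

# Assumption: re.IGNORECASE is used as a rough analog of vi's ignore-case option
ignore_case = re.findall(r"the", text, flags=re.IGNORECASE)
```

The bracket form affects only the one letter it names, while an ignore-case setting applies to every letter in the pattern.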