Chapter 6 Global Replacement
Unless you are already familiar with regular expressions,
the discussion of special characters above probably looks forbiddingly
complex.
A few more examples should make things clearer.
In the examples that follow, a square (□) is used to mark a blank
space; it is not a special character.
Let's work through how you might use some special characters in a replacement.
Suppose that you have a long file and that you want to replace the
word child with the word children throughout that file.
You first save the edited buffer with :w, then try the global
replacement:
:%s/child/children/g
When you continue editing, you notice occurrences of words such as
childrenish. You have unintentionally matched the word childish.
Returning to the last saved buffer with :e!, you now try:
:%s/child□/children□/g
(Note that there is a space after child.)
But this command misses the occurrences child., child,, child:
and so on.
After some thought, you remember that brackets allow you to specify
one character from among a list, so you come upon the solution:
:%s/child[□,.;:!?]/children[□,.;:!?]/g
This searches for child followed by either a space (indicated by □)
or any one of the punctuation characters ,.;:!?. You expect to
replace this with children followed by the corresponding space or
punctuation mark, but you've ended up with a bunch of punctuation
marks after every occurrence of children.
You need to save the space and punctuation marks inside a \( and \).
Then you can "replay" them with a \1. Here's the next attempt:
:%s/child\([□,.;:!?]\)/children\1/g
When the search matches a character inside the \( and \), the \1 on
the right-hand side restores the same character.
The syntax may seem awfully complicated, but this
command sequence can save you a lot of work!
Any time you spend learning regular expression syntax will be repaid
a thousandfold!
The command is still not perfect, though. You've noticed that
occurrences of Fairchild have been changed, so you need a way to
match child when it isn't part of another word.
As it turns out, vi (but not all other programs that use regular
expressions) has a special syntax for saying "only if the pattern
is a complete word." The character sequence \< requires the pattern
to match at the beginning of a word, whereas \> requires the
pattern to match at the end of a word. Using both will restrict
the match to a whole word. So, in the task given above, \<child\>
will find all instances of the word child, whether followed by
punctuation or spaces.
Here's the substitution command you should use:
:%s/\<child\>/children/g
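The same progression can be tried outside the editor. The following is a minimal Python sketch, not part of vi: it assumes Python's re module, in which \b (word boundary) stands in for both \< and \>, and the function names are hypothetical.

```python
import re


def replace_with_group(text: str) -> str:
    # Counterpart of the attempt that saves the trailing space or
    # punctuation mark in a group and replays it with \1.
    return re.sub(r"child([ ,.;:!?])", r"children\1", text)


def replace_whole_word(text: str) -> str:
    # Counterpart of :%s/\<child\>/children/g
    # Python has no \< or \>; \b is used at both ends instead.
    return re.sub(r"\bchild\b", "children", text)


sample = "Fairchild saw a child, a childish child."
print(replace_with_group(sample))   # still changes Fairchild
print(replace_whole_word(sample))   # leaves Fairchild and childish alone
```

The first function reproduces the Fairchild problem; the second restricts the match to the complete word.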
Suppose you have subroutine names beginning with the prefixes
mgi, mgr and mga.
If you want to save the prefixes but want to change the name box to
square, either of the following replacement commands will do the
trick. The first example illustrates how \( and \) can be used to
save whatever pattern was actually matched. The second example shows
how you can search for one pattern but change another.
:g/mg\([ira]\)box/s//mg\1square/g
:g/mg[ira]box/s/box/square/g
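The two commands are not quite interchangeable, which a small experiment makes visible. This is a hedged Python sketch (the function names and the sample line are made up for illustration): the first function changes only the matched subroutine name, while the second, like the second command, changes every box on any line selected by the pattern.

```python
import re


def rename_saving_prefix(line: str) -> str:
    # Like :g/mg\([ira]\)box/s//mg\1square/g
    # The group saves the letter i, r, or a; \1 replays it.
    return re.sub(r"mg([ira])box", r"mg\1square", line)


def rename_on_matching_lines(line: str) -> str:
    # Like :g/mg[ira]box/s/box/square/g
    # Select the line with one pattern, then change every "box" on it.
    if re.search(r"mg[ira]box", line):
        return line.replace("box", "square")
    return line


sample = "call mgibox(box)"
print(rename_saving_prefix(sample))
print(rename_on_matching_lines(sample))
```

If the word box can appear elsewhere on the same line, the first form is the safer choice.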
You can also move blocks of text delimited by patterns.
For example, assume you have a 150-page reference manual.
All reference pages are organized into three paragraphs with
the same three headings: SYNTAX, DESCRIPTION,
and PARAMETERS.
A sample of one reference page follows:
.Rh 0 "Get status of named file" "STAT"
.Rh "SYNTAX"
.nf
integer*4 stat, retval
integer*4 status(11)
character*123 filename
...
retval = stat (filename, status)
.fi
.Rh "DESCRIPTION"
Writes the fields of a system data structure into the
status array.
These fields contain (among other
things) information about the file's location, access
privileges, owner, and time of last modification.
.Rh "PARAMETERS"
.IP "\fBfilename\fR" 15n
A character string variable or constant containing
the UNIX pathname for the file whose status you want
to retrieve.
You can give the ...
Suppose that it is decided to move DESCRIPTION above the SYNTAX
paragraph.
With pattern matching, you can move blocks of text on all 150 pages
with one command!
:g /SYNTAX/,/DESCRIPTION/-1 mo /PARAMETERS/-1
This command operates on the block of text between the line
containing the word SYNTAX and the line just before the word
DESCRIPTION (/DESCRIPTION/-1).
The block is moved (using mo) to the line just before PARAMETERS
(/PARAMETERS/-1).
Note that ex can place text only below the line specified.
To tell ex to place text above a line, you first have to move up a
line with -1, and then place your text below.
In a case like this, one command saves literally hours of work.
(This is a real-life example - we once used a pattern match like this
to rearrange a reference manual containing hundreds of pages.)
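To see what the move does to each page, here is a rough Python sketch of the same rearrangement on a list of lines. It is an approximation under stated assumptions, not how ex works internally: searches run forward only (no wraparound), the headings are found by simple substring tests, and the helper names are hypothetical.

```python
def find_from(lines, start, word):
    """Index of the first line at or after `start` that contains `word`, else None."""
    for k in range(start, len(lines)):
        if word in lines[k]:
            return k
    return None


def move_syntax_below_description(lines):
    """Rough counterpart of :g /SYNTAX/,/DESCRIPTION/-1 mo /PARAMETERS/-1"""
    out = list(lines)
    i = 0
    while i < len(out):
        if "SYNTAX" not in out[i]:
            i += 1
            continue
        d = find_from(out, i + 1, "DESCRIPTION")
        p = find_from(out, d + 1, "PARAMETERS") if d is not None else None
        if p is None:
            break  # incomplete page: leave the remaining lines untouched
        # out[i:d] is the SYNTAX block, out[d:p] the DESCRIPTION block;
        # swapping them places SYNTAX just above the PARAMETERS line.
        out[i:p] = out[d:p] + out[i:d]
        i = p
    return out


page = ['.Rh "SYNTAX"', "retval = stat (filename, status)",
        '.Rh "DESCRIPTION"', "Writes the fields ...",
        '.Rh "PARAMETERS"', "..."]
for line in move_syntax_below_description(page):
    print(line)
```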
Block definition by patterns can be used equally well with other
ex commands.
For example, if you wanted to delete all DESCRIPTION paragraphs in
the reference chapter, you could enter:
:g/DESCRIPTION/,/PARAMETERS/-1d
This very powerful kind of change is implicit in ex's line
addressing syntax, but it is not readily apparent even to experienced
users.
For this reason, whenever you are faced with a complex, repetitive
editing task, take the time to analyze the problem and find out if
you can apply pattern-matching tools to get the job done.
Since the best way to learn pattern matching is by example,
here is a list of pattern-matching examples, with explanations.
Study the syntax carefully, so that you understand the principles at
work.
You should then be able to adapt these examples to your own situation.
- Put troff italicization codes around the word RETURN:
:%s/RETURN/\\fIRETURN\\fP/g
Notice that two backslashes (\\) are needed in the replacement,
because the backslash in the troff italicization code will be
interpreted as a special character.
(\fI alone would be interpreted as fI; you must type \\fI to get
\fI.)
- Modify a list of pathnames in a file:
:%s/\/usr\/tim/\/usr\/linda/g
A slash (used as a delimiter in the global replacement sequence) must
be escaped with a backslash when it is part of the pattern or
replacement; use \/ to get /.
An alternate way to achieve this same effect is to use a different
character as the pattern delimiter.
For example, you could make the above replacement using colons as
delimiters. Thus:
:%s:/usr/tim:/usr/linda:g
- Change all periods to semicolons in lines 1 to 10:
:1,10s/\./;/g
A dot has special meaning in regular expression syntax and must
be escaped with a backslash (\.).
- Change all occurrences of the word help (or Help) to HELP:
:%s/[Hh]elp/HELP/g
or:
:%s/[Hh]elp/\U&/g
The \U changes the pattern that follows to all uppercase. The
pattern that follows is the repeated search pattern, which is
either help or Help.
- Replace one or more spaces with a single space:
:%s/□□*/□/g
Make sure you understand how the asterisk works as a special
character.
An asterisk following any character (or following any regular
expression that matches a single character, such as . or [a-z])
matches zero or more instances of that character.
Therefore, you must specify two spaces followed by an asterisk
to match one or more spaces (one space, plus zero or more spaces).
- Replace one or more spaces following a colon with two spaces:
:%s/:□□*/:□□/g
- Replace one or more spaces following a period or a colon with two
spaces:
:%s/\([:.]\)□□*/\1□□/g
Either of the two characters within brackets can be matched.
This character is saved into a hold buffer, using \( and \), and
restored on the right-hand side by the \1.
Note that within brackets a special character such as a dot
does not need to be escaped.
- Standardize various uses of a word or heading:
:%s/^Note[□:s]*/Notes:□/g
The brackets enclose three characters: a space, a colon, and the
letter s.
Therefore, the pattern Note[□s:] will match Note□, Notes or Note:.
An asterisk is added to the pattern so that it also matches Note
(with zero spaces after it) and Notes: (the already correct
spelling). Without the asterisk, Note would be missed entirely and
Notes: would be incorrectly changed to Notes:□:.
- Delete all blank lines:
:g/^$/d
What you are actually matching here is the beginning of the line (^)
followed by the end of the line ($), with nothing in between.
- Delete all blank lines, plus any lines that contain only white
space:
:g/^[□tab]*$/d
(In the line above, a tab is shown as tab.)
A line may appear to be blank but may in fact contain spaces or tabs.
The previous example will not delete such a line.
This example, like the one above it, searches for the beginning and
end of the line. But instead of having nothing in between, the
pattern tries to find any number of spaces or tabs.
If no spaces or tabs are matched, the line is blank.
To delete lines that contain white space but that aren't blank,
you would have to match lines with at least one space or tab:
:g/^[□tab][□tab]*$/d
- Delete all leading spaces on a line:
:%s/^□□*\(.*\)/\1/
Use ^□□* to search for one or more spaces at the beginning of a
line; then use \(.*\) to save the rest of the line into the first
hold buffer.
Restore the line without spaces, using \1.
- Delete all spaces at the end of a line:
:%s/□□*$//
Use □□*$ to match one or more spaces anchored to the end of the
line, and replace them with nothing.
(Saving the text with \(.*\) in front of the spaces would not work
here: .* matches as much text as possible, so the hold buffer would
keep all but the last trailing space.)
The substitutions in this example and the previous one will happen
only once on any given line, so the g option doesn't need to follow
the replacement string.
- Insert a >□□ at the start of every line in a file:
:%s/^/>□□/
What we're really doing here is "replacing" the start of the line
with >□□.
Of course, the start of the line (being a logical construct, not an
actual character) isn't really replaced!
This command is useful when replying to mail or USENET news postings.
Frequently, it is desirable to include part of the
original message in your reply. By convention,
the inclusion is distinguished from your reply
by setting off the included text with a right angle
bracket and a couple of spaces at the start of the line. This can be done
easily as shown above. (Typically, only part of the original message will
be included. Unneeded text can be deleted either before or after the above
replacement.) Advanced mail systems do this automatically.
However, if you're using a primitive mail program,
you may need to do it manually.
- Add a period to the end of the next six lines:
:.,+5s/$/./
The line address indicates the current line plus five lines.
The $ indicates the end of line. As in the previous example, the $
is a logical construct. You aren't really replacing the end of the
line.
- Reverse the order of all hyphen-separated items in a list:
:%s/\(.*\)□-□\(.*\)/\2□-□\1/
Use \(.*\) to save text on the line into the first hold buffer, but
only until you find □-□.
Then use \(.*\) to save the rest of the line into the second hold
buffer.
Restore the saved portions of the line, reversing the order of the
two hold buffers.
The effect of this command on several items is shown below.
more - display files
becomes:
display files - more
and:
lp - print files
becomes:
print files - lp
- Change every word in a file to uppercase:
:%s/.*/\U&/
or:
:%s/./\U&/g
The \U flag at the start of the replacement string tells vi to
change the replacement to uppercase. The & character replays the
search pattern as the replacement.
These two commands are equivalent; however, the first form is
considerably faster, since it results in only one substitution per
line (.* matches the entire line, once per line), whereas the second
form results in repeated substitutions on each line (. matches only
a single character, with the replacement repeated on account of the
trailing g).
- Reverse the order of lines in a file: [1]
:g/.*/mo0
The search pattern matches all lines (a line contains zero or more
characters).
Each line is moved, one by one, to the top of the file (that
is, moved after imaginary line 0). As each matched line is
placed at the top, it pushes the previously moved lines down,
one by one, until the last line is on top.
Since all lines have a beginning, the same result can be achieved
more succinctly:
:g/^/mo0
- In a database, on all lines not marked Paid in full, append the
phrase Overdue:
:g!/Paid□in□full/s/$/Overdue/
or the equivalent:
:v/Paid□in□full/s/$/Overdue/
To affect all lines except those matching your pattern, add a ! to
the g command, or simply use the v command.
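- For any line that doesn't begin with a number, move the line
to the end of the file:
:g!/^[1-9]/m$
or:
:g/^[^1-9]/m$
As the first character within brackets, a caret negates the
sense, so the two commands have the same effect on any line that
contains at least one character. The first one says, "Don't match
lines that begin with a number," and the second one says, "Match
lines that don't begin with a number."
(They differ only on empty lines: the first command selects them,
but the second does not, because [^1-9] must match an actual
character.)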
- Change manually numbered section heads (e.g., 1.1, 1.2, etc.) to a
troff macro (e.g., .Ah for an A-level heading):
:%s/[1-9]\.[1-9]/.Ah/
The search string matches a digit other than zero, followed by a
period, followed by another nonzero digit.
Notice that the period doesn't need to be escaped in the replacement
(though a \ would have no effect, either).
The command above won't find chapter numbers containing
two or more digits. To do so, modify the command like this:
:%s/[1-9][0-9]*\.[1-9]/.Ah/
Now it will match chapters
10 to 99 (digits 1 to 9, followed by a digit),
100 to 999 (digits 1 to 9, followed by two digits),
etc.
The command still finds chapters
1 to 9 (digits 1 to 9, followed by no digit).
- Remove numbering from section headings in a document.
You want to change the sample lines:
2.1 Introduction
10.3.8 New Functions
into the lines:
Introduction
New Functions
Here's the command to do this:
:%s/^[1-9][0-9]*\.[1-9][0-9.]*□//
The search pattern resembles the one in the previous example, but now
the numbers vary in length. At a minimum, the headings contain
number, period, number, so you start with the search pattern from
the previous example:
[1-9][0-9]*\.[1-9]
But in this example, the heading may continue with any number of
digits or periods:
[0-9.]*
The pattern ends with a space (□), so that the blank separating the
number from the heading text is removed as well.
- Change the word Fortran to the phrase FORTRAN (acronym of FORmula
TRANslation):
:%s/\(For\)\(tran\)/\U\1\2\E□(acronym□of□\U\1\Emula□\U\2\Eslation)/g
First, since we notice that the words FORmula and TRANslation use
portions of the original word, we decide to save the search pattern
in two pieces: \(For\) and \(tran\).
The first time we restore it, we use both pieces together,
converting all characters to uppercase: \U\1\2. Next, we undo the
uppercase with \E; otherwise the remaining replacement text would
all be uppercase. The replacement continues with actual typed words,
then we restore the first hold buffer. This buffer still contains
For, so again we convert to uppercase first: \U\1.
Immediately after, we lowercase the rest of the word: \Emula.
Finally, we restore the second hold buffer. This contains tran, so
we precede the "replay" with uppercase, follow it with lowercase,
and type out the rest of the word: \U\2\Eslation).
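Several of the examples in this list can be cross-checked outside the editor. The block below is a minimal Python sketch, not vi itself: it assumes Python's re module, writes \( \) as plain parentheses, uses a function where vi uses \U and \E, and every function name is hypothetical.

```python
import re


def collapse_spaces(line):
    # One space followed by zero or more spaces, i.e. one or more spaces.
    return re.sub(r"  *", " ", line)


def standardize_note(line):
    # The bracket holds a space, a colon, and s; * allows zero or more.
    return re.sub(r"^Note[ :s]*", "Notes: ", line)


def drop_whitespace_only(lines):
    # Like :g/^[ tab]*$/d : drop empty lines and lines of spaces or tabs.
    return [ln for ln in lines if not re.fullmatch(r"[ \t]*", ln)]


def strip_trailing_spaces(line):
    # One or more spaces anchored to the end of the line.
    return re.sub(r"  *$", "", line)


def swap_hyphenated(line):
    # Two groups around " - ", replayed in reverse order.
    return re.sub(r"(.*) - (.*)", r"\2 - \1", line)


def uppercase_line(line):
    # Like :%s/.*/\U&/ : m.group(0) plays the role of &.
    return re.sub(r".*", lambda m: m.group(0).upper(), line, count=1)


def reverse_lines(lines):
    # Like :g/^/mo0 : each line in turn is moved to the top.
    out = []
    for ln in lines:
        out.insert(0, ln)
    return out


def expand_fortran(text):
    # Python replacement strings have no \U...\E, so a function
    # builds the replacement from the two saved pieces instead.
    def repl(m):
        first, second = m.group(1), m.group(2)
        return (f"{(first + second).upper()} (acronym of "
                f"{first.upper()}mula {second.upper()}slation)")
    return re.sub(r"(For)(tran)", repl, text)


print(collapse_spaces("a   b  c"))
print(standardize_note("Note: check this"))
print(swap_hyphenated("more - display files"))
print(expand_fortran("Fortran"))
```

One caveat that the sketch makes easy to spot: because the asterisk allows zero characters from the bracket, standardize_note also rewrites the start of a line such as Notebook, so the pattern is only safe on text where Note begins a heading.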
We conclude this chapter by presenting sample tasks that involve
complex pattern-matching concepts.
Rather than solve the problems right away, we'll work toward
the solutions step by step.
Suppose you have a few lines with this general form:
the best of times; the worst of times: moving
The coolest of times; the worst of times: moving
The lines that you're concerned with always end with moving, but
you never know what the first two words might be. You want to change
any line that ends with moving to read:
The greatest of times; the worst of times: moving
Since the changes must occur on certain lines, you need to
specify a context-sensitive global replacement. Using :g/moving$/
will match lines that end with moving.
Next, you realize that your search pattern could be any number of
any character, so the metacharacters .* come to mind.
But these will match the whole line unless you somehow restrict
the match. Here's your first attempt:
:g/moving$/s/.*of/The□greatest□of/
This search string, you decide, will match from the beginning of
the line to the first of. Since you needed to specify the word of
to restrict the search, you simply repeat it in the replacement.
Here's the resulting line:
The greatest of times: moving
Something went wrong. The replacement gobbled the line up to the
second of instead of the first. Here's why. When given a choice,
the action of "match any number of any character" will match as
much text as possible.
In this case, since the word of appears twice, your search string
finds:
the best of times; the worst of
rather than:
the best of
Your search pattern needs to be more restrictive:
:g/moving$/s/.*of times;/The greatest of times;/
Now the .* will match all characters up to the instance of the
phrase of times;.
Since there's only one instance, it has to be the first.
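The "as much text as possible" behavior is easy to reproduce. Here is a small Python sketch (Python's .* is greedy in the same way; the function names are hypothetical, and the test on moving$ stands in for the :g/moving$/ prefix):

```python
import re

line = "the best of times; the worst of times: moving"


def first_attempt(text):
    # .*of runs to the LAST "of" on the line, not the first.
    if re.search(r"moving$", text):
        return re.sub(r".*of", "The greatest of", text, count=1)
    return text


def restricted(text):
    # "of times;" occurs only once, so .* has to stop before it.
    if re.search(r"moving$", text):
        return re.sub(r".*of times;", "The greatest of times;", text, count=1)
    return text


print(first_attempt(line))
print(restricted(line))
```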
There are cases, though, when it is inconvenient, or even
incorrect, to use the .* metacharacters.
For example, you might find yourself typing many words to restrict
your search pattern, or you might be unable to restrict the pattern
by specific words (if the text in the lines varies widely). The next
section presents such a case.
Suppose you want to switch the order of all last names
and first names in a database.
The lines look like this:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333
The name of each field ends with a colon, and each field is
separated by a semicolon. Using the top line as an example, you
want to change Feld, Ray to Ray Feld.
We'll present some commands that look promising but don't work.
After each command, we show you the line the way it looked before
the change and after the change.
:%s/: \(.*\), \(.*\);/: \2 \1;/
Before:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
After:
Name: UNIX Feld, Ray; Areas: PC; Phone: 123-4567
Here the first hold buffer contains Feld, Ray; Areas: PC and the
second hold buffer contains UNIX.
Note that the first hold buffer contains more than you want.
Since it was not sufficiently restricted by the pattern that
follows it, the hold buffer was able to save up to the second comma.
Now you try to restrict the contents of the first hold buffer:
:%s/: \(....\), \(.*\);/: \2 \1;/
Before:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
After:
Name: Ray; Areas: PC, UNIX Feld; Phone: 123-4567
(First hold buffer: Feld. Second hold buffer: Ray; Areas: PC, UNIX.)
Here you've managed to save the last name in the first hold
buffer, but now the second hold buffer will save anything
up to the last semicolon on the line. Now you restrict the
second hold buffer, too:
:%s/: \(....\), \(...\);/: \2 \1;/
Before:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
After:
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567
(First hold buffer: Feld. Second hold buffer: Ray.)
This gives you what you want, but only in the specific case of a
four-letter last name and a three-letter first name. (The
previous attempt included the same mistake.) Why not just return
to the first attempt, but this time be more selective about the
end of the search pattern?
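:%s/: \(.*\), \(.*\); Area/: \2 \1; Area/
Before:
Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567
After:
Name: Ray Feld; Areas: PC, UNIX; Phone: 123-4567
(First hold buffer: Feld. Second hold buffer: Ray. The replacement
repeats ; Area because the search pattern consumed that text.)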
This works, but we'll continue the discussion by introducing an
additional concern. Suppose that the Area field isn't always present
or isn't always the second field.
The above command won't work on such lines.
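To check what each attempt actually saves, the four search patterns can be run against the sample line. This is a hedged Python sketch: the patterns are transcribed into Python syntax (plain parentheses instead of \( \)), and hold_buffers is a hypothetical helper that returns the two saved pieces.

```python
import re

line = "Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567"

attempts = [
    r": (.*), (.*);",        # first attempt: both buffers unrestricted
    r": (....), (.*);",      # first buffer limited to four characters
    r": (....), (...);",     # both buffers limited by length
    r": (.*), (.*); Area",   # end of pattern tied to the next field name
]


def hold_buffers(pattern, text):
    # Return the contents of the two groups, or None if nothing matches.
    m = re.search(pattern, text)
    return m.groups() if m else None


for pattern in attempts:
    print(pattern, "->", hold_buffers(pattern, line))
```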
We introduce this problem to make a point. Whenever you rethink
a pattern match, it's usually better to work toward refining the
variables (the metacharacters), rather than using specific text
to restrict patterns. The more variables you use in your
patterns,
the more powerful your commands will be.
In the current example,
think again about the patterns you want to switch.
Each word starts with an uppercase letter and is followed by any
number of lowercase letters, so you can match the names like this:
[A-Z][a-z]*
OK, but a last name might also have more than one uppercase letter
(McFly, for example), so you'd want to search for this possibility
in the second and succeeding letters:
[A-Z][A-Za-z]*
It doesn't hurt to use this for the first name, too (you never
know when McGeorge Bundy will turn up).
Your command now becomes:
:%s/: \([A-Z][A-Za-z]*\), \([A-Z][A-Za-z]*\);/: \2 \1;/
Quite forbidding, isn't it?
It still doesn't cover the case of a name like Joy, Susan S.
Since the first-name field might include a middle initial, you need
to add a space and a period within the second pair of brackets.
But enough is enough.
Sometimes, specifying exactly what you want is more difficult than
specifying what you don't want. In your sample database, the last
names end with a comma, so a last-name field can be thought of as a
string of characters that are not commas:
[^,]*
This pattern matches characters up until the first comma.
Similarly, the first-name field is a string of characters that
are not semicolons:
[^;]*
Putting these more efficient patterns back into your previous
command, you get this:
:%s/: \([^,]*\), \([^;]*\);/: \2 \1;/
The same command could also be entered as a context-sensitive
replacement.
If all lines begin with Name, you can say:
:g/^Name/s/: \([^,]*\), \([^;]*\);/: \2 \1;/
You can also add an asterisk after the first space,
in order to match a colon that has extra spaces (or no spaces)
after it:
:g/^Name/s/: *\([^,]*\), \([^;]*\);/: \2 \1;/
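As a quick check of the final command, here is a minimal Python sketch of the same substitution. It assumes Python's re module, and swap_name is a hypothetical name; the startswith test stands in for the :g/^Name/ prefix.

```python
import re


def swap_name(line):
    # Only lines that begin with Name are edited.
    if not line.startswith("Name"):
        return line
    # [^,]* saves everything up to the first comma (the last name);
    # [^;]* saves everything up to the first semicolon (the first name).
    return re.sub(r": *([^,]*), ([^;]*);", r": \2 \1;", line, count=1)


print(swap_name("Name: Feld, Ray; Areas: PC, UNIX; Phone: 123-4567"))
print(swap_name("Name: Joy, Susan S.; Areas: Graphics; Phone: 999-3333"))
```

Because neither class depends on how long a name is or which letters it contains, the middle initial in the second line needs no special handling.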
As we've usually seen the :g command used, it selects lines that
are typically then edited by subsequent commands on the same line -
for example, we select lines with g, and then make substitutions
on them, or select them and delete them:
:g/mg[ira]box/s/box/square/g
:g/^$/d
However, in his two-part tutorial in UNIX World, [2] Walter Zintz
makes an interesting point about the g command. This command selects
lines - but the associated editing commands need not actually affect
the lines that are selected.
Instead, he demonstrates a technique by which you can repeat ex
commands some arbitrary number of times. For example, suppose you
want to place ten copies of lines 12 through 17 of your file at the
end of your current file. You could type:
:1,10g/^/ 12,17t$
This is a very unexpected use of g, but it works! The g command
selects line 1, executes the specified t command, then goes on to
line 2, to execute the next copy command. When line 10 is reached,
ex will have made ten copies.
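The idea of using the selected lines purely as a loop counter can be sketched in Python as follows. This only mirrors the effect, under the assumption that the file has at least 17 lines; append_copies is a hypothetical name.

```python
def append_copies(lines, first, last, times):
    # first and last are 1-based and inclusive, like ex line addresses.
    out = list(lines)
    block = out[first - 1:last]
    # Each pass corresponds to one line selected by :1,10g/^/
    for _ in range(times):
        out.extend(block)
    return out


document = [f"line {n}" for n in range(1, 21)]
result = append_copies(document, 12, 17, 10)
print(len(result))
```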
Here's another advanced g example, again building on suggestions
provided in Zintz's article.
Suppose you're editing a document that consists of several parts.
Part 2 of this file is shown below, using ellipses to
show omitted text and displaying line numbers for reference.
301 Part 2
302 Capability Reference
303 .LP
304 Chapter 7
305 Introduction to the Capabilities
306 This and the next three chapters ...
400 ... and a complete index at the end.
401 .LP
402 Chapter 8
403 Screen Dimensions
404 Before you can do anything useful
405 on the screen, you need to know ...
555 .LP
556 Chapter 9
557 Editing the Screen
558 This chapter discusses ...
821 .LP
822 Part 3:
823 Advanced Features
824 .LP
825 Chapter 10
The chapter numbers appear on one line, their titles appear on the
line below, and the chapter text begins on the line below that.
The first thing you'd like to do is copy the beginning line
of each chapter, sending it to an already existing file called
begin.
Here's the command that does this:
:g /^Chapter/ .+2w >> begin
You must be at the top of your file before issuing this command.
First you search for Chapter at the start of a line, but then you
want to run the command on the beginning line of each chapter - the
second line below Chapter.
Because a line beginning with Chapter is now selected as the current
line, the line address .+2 will indicate the second line below it.
The equivalent line addresses +2 or ++ work as well.
You want to write these lines to an existing file named begin, so
you issue the w command with the append operator >>.
Suppose you want to send the beginnings of chapters that are only
within Part 2. You need to restrict the lines selected by g,
so you change your command to this:
:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin
Here, the g command selects the lines that begin with Chapter, but
it searches only that portion of the file from a line starting with
Part 2 through a line starting with Part 3.
If you issue the above command, the last lines of the file begin
will read as follows:
This and the next three chapters ...
Before you can do anything useful
This chapter discusses ...
These are the lines that begin Chapters 7, 8, and 9.
In addition to the lines you've just sent,
you'd like to copy chapter titles to the end of
the document, in preparation for making a table of contents.
You can use the vertical bar to tack a second command after
your first command, like so:
:/^Part 2/,/^Part 3/g /^Chapter/ .+2w >> begin | +t$
Remember that with any subsequent command, line addresses are
relative to the previous command. The first command has marked
lines (within Part 2) that start with Chapter, and the chapter
titles appear on a line below such lines. Therefore, to access
chapter titles in the second command, the line address is + (or the
equivalents +1 or .+1).
Then use t$ to copy the chapter titles to the end of the file.
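The relative addressing in the combined command can be sketched in Python. This is an illustration only: it works on a list of lines instead of real files, assumes that both the Part 2 and Part 3 lines exist, and uses hypothetical names throughout.

```python
def chapter_lines_in_part(lines, start_mark="Part 2", end_mark="Part 3"):
    # Address range /^Part 2/,/^Part 3/ (both ends inclusive).
    start = next(i for i, ln in enumerate(lines) if ln.startswith(start_mark))
    end = next(i for i in range(start + 1, len(lines))
               if lines[i].startswith(end_mark))
    first_lines, titles = [], []
    for i in range(start, end + 1):
        if lines[i].startswith("Chapter") and i + 2 < len(lines):
            titles.append(lines[i + 1])       # address +  : the title
            first_lines.append(lines[i + 2])  # address .+2: first text line
    # first_lines is what would be appended to the file begin;
    # the titles are copied to the end of the document, as with t$.
    return first_lines, list(lines) + titles


doc = ["Part 2", "Capability Reference", ".LP",
       "Chapter 7", "Introduction to the Capabilities",
       "This and the next three chapters ...", ".LP",
       "Chapter 8", "Screen Dimensions",
       "Before you can do anything useful", ".LP",
       "Part 3:", "Advanced Features", ".LP",
       "Chapter 10", "A later title", "Later text"]
begin_lines, new_doc = chapter_lines_in_part(doc)
print(begin_lines)
print(new_doc[-2:])
```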
As these examples illustrate, thought and
experimentation may lead you to some unusual editing solutions.
Don't be afraid to try things! Just be sure to back up your file
first.