6.3 Pattern-matching Rules

In making global replacements, UNIX editors such as vi allow you to search not just for fixed strings of characters, but also for variable patterns of words, referred to as regular expressions.

When you specify a literal string of characters, the search might turn up other occurrences that you didn't want to match. The problem with searching for words in a file is that a word can be used in different ways. Regular expressions help you conduct a search for words in context. Note that regular expressions can be used with the vi search commands / and ? as well as in the ex :g and :s commands. For the most part, the same regular expressions work with other UNIX programs such as grep, sed, and awk.

Regular expressions are made up by combining normal characters with a number of special characters called metacharacters. The metacharacters and their uses are listed below.

6.3.1 Metacharacters Used in Search Patterns

.

Matches any single character except a newline (carriage return). Remember that spaces are treated as characters. For example, p.p matches character strings such as pep, pip, and pcp.

*

Matches any number (or none) of the single character that immediately precedes it. For example, bugs* will match bugs (one s) or bug (no s's).

The character preceding the * can be one that is specified by a regular expression. For example, since . (dot) means any character, .* means "match any number of any character."

Here's a specific example of this. The command :s/End.*/End/ removes all characters after End (it replaces the remainder of the line with nothing).
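
As a rough illustration outside vi, the sketch below tries the same . and * patterns in Python's re module, which treats these two metacharacters the same way. The sample words and the sample line are made up for the demonstration, and re.sub with count=1 only approximates what :s does on a single line.

```python
import re

# "." matches any single character, including a space.
dot_matches = [w for w in ["pep", "pip", "pcp", "pp", "p p"]
               if re.fullmatch(r"p.p", w)]

# "bugs*" matches "bug" followed by zero or more "s" characters.
star_matches = [w for w in ["bug", "bugs", "bugss", "bu"]
                if re.fullmatch(r"bugs*", w)]

# Rough equivalent of :s/End.*/End/ applied to one (hypothetical) line.
trimmed = re.sub(r"End.*", "End", "The End of the story", count=1)

print(dot_matches)
print(star_matches)
print(trimmed)
```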

^

Requires that the following regular expression be found at the beginning of the line; for example, ^Part matches Part when it occurs at the beginning of a line, and ^... matches the first three characters of a line.

$

Requires that the preceding regular expression be found at the end of the line; for example, here:$ matches here: only when it occurs at the end of a line.

\

Treats the following special character as an ordinary character. For example, \. matches an actual period instead of "any single character," and \* matches an actual asterisk instead of "any number of a character." The \ (backslash) prevents the interpretation of a special character. This prevention is called "escaping the character."
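
The anchors and the backslash escape can be tried outside vi as well. The following is a minimal sketch in Python's re module, which gives ^, $, and \ the same meaning for single-line strings; the sample lines are invented for the example.

```python
import re

lines = ["Part one", "In Part two", "look here:", "here: we go"]

# ^Part matches only where Part starts the line.
starts_with_part = [s for s in lines if re.search(r"^Part", s)]

# here:$ matches only where here: ends the line.
ends_with_here = [s for s in lines if re.search(r"here:$", s)]

# ^... matches the first three characters of a line.
first_three = re.search(r"^...", "Part one").group()

# An unescaped dot matches every character; an escaped dot
# matches only a literal period.
unescaped = re.findall(r".", "a.b")
escaped = re.findall(r"\.", "a.b")

print(starts_with_part, ends_with_here, first_three, unescaped, escaped)
```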

[ ]

Matches any one of the characters enclosed between the brackets. For example, [AB] matches either A or B, and p[aeiou]t matches pat, pet, pit, pot, or put. A range of consecutive characters can be specified by separating the first and last characters in the range with a hyphen. For example, [A-Z] will match any uppercase letter from A to Z, and [0-9] will match any digit from 0 to 9.

You can include more than one range inside brackets, and you can specify a mix of ranges and separate characters. For example, [:;A-Za-z() ] will match four different punctuation marks, plus all letters.

Most metacharacters lose their special meaning inside brackets, so you don't need to escape them if you want to use them as ordinary characters. Within brackets, the three metacharacters you still need to escape are \, -, and ]. (The hyphen (-) acquires meaning as a range specifier; to use an actual hyphen, you can also place it as the first character inside the brackets.)

A caret (^) has special meaning only when it is the first character inside the brackets, but in this case the meaning differs from that of the normal ^ metacharacter. As the first character within brackets, a ^ reverses their sense: the brackets will match any one character not in the list. For example, [^a-z] matches any character that is not a lowercase letter.
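
Bracket expressions behave the same way in Python's re module, so a short sketch there can confirm the rules above. The test strings are hypothetical.

```python
import re

candidates = ["pat", "pet", "pit", "pot", "put", "pyt", "p-t"]

# p[aeiou]t matches only the words with a vowel in the middle.
vowel_words = [w for w in candidates if re.fullmatch(r"p[aeiou]t", w)]

# [^a-z] matches any one character that is not a lowercase letter.
non_lowercase = re.findall(r"[^a-z]", "abC 1d")

# A hyphen placed first inside the brackets is an ordinary character.
digits_and_hyphens = re.findall(r"[-0-9]", "3-4 x")

print(vowel_words)
print(non_lowercase)
print(digits_and_hyphens)
```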

\( \)

Saves the pattern enclosed between \( and \) into a special holding space or "hold buffer." Up to nine patterns can be saved in this way on a single line. For example, the pattern:

\(That\) or \(this\)

saves That in hold buffer number 1 and saves this in hold buffer number 2. The patterns held can be "replayed" in substitutions by the sequences \1 to \9. For example, to rephrase That or this to read this or That, you could enter:

:%s/\(That\) or \(this\)/\2 or \1/
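
The same swap can be sketched in Python's re module. One difference to keep in mind: Python writes a saved group with plain parentheses, ( ), where vi uses \( \). The replay sequences \1 and \2 are the same. The input string is just the phrase from the example.

```python
import re

# Group 1 holds "That", group 2 holds "this"; the replacement
# replays them in the opposite order.
swapped = re.sub(r"(That) or (this)", r"\2 or \1", "That or this")

print(swapped)
```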

\< \>

Matches characters at the beginning (\<) or at the end (\>) of a word. The end or beginning of a word is determined either by a punctuation mark or by a space. For example, the expression \<ac will match only words that begin with ac, such as action. The expression ac\> will match only words that end with ac, such as maniac. Neither expression will match react.
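
Python's re module has no \< or \>, but its \b (word boundary) can approximate both when placed before or after the text. The sketch below is only an analogy under that assumption: \b does not distinguish the start of a word from the end, so it is not an exact substitute in every pattern.

```python
import re

words = ["action", "maniac", "react"]

# \bac approximates vi's \<ac : "ac" at the start of a word.
begins_with_ac = [w for w in words if re.search(r"\bac", w)]

# ac\b approximates vi's ac\> : "ac" at the end of a word.
ends_with_ac = [w for w in words if re.search(r"ac\b", w)]

print(begins_with_ac)
print(ends_with_ac)
```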

~

Matches whatever regular expression was used in the last search. For example, if you searched for The, you could search for Then with /~n. Note that you can use this pattern only in a regular search (with /). It won't work as the pattern in a substitute command. It does, however, have a similar meaning in the replacement portion of a substitute command.

6.3.2 Metacharacters Used in Replacement Strings

When you make global replacements, the regular expressions above carry their special meaning only within the search portion (the first part) of the command. For example, when you type this:

:%s/1\.  Start/2.  Next, start with $100/

note that the replacement string understands the characters . and $, without your having to escape them. By the same token, let's say you enter:

:%s/[ABC]/[abc]/g

If you're hoping to replace A with a, B with b, and C with c, you're in for a surprise. Since brackets behave like ordinary characters in a replacement string, this command will change every occurrence of A, B, or C to the five-character string [abc].
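
The surprise is easy to reproduce outside vi. In this small Python sketch, the brackets in the pattern form a character class, while the brackets in the replacement are plain text, just as in the substitute command; the input string is made up.

```python
import re

# Every A, B, or C becomes the literal five-character string "[abc]".
replaced = re.sub(r"[ABC]", "[abc]", "A, B, and C")

print(replaced)
```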

To solve problems like this, you need a way to specify variable replacement strings. Fortunately, there are additional regular expressions that have special meaning in a replacement string.

\n

Is replaced with the text matched by the nth pattern previously saved by \( and \), where n is a number from 1 to 9, and previously saved patterns are counted from the left on the line. See the explanation for \( and \) in the previous section.

\

Treats the following special character as an ordinary character. Backslashes are metacharacters in replacement strings as well as in search patterns. To specify a real backslash, type two in a row (\\).

&

Is replaced with the entire text matched by the search pattern when used in a replacement string. This is useful when you want to avoid retyping text:

:%s/Yazstremski/&, Carl/

The replacement will say Yazstremski, Carl. The & can also replace a variable pattern (as specified by a regular expression). For example, to surround each line from 1 to 10 with parentheses, type:

:1,10s/.*/(&)/

The search pattern matches the whole line, and the & "replays" the line, surrounded by the parentheses you typed.
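
Python's re module does not use & for the matched text; its counterpart in a replacement string is \g<0>. With that substitution, the two commands above can be sketched as follows. The sample lines are hypothetical, and count=1 stands in for a substitute command without the g flag.

```python
import re

# Counterpart of :%s/Yazstremski/&, Carl/
named = re.sub(r"Yazstremski", r"\g<0>, Carl", "Yazstremski hit well")

# Counterpart of :1,10s/.*/(&)/ applied line by line.
sample_lines = ["first line", "second line"]
wrapped = [re.sub(r".*", r"(\g<0>)", line, count=1) for line in sample_lines]

print(named)
print(wrapped)
```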

~

Has a meaning similar to the one it has in a search pattern; the string found is replaced with the replacement text specified in the last substitute command. This is useful for repeating an edit. For example, you could say :s/thier/their/ on one line and repeat the change on another with :s/thier/~/. The search pattern doesn't need to be the same, though. For example, you could say :s/his/their/ on one line and repeat the replacement on another with :s/her/~/.

\u or \l

Causes the next character in the replacement string to be changed to uppercase or lowercase, respectively. For example, to change yes, doctor into Yes, Doctor, you could say:

:%s/yes, doctor/\uyes, \udoctor/

This is a pointless example, though, since it's easier just to type the replacement string with initial caps in the first place. As with any regular expression, \u and \l are most useful with a variable string. Take, for example, the command we used earlier:

:%s/\(That\) or \(this\)/\2 or \1/

The result is this or That, but we need to adjust the cases. We'll use \u to uppercase the first letter in this (currently saved in hold buffer 2); we'll use \l to lowercase the first letter in That (currently saved in hold buffer 1):

:s/\(That\) or \(this\)/\u\2 or \l\1/

The result is This or that. (Don't confuse the number one with the lowercase l; the one comes after.)

\U or \L

Similar to \u or \l, but all following characters are converted to uppercase or lowercase until the end of the replacement string or until \e or \E is reached. If there is no \e or \E, all characters of the replacement text are affected by the \U or \L. For example, to uppercase Fortran, you could say:

:%s/Fortran/\UFortran/

or, using the & character to repeat the search string:

:%s/Fortran/\U&/
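
Python's re module has no \u, \l, \U, or \L in replacement strings, so the case conversions above have to be imitated with a replacement function. The sketch below does that; the helper names upper_first and lower_first are invented for this example and are not part of any library.

```python
import re

def upper_first(text):
    # Imitates \u: uppercase only the first character.
    return text[:1].upper() + text[1:]

def lower_first(text):
    # Imitates \l: lowercase only the first character.
    return text[:1].lower() + text[1:]

# Counterpart of :s/\(That\) or \(this\)/\u\2 or \l\1/
swapped = re.sub(
    r"(That) or (this)",
    lambda m: upper_first(m.group(2)) + " or " + lower_first(m.group(1)),
    "That or this",
    count=1,
)

# Counterpart of :%s/Fortran/\U&/ on one hypothetical line.
shouted = re.sub(r"Fortran", lambda m: m.group(0).upper(), "Fortran is old")

print(swapped)
print(shouted)
```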

All pattern searches are case-sensitive. That is, a search for the will not find The. You can get around this by specifying both uppercase and lowercase in the pattern:

/[Tt]he

You can also instruct vi to ignore case by typing :set ic. See Chapter 7, Advanced Editing, for additional details.
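
The two approaches to case can be compared in a short Python sketch. Here the re.IGNORECASE flag plays a role loosely comparable to :set ic; that pairing is an analogy, not a statement about how vi implements the option. The sample sentence is made up.

```python
import re

text = "The cat saw the dog"

# A plain search is case-sensitive and misses "The".
exact = re.findall(r"the", text)

# Listing both cases in brackets finds both spellings.
bracketed = re.findall(r"[Tt]he", text)

# Ignoring case for the whole pattern gives the same result.
ignoring_case = re.findall(r"the", text, flags=re.IGNORECASE)

print(exact)
print(bracketed)
print(ignoring_case)
```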