34.11. Referencing Portions of a Search String
In sed, the substitution command provides metacharacters that select any individual portion of the matched string and recall it in the replacement string. A pair of escaped parentheses, \( and \), encloses any part of a regular expression and saves it for recall. Up to nine "saves" are permitted in a single pattern. \n recalls the portion of the match that was saved, where n is a number from 1 to 9 referencing a particular "saved" string in order of use. (Section 32.13 has more information.)
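As a minimal sketch of the numbering, the command below saves the two halves of a version number and recalls them as \1 and \2. The sample text and the major=/minor= labels are made up for illustration:

```shell
# Two saved substrings, numbered left to right in order of use:
#   \1 = digits before the dot, \2 = digits after the dot.
result=$(printf '%s\n' 'version 3.14' |
    sed 's/\([0-9][0-9]*\)\.\([0-9][0-9]*\)/major=\1 minor=\2/')

printf '%s\n' "$result"    # prints: version major=3 minor=14
```

Text outside the match (here, the word version and the space after it) passes through untouched; only the matched portion is replaced.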
For example, when converting a
plain-text document into HTML, we could convert section numbers that
appear in a cross-reference into an HTML hyperlink. The following
expression is broken onto two lines for printing, but you should type
all of it on one line:
s/\([sS]ee \)\(Section \)\([1-9][0-9]*\)\.\([1-9][0-9]*\)/
\1<a href="#SEC-\3_\4">\2\3.\4<\/a>/
Four pairs of escaped parentheses are specified. String 1 captures the word see with an upper- or lowercase s. String 2 captures the word Section (because this is a fixed string, it could have been simply retyped in the replacement string). String 3 captures the part of the section number before the decimal point, and String 4 captures the part after the decimal point. The replacement string recalls the first saved substring as \1. Next, it starts a link in which the two parts of the section number, \3 and \4, are separated by an underscore (_) and preceded by the string SEC-. Finally, the link text replays the word Section (\2) and the section number, this time with a decimal point between its parts. Note that although a dot (.) is special in the search pattern and has to be quoted with a backslash there, it's not special on the replacement side and can be typed literally. The slash in </a>, on the other hand, is quoted because an unquoted slash would end the replacement string. Here's the script run on a short test document, using checksed (Section 34.4):
% checksed testdoc
********** < = testdoc > = sed output **********
8c8
< See Section 1.2 for details.
---
> See <a href="#SEC-1_2">Section 1.2</a> for details.
19c19
< Be sure to see Section 23.16!
---
> Be sure to see <a href="#SEC-23_16">Section 23.16</a>!
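If checksed isn't handy, the same substitution can be tried directly from the shell. This is only a sketch: the variable names script and linked are ours, and the two input lines are copied from the test document shown above.

```shell
# The substitution from the text, typed on one line.
script='s/\([sS]ee \)\(Section \)\([1-9][0-9]*\)\.\([1-9][0-9]*\)/\1<a href="#SEC-\3_\4">\2\3.\4<\/a>/'

# Feed sed the two lines that changed in the checksed run.
linked=$(printf '%s\n' \
    'See Section 1.2 for details.' \
    'Be sure to see Section 23.16!' | sed "$script")

printf '%s\n' "$linked"
```

The output should match the two lines marked with > in the checksed listing.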
We can use a similar technique to match parts of a line and swap them. For instance, let's say there are two parts of a line separated by a colon. We can enclose each part in escaped parentheses and swap the two in the replacement:
% cat test1
first:second
one:two
% sed 's/\(.*\):\(.*\)/\2:\1/' test1
second:first
two:one
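One thing to watch when a line has more than one colon: .* matches as much as it can, so the first saved string runs up to the last colon. The sketch below uses a made-up three-field line to show the difference between that pattern and one that stops at the first colon:

```shell
# Three colon-separated fields instead of two.
line='a:b:c'

# \(.*\) is greedy, so \1 runs up to the LAST colon: \1 = "a:b", \2 = "c".
greedy=$(printf '%s\n' "$line" | sed 's/\(.*\):\(.*\)/\2:\1/')

# \([^:]*\) cannot cross a colon, so \1 stops at the FIRST one:
# \1 = "a", \2 = "b:c".
first=$(printf '%s\n' "$line" | sed 's/\([^:]*\):\(.*\)/\2:\1/')

printf '%s\n' "$greedy" "$first"    # prints c:a:b, then b:c:a
```

Which of the two you want depends on the data; for the two-field file test1, both patterns give the same result.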
The larger point is that you can recall saved substrings in any order and as many times as you like. If you find that you need more than nine saved matches, or would like to be able to group them into matches and submatches, take a look at Perl. Section 43.10, Section 31.10, Section 10.9, and Section 36.23 have examples.
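As a quick illustration of recalling saved strings out of order and more than once, here is a sketch using one line from test1. The layout of the replacement is arbitrary:

```shell
# \2 and \1 each appear twice in the replacement, and \2 comes first.
reused=$(printf '%s\n' 'one:two' | sed 's/\(.*\):\(.*\)/\2 \1 \2 (\1)/')

printf '%s\n' "$reused"    # prints: two one two (one)
```

The unescaped parentheses in the replacement are ordinary characters; only the escaped pairs in the search pattern save text.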
--DD and JP
Copyright © 2003 O'Reilly & Associates. All rights reserved.