34.11. Referencing Portions of a Search String
In sed, the substitution command provides metacharacters to select any individual portion of a string that is matched and recall it in the replacement string. A pair of escaped parentheses, \( and \), is used in sed to enclose any part of a regular expression and save it for recall. Up to nine "saves" are permitted in a single pattern. \n is used to recall the portion of the match that was saved, where n is a number from 1 to 9 referencing a particular "saved" string in order of use, counted by opening parenthesis from left to right. (Section 32.13 has more information.)
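Here's a minimal sketch of the mechanism, assuming a POSIX shell and any standard sed. The sample date and the variable name reordered are made up for this example; they don't come from a real script:

```shell
# Save year, month, and day as \1, \2, and \3, then recall them in a new order.
# The slashes in the replacement are escaped because / is the s command's delimiter.
# (The sample date and the variable name are illustrative only.)
reordered=$(printf '%s\n' '1999-12-31' |
    sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\2\/\3\/\1/')
printf '%s\n' "$reordered"
```

The three saved strings are numbered in the order their opening parentheses appear, so \1 is the year even though it is recalled last.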
For example, when converting a plain-text document into HTML, we could convert section numbers that appear in a cross-reference into an HTML hyperlink. The following expression is broken onto two lines for printing, but you should type all of it on one line:
    s/\([sS]ee \)\(Section \)\([1-9][0-9]*\)\.\([1-9][0-9]*\)/
    \1<a href="#SEC-\3_\4">\2\3.\4<\/a>/
Four pairs of escaped parentheses are specified. String 1 captures the word see, with an upper- or lowercase s, and the space after it. String 2 captures the word Section and its trailing space (because this is a fixed string, it could simply have been retyped in the replacement string). String 3 captures the part of the section number before the decimal point, and String 4 captures the part after the decimal point. The replacement string recalls the first saved substring as \1. Next comes the opening tag of a link, in which the two parts of the section number, \3 and \4, are separated by an underscore (_) and preceded by the string SEC-. Finally, the link text replays the word Section and the section number, this time with a decimal point between its parts. Note that although a dot (.) is special in the search pattern and has to be quoted with a backslash there, it's not special on the replacement side and can be typed literally. The slash in </a>, on the other hand, has to be quoted as \/ because the slash is the delimiter of the substitution command. Here's the script run on a short test document, using checksed (Section 34.4):
    % checksed testdoc
    ********** < = testdoc    > = sed output **********
    8c8
    < See Section 1.2 for details.
    ---
    > See <a href="#SEC-1_2">Section 1.2</a> for details.
    19c19
    < Be sure to see Section 23.16!
    ---
    > Be sure to see <a href="#SEC-23_16">Section 23.16</a>!
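If checksed isn't handy, the same substitution can be tried directly from a shell. This is only a sketch: the two input lines are copied from the test output above, and the variable names script and output are made up for the example.

```shell
# The substitution from this section, typed on one line and kept in a variable.
# Single quotes stop the shell from touching the backslashes and double quotes.
# (The variable names script and output are illustrative only.)
script='s/\([sS]ee \)\(Section \)\([1-9][0-9]*\)\.\([1-9][0-9]*\)/\1<a href="#SEC-\3_\4">\2\3.\4<\/a>/'

# Feed sed the two lines that changed in the test document.
output=$(printf '%s\n' 'See Section 1.2 for details.' \
                       'Be sure to see Section 23.16!' | sed "$script")
printf '%s\n' "$output"
```

Both lines should come back with the cross-reference wrapped in a link, matching the lines marked > in the checksed output.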
We can use a similar technique to match parts of a line and swap them. For instance, let's say there are two parts of a line separated by a colon. We can match each part, putting them within escaped parentheses and swapping them in the replacement:
    % cat test1
    first:second
    one:two
    % sed 's/\(.*\):\(.*\)/\2:\1/' test1
    second:first
    two:one
The larger point is that you can recall a saved substring in any order and multiple times. If you find that you need more than nine saved matches, or would like to be able to group them into matches and submatches, take a look at Perl.
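To see a saved substring recalled more than once, here is a small variation on the colon-swapping command. The input line and the variable name repeated are made up for this sketch:

```shell
# \2 is recalled twice and \1 once, in a different order from the original line.
# (The input line and the variable name are illustrative only.)
repeated=$(printf '%s\n' 'first:second' | sed 's/\(.*\):\(.*\)/\2:\1:\2/')
printf '%s\n' "$repeated"
```

The result is second:first:second, which shows that recalling a saved string doesn't use it up.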
--DD and JP
Copyright © 2003 O'Reilly & Associates. All rights reserved.