6.4 Advanced Flow Control Commands

You have already seen several examples of changes in sed's normal flow control. In this section, we'll look at two commands that allow you to direct which portions of the script get executed and when. The branch (b) and test (t) commands transfer control in a script to a line containing a specified label. If no label is specified, control passes to the end of the script. The branch command transfers control unconditionally, while the test command is a conditional transfer, occurring only if a substitute command has changed the current line.

A label is any sequence of up to seven characters.[1] A label is put on a line by itself that begins with a colon:

:mylabel

[1] The POSIX standard says that an implementation can allow longer labels if it wishes to. GNU sed allows labels to be of any length.

There are no spaces permitted between the colon and the label. Spaces at the end of the line will be considered part of the label. When you specify the label in a branch or test command, a space is permitted between the command and the label itself:

b mylabel

Be sure you don't put a space after the label.

6.4.1 Branching

The branch command allows you to transfer control to another line in the script.

[address]b [label]

The label is optional, and if not supplied, control is transferred to the end of the script. If a label is supplied, execution resumes at the line following the label.

In Chapter 4, Writing sed Scripts, we looked at a typesetting script that transformed quotation marks and hyphens into their typesetting counterparts. If we wanted to avoid making these changes on certain lines, then we could use the branch command to skip that portion of the script. For instance, text inside computer-generated examples marked by the .ES and .EE macros should not be changed. Thus, we could write the previous script like this:

/^\.ES/,/^\.EE/b
s/^"/``/
s/"$/''/
s/"?/''?/g
.
.
.
s/\\(em\\^"/\\(em``/g
s/"\\(em/''\\(em/g
s/\\(em"/\\(em``/g
s/@DQ@/"/g

Because no label is supplied, the branch command branches to the end of the script, skipping all subsequent commands.

The branch command can be used to execute a set of commands as a procedure, one that can be called repeatedly from the main body of the script. As in the case above, it also allows you to avoid executing the procedure at all based on matching a pattern in the input.

You can have a similar effect by using ! and grouping a set of commands. The advantage of the branch command over ! for our application is that we can more easily specify multiple conditions to avoid. The ! symbol can apply to a single command, or it can apply to a set of commands enclosed in braces that immediately follows. The branch command, on the other hand, gives you almost unlimited control over movement around the script.

For example, if we are using multiple macro packages, there may be other macro pairs besides .ES and .EE that define a range of lines that we want to avoid altogether. So, for example, we can write:

/^\.ES/,/^\.EE/b
/^\.PS/,/^\.PE/b
/^\.G1/,/^\.G2/b
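As a minimal sketch of how these range branches behave, the following shell function wraps a cut-down script. The function name skip_macro_ranges, the sample input, and the single substitution (double hyphens to the troff em-dash escape, standing in for the full typesetting script) are assumptions made for illustration.

```shell
# Hypothetical wrapper: lines inside .ES/.EE or .PS/.PE pairs branch to
# the end of the script, so the substitution below never sees them.
skip_macro_ranges() {
  sed '
/^\.ES/,/^\.EE/b
/^\.PS/,/^\.PE/b
s/--/\\(em/g
'
}

printf '%s\n' 'one -- two' '.ES' 'x -- y' '.EE' '.PS' 'a -- b' '.PE' 'three -- four' |
  skip_macro_ranges
# Only the first and last lines change:
#   one \(em two
#   three \(em four
```

Adding another macro pair to the list of exclusions is then a one-line change at the top of the script.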

To get a good idea of the types of flow control possible in a sed script, let's look at some simple but abstract examples. The first example shows you how to use the branch command to create a loop. Once an input line is read, command1 and command2 will be applied to the line; afterwards, if the contents of the pattern space match the pattern, then control will be passed to the line following the label "top," which means command1 then command2 will be executed again.

:top
command1
command2
/pattern/b top
command3

The script executes command3 only if the pattern doesn't match. All three commands will be executed, although the first two may be executed multiple times.
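To make the loop concrete, here is a small sketch with a single command inside the loop. The task (removing parenthesized text, including nested parentheses) and the function name strip_parens are assumptions chosen for illustration. The substitution removes one innermost group, the address on the branch sends control back to top while another group remains, and the final command plays the role of command3 by squeezing the leftover spaces.

```shell
# Hypothetical example: strip parenthesized asides, innermost first.
strip_parens() {
  sed -e ':top' \
      -e 's/([^()]*)//' \
      -e '/([^()]*)/b top' \
      -e 's/  */ /g'
}

printf '%s\n' 'a (b (c) d) e' | strip_parens
# prints: a e
```

The branch tests for a complete group rather than a lone open parenthesis, so an unbalanced line cannot trap the script in an endless loop.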

In the next example, command1 is executed. If the pattern is matched, control passes to the line following the label "end." This means command2 is skipped.

command1
/pattern/b end
command2
:end
command3

In all cases, command1 and command3 are executed.

Now let's look at how to specify that either command2 or command3 is executed, but not both. In the next script, there are two branch commands.

command1
/pattern/b dothree
command2
b
:dothree
command3

If the pattern is matched, the first branch command transfers control to command3. If the pattern is not matched, then command2 is executed. The branch command following command2 sends control to the end of the script, bypassing command3. The first of the branch commands is conditional upon matching the pattern; the second is not. We will look at a "real-world" example after looking at the test command.
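The either-or structure can be sketched with concrete commands. The function name tag_status, the ERROR pattern, and the text added to each line are all hypothetical; only the arrangement of the two branch commands comes from the example above.

```shell
# Hypothetical example: every line gets "[", then exactly one of two endings.
tag_status() {
  sed '
# command1: runs on every line
s/^/[/
/ERROR/b dothree
# command2: runs only when the pattern did not match
s/$/ ok]/
b
:dothree
# command3: runs only when the pattern matched
s/$/ FAILED]/
'
}

printf '%s\n' 'disk check' 'ERROR net check' | tag_status
# [disk check ok]
# [ERROR net check FAILED]
```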

6.4.2 The Test Command

The test command branches to a label (or the end of the script) if a successful substitution has been made on the currently addressed line since that line was read or since the last test command was executed. It is thus a conditional branch. Its syntax follows:

[address]t [label]

If no label is supplied, control passes to the end of the script. If the label is supplied, then execution resumes at the line following the label.

Let's look at an example from Tim O'Reilly. He was trying to generate automatic index entries based on evaluating the arguments in a macro that produced the top of a command reference page. If there were three quoted arguments, he wanted to do something different than if there were two or only one. The task was to try to match each of these cases in succession (3,2,1) and when a successful substitution was made, avoid making any further matches. Here's Tim's script:

/\.Rh 0/{
s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/
t
s/"\(.*\)" "\(.*\)"/"\1" "\2"/
t
s/"\(.*\)"/"\1"/
}

The test command allows us to drop to the end of the script once a substitution has been made. If there are three arguments on the .Rh line, the test command after the first substitute command will be true, and sed will go on to the next input line. If there are fewer than three arguments, no substitution will be made, the test command will be evaluated false, and the next substitute command will be tried. This will be repeated until all the possibilities are used up.

The test command provides functionality similar to a case statement in the C programming language or the shell programming languages. You can test each case and when a case proves true, then you exit the construct.
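The case-statement analogy can be sketched as follows. The function name classify and the three categories are assumptions made for illustration. Each substitute command acts as one case, and the test command after it leaves the script as soon as that case has applied, so the final catch-all substitution only runs when nothing earlier succeeded.

```shell
# Hypothetical example: label each line by the first case that matches.
classify() {
  sed '
s/^[0-9][0-9]*$/number: &/
t
s/^[A-Za-z][A-Za-z]*$/word: &/
t
s/^/other: /
'
}

printf '%s\n' 42 hello 'a1!' | classify
# number: 42
# word: hello
# other: a1!
```

Without the two test commands, the last substitution would also apply to lines already labeled as a number or a word.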

If the above script were part of a larger script, we could use a label, perhaps tellingly named "break," to drop to the end of the command grouping where additional commands can be applied.

/\.Rh 0/{
s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/
t break
.
.
.
}
:break
more commands
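A runnable sketch of the same shape, with made-up input, may help. The function name mark_items, the "item" lines, and both substitutions are hypothetical. Once the first substitution succeeds, the test command jumps past the rest of the group to break, where a command shared by all lines is applied.

```shell
# Hypothetical example: t leaves the group early; the command after
# :break still runs for every line.
mark_items() {
  sed '
/^item /{
s/^item \([0-9][0-9]*\)$/item #\1/
t break
s/^item \(.*\)$/item "\1"/
}
:break
s/$/;/
'
}

printf '%s\n' 'item 7' 'item foo' 'other' | mark_items
# item #7;
# item "foo";
# other;
```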

The next section gives a full example of the test command and the use of labels.

6.4.3 One More Case

Remember Lenny? He was the fellow given the task of converting Scribe documents to troff. We had sent him the following script:

# Scribe font change script. 
s/@f1(\([^)]*\))/\\fB\1\\fR/g
/@f1(.*/{
N
s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
P
D
}

He sent the following mail after using the script:

Thank you so much!  You've not only fixed the script but shown me
where I was confused about the way it works.  I can repair the
conversion script so that it works with what you've done, but to be
optimal it should do two more things that I can't seem to get working
at all - maybe it's hopeless and I should be content with what's
there.  

First, I'd like to reduce multiple blank lines down to one.
Second, I'd like to make sed match the pattern over more than two
(say, even only three) lines.  

Thanks again.  

Lenny

The first request to reduce a series of blank lines to one has already been shown in this chapter. The following four lines perform this function:

/^$/{
N
/^\n$/D
}
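As a quick check of these four lines, the sketch below wraps them in a shell function and feeds in three consecutive blank lines. The function name squeeze_blank and the sample input are assumptions. The input deliberately does not end with a blank line, because implementations differ in what the Next command does when there is no further input to read.

```shell
# Hypothetical wrapper around the four-line script above.
squeeze_blank() {
  sed '
/^$/{
N
/^\n$/D
}
'
}

printf 'a\n\n\n\nb\n' | squeeze_blank
# prints "a", one blank line, then "b"
```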

We want to look mainly at accomplishing the second request. Our previous font-change script created a two-line pattern space, tried to make the match across those lines, and then output the first line. The second line became the first line in the pattern space and control passed to the top of the script where another line was read in.

We can use labels to set up a loop that reads multiple lines and makes it possible to match a pattern across multiple lines. The following script sets up two labels: begin at the top of the script and again near the bottom. Look at the improved script:

# Scribe font change script.  New and Improved.
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D

Let's look more closely at this script, which has three parts. Beginning with the line that follows :begin, the first part attempts to match the font change syntax if it is found completely on one line. After making the substitution, the branch command transfers control back to the label begin. In other words, once we have made a match, we want to go back to the top and look for other possible matches, including the instruction that has already been applied, because there could be multiple occurrences on the line.

The second part attempts to match the pattern over multiple lines. The Next command builds a multiple line pattern space. The substitution command attempts to locate the pattern with an embedded newline. If it succeeds, the test command passes control to the line following the again label. If no substitution is made, control is passed to the line following the label begin so that we can read in another line. This is a loop that goes into effect when we've matched the beginning sequence of a font change request but have not yet found the ending sequence. Sed will loop back and keep appending lines into the pattern space until a match has been found.

The third part is the procedure following the label again. The first line in the pattern space is output and then deleted. Like the previous version of this script, we deal with multiple lines in succession. Control never reaches the bottom of the script but is redirected by the Delete command to the top of the script.
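To see the three parts work together, the improved script can be wrapped in a shell function and run on a font change that spans three lines. The function name scribe_fonts and the sample text are assumptions made for illustration; the sed script itself is unchanged from the listing above.

```shell
# Hypothetical wrapper around the improved font-change script.
scribe_fonts() {
  sed '
:begin
/@f1(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@f1(.*/{
N
s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
'
}

printf '%s\n' 'Start @f1(bold text' 'spanning three' 'lines) end.' 'plain' |
  scribe_fonts
# Start \fBbold text
# spanning three
# lines\fR end.
# plain
```

On the first line the second part finds an opening sequence with no closing parenthesis, so it keeps appending lines and branching back to begin until the multiline substitution succeeds. The procedure at again then prints and deletes the accumulated lines one at a time.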