8.6 Subshells

Coroutines clearly represent the most complex relationship between processes that the Korn shell defines. To conclude this chapter, we will look at a much simpler type of interprocess relationship: that of a subshell with its parent shell. We saw in Chapter 3 that whenever you run a shell script, you actually invoke another copy of the shell that is a subprocess of the main, or parent, shell process. Now let's look at subshells in more detail.

8.6.1 Subshell Inheritance

The most important things you need to know about subshells are what characteristics they get, or inherit, from their parents. These are as follows:

The current directory
Environment variables
Standard input, output, and error plus any other open file descriptors
Any characteristics defined in the environment file (see Chapter 3)
Signals that are ignored

The first three of these, along with ignored signals, are inherited by all subprocesses, while the characteristics defined in the environment file are unique to subshells. Just as important are the things that a subshell does not inherit from its parent:

Shell variables, except environment variables and those defined in the environment file
Handling of signals that are not ignored

We covered some of this earlier (in Chapter 3), but these points are common sources of confusion, so they bear repeating.
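A short experiment can make these rules concrete. The following is a minimal sketch, not a definitive test: it uses sh -c to stand in for running a separate script, the variable names only_here and passed_down are made up for illustration, and printf is used in place of print so that the lines run in any POSIX-style shell as well as the Korn shell.

```shell
# Variable names here are hypothetical, chosen only for illustration.
only_here=private            # ordinary shell variable, not exported
export passed_down=public    # environment variable

# Single quotes stop the parent from expanding the variables itself,
# so the child shell reports what it actually received.
child_view=$(sh -c 'printf "%s %s" "${only_here:-unset}" "${passed_down:-unset}"')
printf 'variables in child: %s\n' "$child_view"

# An ignored signal stays ignored in the child, so the child survives
# sending TERM to itself.
ignored=$( trap '' TERM; sh -c 'kill -TERM $$; printf survived' )

# A trap handler is not passed on: the child gets the default action
# for TERM and dies before it can print anything.
handled=$( trap 'printf handler' TERM; sh -c 'kill -TERM $$; printf survived' )

printf 'child with TERM ignored by parent: %s\n' "${ignored:-killed}"
printf 'child with TERM trapped by parent: %s\n' "${handled:-killed}"
```

The child should report the unexported variable as unset and the exported one as public, and it should survive only in the case where the parent ignored the signal rather than trapping it.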

8.6.2 Nested Subshells

Subshells need not be in separate scripts; you can also start a subshell within the same script (or function) as the parent. You do this in a manner very similar to the code blocks we saw in the last chapter. Just surround some shell code with parentheses (instead of curly brackets), and that code will run in a subshell. We'll call this a nested subshell.

For example, here is the calculator program, from above, with a subshell instead of a code block:

( while read line'?adc> '; do
      print "$(alg2rpn $line)"
  done 
) | dc

The code inside the parentheses will run as a separate process. This is usually less efficient than a code block. The differences in functionality between subshells and code blocks are very few; they primarily pertain to issues of scope, i.e., the domains in which definitions of things like shell variables and signal traps are known. First, code inside a nested subshell obeys the above rules of subshell inheritance, except that it knows about variables defined in the surrounding shell; in contrast, think of blocks as code units that inherit everything from the outer shell. Second, variables and traps defined inside a code block are known to the shell code after the block, whereas those defined in a subshell are not.

For example, consider this code:

{
    fred=bob
    trap "print 'You hit CTRL-C!'" INT
}
while true; do
    print "\$fred is $fred"
    sleep 60
done

If you run this code, you will see the message $fred is bob every 60 seconds, and if you type CTRL-C, you will see the message You hit CTRL-C!. You will need to type CTRL-\ to stop it (don't forget to remove the core file). Now let's change it to a nested subshell:

(
    fred=bob
    trap "print 'You hit CTRL-C!'" INT
)
while true; do
    print "\$fred is $fred"
    sleep 60
done

If you run this, you will see the message $fred is (with nothing after it); the outer shell doesn't know about the subshell's definition of fred and therefore thinks it's null. Furthermore, the outer shell doesn't know about the subshell's trap of the INT signal, so if you hit CTRL-C, the script will terminate.
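The other half of the scoping rules, that a nested subshell knows about variables defined in the surrounding shell even when they are not exported, can be checked the same way. This is a small sketch under the assumption of a hypothetical variable named count; it uses printf rather than print so that it also runs in other POSIX-style shells.

```shell
count=1                      # ordinary, unexported shell variable

(
    # The nested subshell can read count even though it was never exported...
    count=$((count + 1))
    printf 'inside subshell: %s\n' "$count"
)
# ...but its assignment vanished when the subshell exited.
after_subshell=$count

{
    count=$((count + 1))
}
# An assignment made inside a code block persists.
after_block=$count

printf 'after subshell: %s, after block: %s\n' "$after_subshell" "$after_block"
```

The subshell prints 2, yet the outer shell still sees 1 afterward; only the code block's increment is visible to the code that follows it.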

If a language supports code nesting, then it's considered desirable that definitions inside a nested unit have a scope limited to that nested unit. In other words, nested subshells give you better control than code blocks over the scope of variables and signal traps. Therefore we feel that you should use subshells instead of code blocks if they are to contain variable definitions or signal traps, unless efficiency is a concern.
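One place where this kind of scope control may pay off is a temporary change to a variable that affects the whole shell, such as IFS. The sketch below is only an illustration: the record string and the variable names are hypothetical, and printf stands in for print for portability.

```shell
record='alice:x:1001'        # hypothetical colon-separated record
ifs_before=$IFS

(
    IFS=:                    # redefined only inside the subshell
    set -- $record           # split on colons; replaces $1, $2, ... in here only
    printf 'first field: %s\n' "$1"
)

# Back in the outer shell, IFS still has its original value.
if [ "$IFS" = "$ifs_before" ]; then
    ifs_status=unchanged
else
    ifs_status=changed
fi
printf 'IFS is %s\n' "$ifs_status"
```

With a code block in place of the parentheses, the new value of IFS and the new positional parameters would remain in effect for the rest of the script and would have to be restored by hand.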

This has been a long chapter, and it has covered a lot of territory. Here are some exercises that should help you make sure you have a firm grasp on the material. The last exercise is especially difficult for those without backgrounds in compilers, parsing theory, or formal language theory.

Write a shell script called pinfo that combines the jobs and ps commands by printing a list of jobs with their job numbers, corresponding process IDs, running times, and full commands.
Take the latest version of our C compiler shell script-or some other non-trivial shell script-and "bullet-proof" it with signal traps.
Take the non-pipeline version of our C compiler-or some other non-trivial shell script-and parallelize it as much as possible.
Write the code that checks for duplicate arguments to the mcp script. Bear in mind that different pathnames can point to the same file. (Hint: if $i is "1", then eval "print \${$i}" prints the first command-line argument. Make sure you understand why.)
Redo the findterms program in the last chapter using a nested subshell instead of a code block.
(The following doesn't have that much to do with the material in this chapter per se, but it is a classic programming exercise:)
1. Write the function alg2rpn used in adc. Here's how to do this: Arithmetic expressions in algebraic notation have the form expr op expr, where each expr is either a number or another expression (perhaps in parentheses), and op is +, -, ×, /, or % (remainder). In RPN, expressions have the form expr expr op. For example, the algebraic expression 2 + 3 is 2 3 + in RPN; the RPN equivalent of (2+3) × (9-5) is 2 3 + 9 5 - ×. The main advantage of RPN is that it obviates the need for parentheses and operator precedence rules (e.g., × is evaluated before +). The dc program accepts standard RPN, but each expression should have "p" appended to it: this tells dc to print its result, e.g., the first example above should be given to dc as 2 3 + p.
2. You need to write a routine that converts algebraic notation to RPN. This should be (or include) a function that calls itself (known as a recursive function) whenever it encounters a subexpression. It is especially important that this function keep track of where it is in the input string and how much of the string it "eats up" during its processing. (Hint: make use of the pattern matching operators discussed in Chapter 4 to ease the task of parsing input strings.)
  
  To make your life easier, don't worry about operator precedence for now; just convert to RPN from left to right, e.g., treat 3+4×5 as (3+4)×5 and 3×4+5 as (3×4)+5. This makes it possible for you to convert the input string on the fly, i.e., without having to read in the whole thing before doing any processing.
3. Enhance your solution to the previous exercise so that it supports operator precedence in the "usual" order: ×, /, and % (remainder) first, then + and -. For example, treat 3+4×5 as 3+(4×5) and 3×4+5 as (3×4)+5.
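Before writing alg2rpn, it may help to feed the two RPN examples from the exercise to dc by hand. This sketch assumes dc is installed on your system; note that dc spells multiplication as *, not ×, so that is the character your function would need to emit. The variable names are made up for illustration.

```shell
# Assumes dc is available. The trailing "p" tells dc to print the result.
sum=$(printf '%s\n' '2 3 + p' | dc)
product=$(printf '%s\n' '2 3 + 9 5 - * p' | dc)

printf '2 + 3 = %s\n' "$sum"
printf '(2+3) * (9-5) = %s\n' "$product"
```

These are the strings that a working alg2rpn would be expected to produce for the inputs 2 + 3 and (2+3) × (9-5).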