Shell Subprocesses and Subshells (Learning the Korn Shell, 2nd Edition)

8.6.2. Subshells

A special kind of shell subprocess is the subshell. The subshell is started within the same script (or function) as the parent. You do this in a manner very similar to the code blocks we saw in the last chapter. Just surround some shell code with parentheses (instead of curly braces), and that code runs in a subshell.

For example, here is the calculator program, from above, with a subshell instead of a code block:

( while read line'?adc> '; do
      print "$(alg2rpn $line)"
  done
) | dc

The code inside the parentheses runs as a separate process.[125] This is usually less efficient than a code block. The differences in functionality between subshells and code blocks are very few; they primarily pertain to issues of scope, i.e., the domains in which definitions of things like shell variables and signal traps are known. First, code inside a subshell obeys the above rules of shell subprocess inheritance, except that it knows about variables defined in the surrounding shell; in contrast, think of blocks as code units that inherit everything from the outer shell. Second, variables and traps defined inside a code block are known to the shell code after the block, whereas those defined in a subshell are not.

[125] For performance reasons, the Korn shell tries very hard to avoid actually creating a separate process to run code in parentheses and inside $(...). But the results should always be the same as if the code ran in a separate process.
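The first point can be checked directly. What follows is a minimal sketch, not taken from the examples above: the variable names plain and marked are hypothetical, sh -c stands in for any separately invoked shell (such as a script run by name), and printf is used in place of print so that the sketch also runs in shells other than ksh.

```shell
# Hypothetical variable names, chosen only for this illustration.
plain=local_only          # ordinary variable, not exported
export marked=exported    # environment variable

# Subshell: parentheses inside a command substitution so the output can be captured.
seen_by_subshell=$( ( printf '%s/%s' "$plain" "$marked" ) )

# Separate shell process, standing in for a script run by name.
# The single quotes keep the outer shell from expanding the variables.
seen_by_child=$(sh -c 'printf "%s/%s" "$plain" "$marked"')

printf 'subshell sees: %s\n' "$seen_by_subshell"
printf 'child sees:    %s\n' "$seen_by_child"
```

The subshell reports local_only/exported, while the separate process reports only /exported, because the unexported variable never reaches it. That covers what a subshell can see coming in; what survives after the closing parenthesis is the second point.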

For example, consider this code:

{
    fred=bob
    trap 'print "You hit CTRL-C!"' INT
}
while true; do
    print "\$fred is $fred"
    sleep 60
done

If you run this code, you will see the message $fred is bob every 60 seconds, and if you type CTRL-C, you will see the message, You hit CTRL-C!. You will need to type CTRL-\ to stop it (don't forget to remove the core file). Now let's change it to a subshell:

(
    fred=bob
    trap 'print "You hit CTRL-C!"' INT
)
while true; do
    print "\$fred is $fred"
    sleep 60
done

If you run this, you will see the message $fred is; the outer shell doesn't know about the subshell's definition of fred and therefore thinks it's null. Furthermore, the outer shell doesn't know about the subshell's trap of the INT signal, so if you hit CTRL-C, the script terminates.
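The variable half of this comparison can also be checked without an endless loop. The following is a minimal sketch rather than one of the original examples: it uses printf instead of print so that it also runs in shells other than ksh, and the word "unset" is simply our placeholder for a variable that has no value.

```shell
unset fred
{ fred=bob; }                  # code block: runs in the current shell
after_block=${fred:-unset}     # fred is still set here

unset fred
( fred=bob )                   # subshell: the assignment is lost when it exits
after_subshell=${fred:-unset}  # fred has no value in the outer shell

printf 'after block:    %s\n' "$after_block"
printf 'after subshell: %s\n' "$after_subshell"
```

The first line of output shows bob and the second shows unset. The sketch leaves traps out on purpose, since demonstrating an untrapped signal means letting it terminate the script.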

In a language that supports code nesting, it is generally considered desirable for definitions inside a nested unit to have a scope limited to that unit. By this measure, subshells give you better control than code blocks over the scope of variables and signal traps. Therefore we feel that you should use subshells instead of code blocks if they are to contain variable definitions or signal traps, unless efficiency is a concern.

This has been a long chapter, and it has covered a lot of territory. Here are some exercises that should help you make sure you have a firm grasp on the material. The last exercise is especially difficult for those without backgrounds in compilers, parsing theory, or formal language theory.

Write a function called pinfo that combines the jobs and ps commands by printing a list of jobs with their job numbers, corresponding process IDs, running times, and full commands. Extra credit: describe why this has to be a function and not a script.

Take the latest version of our C compiler shell script -- or some other non-trivial shell script -- and "bullet-proof" it with signal traps.

Redo the findterms program in the last chapter using a subshell instead of a code block.

The following doesn't have that much to do with the material in this chapter per se, but it is a classic programming exercise, and it will give you some good practice if you do it:
1. Write the function alg2rpn used in adc. Here's how to do this: arithmetic expressions in algebraic notation have the form expr op expr, where each expr is either a number or another expression (perhaps in parentheses), and op is +, -, x, /, or % (remainder). In RPN, expressions have the form expr expr op. For example: the algebraic expression 2+3 is 2 3 + in RPN; the RPN equivalent of (2+3) x (9-5) is 2 3 + 9 5 - x. The main advantage of RPN is that it obviates the need for parentheses and operator precedence rules (e.g., the rule that x is evaluated before +). The dc program accepts standard RPN, but each expression should have "p" appended to it: this tells dc to print its result, e.g., the first example above should be given to dc as 2 3 + p.
2. You need to write a routine that converts algebraic notation to RPN. This should be (or include) a function that calls itself (known as a recursive function) whenever it encounters a subexpression. It is especially important that this function keep track of where it is in the input string and how much of the string it eats up during its processing. (Hint: make use of the pattern matching operators discussed in Chapter 4 to ease the task of parsing input strings.)
  
  To make your life easier, don't worry about operator precedence for now; just convert to RPN from left to right. For example, treat 3+4x5 as (3+4)x5 and 3x4+5 as (3x4)+5. This makes it possible for you to convert the input string on the fly, i.e., without having to read in the whole thing before doing any processing.
3. Enhance your solution to the previous exercise so that it supports operator precedence in the usual order: x, /, and % (remainder) first, then + and -. For example, treat 3+4x5 as 3+(4x5) and 3x4+5 as (3x4)+5.
