18.7 The RHS

The purpose of the RHS in a rule is to rewrite the workspace. To make this rewriting more versatile, sendmail offers several special RHS operators. The complete list is shown in Table 18-2.

Table 18-2. RHS operators
RHS	§	Description or use
`$digit`	Section 18.7.1	Copy by position
`$:`	Section 18.7.2	Rewrite once (when used as a prefix), or specify the user in a delivery-agent "triple," or specify the default value to return on a failed database-map lookup
`$@`	Section 18.7.3	Rewrite and return (when used as a prefix), or specify the host in a delivery-agent "triple", or specify an argument to pass in a database-map lookup or action
`$>set`	Section 18.7.4	Rewrite through another rule set (such as a subroutine call that returns to the current position)
`$#`	Section 18.7.5	Specify a delivery agent or choose an action, such as to reject or discard a recipient, sender, connection, or message.
`$[ $]`	Section 18.7.6	Canonicalize hostname
`$( $)`	Section 23.4	Perform a lookup in an external database, file, or network service, or perform a change (such as dequoting), or store a value into a macro.
`$&`	Section 21.5.3	Delay conversion of a macro until runtime

18.7.1 Copy by Position: $digit

The $digit operator in the RHS is used to copy tokens from the LHS into the workspace. The digit refers to positions of LHS wildcard operators in the LHS:

R $+ @ $*    $2!$1
  |    |
  $1   $2

Here, the $1 in the RHS indicates tokens matched by the first wildcard operator in the LHS (in this case, the $+), and the $2 in the RHS indicates tokens matched by the second wildcard operator in the LHS (the $*). In this example, if the workspace contains A@B.C, it will be rewritten by the RHS as follows (note that the order is defined by the RHS):

$* matches    B.C     so  $2 copies  it to workspace
        !    explicitly added to the workspace
$+ matches    A       so  $1 adds  it to workspace

The $digit copies all the tokens matched by its corresponding wildcard operator. For the $+ wildcard operator, only a single token (A) is matched and copied with $1. The ! is copied as is. For the $* wildcard operator, three tokens are matched (B.C), so $2 copies all three. Thus, this rule rewrites A@B.C into B.C!A.
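
The copy step can be modeled in a few lines of Python. This is only an illustrative sketch, not how sendmail is implemented: the function name apply_rhs and the pre-split token lists are assumptions, and the wildcard matches are supplied by hand rather than computed from the LHS.

```python
def apply_rhs(rhs_tokens, matches):
    """Build a new workspace from RHS tokens.

    matches[1] holds the tokens matched by the first LHS wildcard,
    matches[2] those matched by the second, and so on.
    """
    workspace = []
    for tok in rhs_tokens:
        if len(tok) == 2 and tok[0] == "$" and tok[1] in "123456789":
            workspace.extend(matches[int(tok[1])])  # copy every matched token
        else:
            workspace.append(tok)                   # literal token, copied as is
    return workspace

# LHS "$+ @ $*" applied to A@B.C: $+ matched A, and $* matched B . C
matches = {1: ["A"], 2: ["B", ".", "C"]}
result = apply_rhs(["$2", "!", "$1"], matches)
print("".join(result))  # prints B.C!A
```

The order of the output follows the order of the RHS tokens, not the order in which the wildcards appear in the LHS.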

Not all LHS operators need to be referenced with a $digit in the RHS. Consider the following:

R $* < $* > $*   <$2>

Here, only the middle LHS operator (the second one) is required to rewrite the workspace. So only the $2 is needed in the RHS ($1 and $3 are not needed and are not present in the RHS).

Although macros appear to be operators in the LHS, they are not. Recall that macros are expanded when the configuration file is read (Section 18.2.1). As a consequence, although they appear as $letter in the configuration file, they are converted to tokens when that configuration file is read. For example:

DAxxx
R $A @ $*   $1

Here, the macro A is defined to have the value xxx. To the unwary, the $1 appears to indicate the $A. But when the configuration file is read, the previous rule is expanded into:

R xxx @ $*   $1

Clearly, the $1 refers to the $* (because $digit references only operators and $A is a macro, not an operator). The sendmail program is unable to detect errors of this sort. If the $1 were instead $2 (in a mistaken attempt to reference the $*), sendmail prints the following error and skips that rule:

ruleset replacement number out of bounds

V8 sendmail catches these errors when the configuration file is read. Earlier versions caught this error only when the rule was actually used.

The digit of the $digit must be in the range one through nine. A $0 is meaningless and causes sendmail to print the previous error message and to skip that rule. Extra digits are considered tokens rather than extensions of the $digit. That is, $11 is the RHS operator $1 and the token 1, not a reference to the eleventh LHS operator.
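
The single-digit rule can be illustrated with a rough splitter. This is a sketch under the assumption that a regular expression is a fair stand-in for this one aspect of parsing; split_rhs is a hypothetical helper and is not sendmail's real tokenizer.

```python
import re

def split_rhs(text):
    """Split RHS text into $digit operators and the other characters around them."""
    # "\$[1-9]" takes exactly one digit, so any further digits are left over
    return re.findall(r"\$[1-9]|[^$]+|\$", text)

print(split_rhs("$11"))    # prints ['$1', '1']
print(split_rhs("$2!$1"))  # prints ['$2', '!', '$1']
```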

18.7.2 Rewrite Once Prefix: $:

Ordinarily, the RHS rewrites the workspace as long as the workspace continues to match the LHS. This looping behavior can be useful. Consider the need to strip extra trailing dots off an address in the workspace:

R $* ..        $1.

Here, the $* matches any address that has two or more trailing dots. The $1. in the RHS then strips one of those two trailing dots when rewriting the workspace. For example:

xxx . . . . .     becomes xxx . . . .
xxx . . . .       becomes xxx . . .
xxx . . .         becomes xxx . .
xxx . .           becomes xxx .
xxx .          match fails
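
The repeated application of this one rule can be imitated as follows. It is a minimal sketch that assumes the workspace is a list of tokens in which each dot is its own token; strip_one_dot and rewrite_until_no_match are names invented for the illustration.

```python
def strip_one_dot(workspace):
    """Model of 'R $* ..  $1.': return the rewrite, or None if the LHS fails."""
    if workspace[-2:] == [".", "."]:
        return workspace[:-1]   # $1 followed by a single dot
    return None

def rewrite_until_no_match(rule, workspace):
    """Keep applying one rule for as long as its LHS matches."""
    trace = [workspace]
    while True:
        rewritten = rule(workspace)
        if rewritten is None:   # match fails, so the loop ends
            return trace
        workspace = rewritten
        trace.append(workspace)

for step in rewrite_until_no_match(strip_one_dot, ["xxx", ".", ".", ".", ".", "."]):
    print(" ".join(step))
```

Each pass removes exactly one dot, and the loop stops by itself once only a single trailing dot is left.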

Although this looping behavior of rules can be handy, for most rules it can be dangerous. Consider the following example:

R $*       <$1>

The intention of this rule is to cause whatever is in the workspace to become surrounded with angle brackets. But after the workspace is rewritten, the LHS again checks for a match; and because the $* matches anything, the match succeeds, the RHS rewrites the workspace again, and again the LHS checks for a match:

xxx               becomes < xxx >
< xxx >           becomes < < xxx > >
< < xxx > >       becomes < < < xxx > > >
     ... and so on, until ...
sendmail prints: rewrite: expansion too long

In this case, sendmail catches the problem because the workspace has become too large. It prints the preceding error message and skips that and all further rules in the rule set. If you are running sendmail in test mode, this fatal error would also be printed:

== Ruleset 0 (0) status 65

Unfortunately, not all such endless looping produces a visible error message. Consider the following example:

R $*    $1

Here is an LHS that matches anything and an RHS that rewrites the workspace in such a way that the workspace never changes. For older versions this causes sendmail to appear to hang (as it processes the same rule over and over and over). Newer versions of sendmail will catch such endless looping and will print and log the following error:

Infinite loop in ruleset ruleset_name, rule rule_number

In this instance the original workspace is returned.

It is not always desirable (or even possible) to write "loop-proof" rules. To prevent looping, sendmail offers the $: RHS prefix. By starting the RHS of a rule with the $: operator, you are telling sendmail to rewrite the workspace only once, at most:

R $*   $: <$1>

Again the rule causes the contents of the workspace to be surrounded by a pair of angle brackets. But here the $: prefix prevents the LHS from checking for another match after the rewrite.
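
The contrast between the looping rule and its $: variant can be sketched in Python. The limit MAX_TOKENS is a made-up value for this illustration and does not reflect sendmail's real buffer size; run_rule and add_brackets are likewise hypothetical names.

```python
MAX_TOKENS = 50  # hypothetical limit for this sketch only

def add_brackets(workspace):
    """Model of the RHS '<$1>' for an LHS of '$*', which matches anything."""
    return ["<"] + workspace + [">"]

def run_rule(workspace, rewrite_once):
    """Apply the rule repeatedly, or only once when the RHS starts with $:."""
    while True:
        workspace = add_brackets(workspace)
        if rewrite_once:
            return workspace  # $: prefix: the LHS is not tried again
        if len(workspace) > MAX_TOKENS:
            raise OverflowError("rewrite: expansion too long")

print(" ".join(run_rule(["xxx"], rewrite_once=True)))  # prints < xxx >
try:
    run_rule(["xxx"], rewrite_once=False)
except OverflowError as err:
    print(err)  # prints rewrite: expansion too long
```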

Note that the $: prefix must begin the RHS to have any effect. If it instead appears inside the RHS, its special meaning is lost:

foo  rewritten by  $: $1   becomes   foo 
foo  rewritten by  $1 $:   becomes   foo $:

18.7.3 Rewrite-and-Return Prefix: $@

The flow of rules is such that each and every rule in a series of rules (a rule set) is given a chance to match the workspace:

R xxx     yyy
R yyy     zzz

The first rule matches xxx in the workspace and rewrites the workspace to contain yyy. The first rule then tries to match the workspace again but, of course, fails. The second rule then tries to match the workspace. Because the workspace contains yyy, a match is found, and the RHS rewrites the workspace to be zzz.

There will often be times when one rule in a series performs the appropriate rewrite and no subsequent rules need to be called. In the earlier example, suppose xxx should only become yyy and that the second rule should not be called. To solve problems such as this, sendmail offers the $@ prefix for use in the RHS.

The $@ prefix tells sendmail that the current rule is the last one that should be used in the current rule set. If the LHS of the current rule matches, any rules that follow (in the current rule set) are ignored:

R xxx   $@ yyy
R yyy   zzz

If the workspace contains anything other than xxx, the first rule does not match, and the second rule is called. But if the workspace contains xxx, the first rule matches and rewrites the workspace. The $@ prefix for the RHS of that rule prevents the second rule (and any subsequent rules in that rule set) from being called.

Note that the $@ also prevents looping: it tells sendmail to rewrite only once and to skip all further rules in the rule set. Both $@ and $: rewrite only once; the difference is that $@ does not proceed to the next rule, whereas $: does.
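
The flow through a rule set with and without these prefixes can be modeled roughly as follows. The sketch assumes literal rules only (no wildcards), and run_rule_set is a hypothetical helper rather than anything taken from sendmail.

```python
def run_rule_set(rules, workspace):
    """Toy model of rule-set flow.

    Each rule is (lhs, prefix, rhs), where lhs and rhs are literal
    workspaces and prefix is "", "$:" or "$@".
    """
    for lhs, prefix, rhs in rules:
        while workspace == lhs:
            workspace = rhs
            if prefix == "$@":
                return workspace  # rewrite and return: remaining rules are skipped
            if prefix == "$:":
                break             # rewrite once, then go on to the next rule
    return workspace

plain = [("xxx", "", "yyy"), ("yyy", "", "zzz")]
with_return = [("xxx", "$@", "yyy"), ("yyy", "", "zzz")]
with_once = [("xxx", "$:", "yyy"), ("yyy", "", "zzz")]

print(run_rule_set(plain, "xxx"))        # prints zzz
print(run_rule_set(with_return, "xxx"))  # prints yyy
print(run_rule_set(with_once, "xxx"))    # prints zzz
```

Only the $@ version stops at yyy; the $: version still hands the rewritten workspace to the second rule.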

The $@ operator must be used as a prefix because it has special meaning only when it begins the RHS of a rule. If it appears anywhere else inside the RHS it loses its special meaning:

foo  rewritten by  $@ $1   becomes   foo 
foo  rewritten by  $1 $@   becomes   foo $@

18.7.4 Rewrite Through Another Rule Set: $>set

Rules are organized in sets that can be thought of as subroutines. Occasionally, a series of rules can be common to two or more rule sets. To make the configuration file more compact and somewhat clearer, such common series of rules can be made into separate subroutines.

The RHS $>set operator tells sendmail to perform additional rewriting using a secondary set of rules. The set is the rule-set name or number of that secondary set. If set is the name or number of a nonexistent rule set, the effect is the same as if the subroutine rules were never called (the workspace is unchanged).

If the set is numeric and is greater than the maximum number of allowable rule sets, sendmail prints the following error and skips that rule:

bad ruleset bad_number (maximum max)

If the set is a name and the rule-set name is undeclared, sendmail prints the following error and skips that rule:

Unknown ruleset bad_name

Neither of these errors is caught when the configuration file is read. They are caught only when mail is sent because a rule set name can be a macro:

$> $&{SET}

The $& prefix prevents the macro named {SET} from being expanded when the configuration file is read. Therefore, the name or number of the rule set cannot be known until mail is sent.

The process of calling another set of rules proceeds in five stages:

First: As usual, if the LHS matches the workspace, the RHS gets to rewrite the workspace.
Second: The RHS ignores the $>set part and rewrites the rest as usual.
Third: The part of the rewritten workspace following the $>set is then given to the set of rules specified by set. They either rewrite the workspace or do not.
Fourth: The portion of the original RHS from the $>set to the end is replaced with the subroutine's rewriting, as though it had performed the subroutine's rewriting itself.
Fifth: The LHS gets a crack at the new workspace as usual unless it is prevented by a $: or $@ prefix in the RHS.

For example, consider the following two sets of rules:

# first set
S21
R $*..   $:$>22 $1.     strip extra trailing dots
 ...etc.

# second set
S22
R $*..    $1.           strip trailing dots

Here, the first set of rules contains, among other things, a single rule that removes extra dots from the end of an address. But because other rule sets might also need extra dots stripped, a subroutine (the second set of rules) is created to perform that task.

Note that the first rule strips one trailing dot from the workspace and then calls rule set 22 (the $>22), which then strips any additional dots. The workspace, as rewritten by rule set 22, becomes the workspace yielded by the RHS in the first rule. The $: prevents the LHS of the first rule from looking for a match a second time.
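
The interplay of the two rule sets can be imitated with two Python functions. This is a simplified model that assumes a token-list workspace and hard-codes the two rules; it is not a general rule interpreter.

```python
def rule_set_22(workspace):
    """Model of S22: 'R $*..  $1.' loops until one trailing dot remains."""
    while workspace[-2:] == [".", "."]:
        workspace = workspace[:-1]
    return workspace

def rule_set_21(workspace):
    """Model of the S21 rule 'R $*..  $:$>22 $1.'."""
    if workspace[-2:] == [".", "."]:
        rewritten = workspace[:-1]          # the RHS text after $>22, that is, $1.
        workspace = rule_set_22(rewritten)  # the subroutine rewrites it further
        # the $: prefix means the LHS of this rule is not tried again
    return workspace

print(" ".join(rule_set_21(["xxx", ".", ".", "."])))  # prints xxx .
```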

Prior to V8.8 sendmail, the subroutine call must begin the RHS (immediately following any $@ or $: prefix), and only a single subroutine can be called. That is, the following causes rule set 22 to be called but does not call 23:

$>22 xxx $>23 yyy

Instead of calling rule set 23, the $> operator and the 23 are copied as is into the workspace, and that workspace is passed to rule set 22:

xxx $> 23 yyy   passed to rule set 22

Beginning with V8.8^[7] sendmail, subroutine calls can appear anywhere inside the RHS, and there can be multiple subroutine calls. Consider the same RHS as shown earlier:

$>22 xxx $>23 yyy

^[7] Using code derived from IDA sendmail.

Beginning with V8.8 sendmail, rule set 23 is called first and is given the workspace yyy to rewrite. The workspace, as rewritten by rule set 23, is added to the end of the xxx, and the combined result is passed to rule set 22.

Under V8.8 sendmail, subroutine rule-set calls are performed from right to left. The result (rewritten workspace) of each call is appended to the RHS text to the left.
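
The right-to-left order can be demonstrated with a small model. It assumes each call is held as one "$>name" token and that each rule set is an ordinary Python function; the "by22" and "by23" tags are invented markers that make the call order visible.

```python
def expand_calls(rhs_tokens, rule_sets):
    """Evaluate '$>name' calls in a token list from right to left.

    Each call receives every token to its right (already rewritten by
    calls further right); its result replaces the call and those tokens.
    """
    tokens = list(rhs_tokens)
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i].startswith("$>"):
            name = tokens[i][2:]
            tokens[i:] = rule_sets[name](tokens[i + 1:])
    return tokens

calls = []  # records (rule-set name, workspace it was given)

def make_rule_set(name):
    def rule_set(workspace):
        calls.append((name, list(workspace)))
        return workspace + ["by" + name]  # hypothetical rewrite: tag the workspace
    return rule_set

rule_sets = {"22": make_rule_set("22"), "23": make_rule_set("23")}
result = expand_calls(["$>22", "xxx", "$>23", "yyy"], rule_sets)
print(result)  # prints ['xxx', 'yyy', 'by23', 'by22']
print(calls)   # rule set 23 is recorded first, then rule set 22
```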

You should beware of one problem with all versions of sendmail. When ordinary text immediately follows the number of the rule set, that text is likely to be ignored. This can be witnessed by using the -d21.3 debugging switch.

Consider the following RHS:

$>3uucp.$1

Because sendmail parses the 3 and the uucp as a single token, the subroutine call succeeds, but the uucp is lost. The -d21.3 switch illustrates this problem:

-----callsubr 3uucp (3)     sees this
-----callsubr 3 (3)        but should have seen this

The 3uucp is interpreted as the number 3, so it is accepted as a valid number despite the fact that uucp was attached. Because the uucp is a part of the number, it is not available for comparison to the workspace and so is lost. The correct way to write the previous RHS is:

$>3 uucp.$1

Note that the space between the 3 and the uucp causes them to be viewed as two separate tokens.
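
The effect of the missing space can be mimicked with a rough tokenizer. Both helpers are assumptions made for illustration: tokenize is far simpler than sendmail's real tokenizer, and leading_number merely imitates reading the leading digits of a token as a number.

```python
import re

def tokenize(text):
    """Very rough tokenizer: $> and $digit operators, dots, and other runs."""
    return re.findall(r"\$>|\$[1-9]|\.|[^\s.$]+", text)

def leading_number(token):
    """Read the leading digits of a token; anything after them is dropped."""
    match = re.match(r"\d+", token)
    return int(match.group()) if match else None

for rhs in ("$>3uucp.$1", "$>3 uucp.$1"):
    tokens = tokenize(rhs)
    # tokens[1] names the rule set; tokens[2:] is what is left to rewrite
    print(leading_number(tokens[1]), tokens[2:])
```

In the first form the uucp is swallowed by the rule-set token; in the second it survives as a token of its own.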

This problem can also arise with macros. Consider the following:

$>3$M

Here, the $M is expanded when the configuration file is parsed. If the expanded value lacks a leading space, that value (or the first token in it) is lost.

Note that operators that follow a rule-set number are correctly recognized:

$>3$[$1$]

Here, the 3 is immediately followed by the $[ operator. Because operators are token separators, the call to rule set 3 will be correctly interpreted as:

-----callsubr 3 (3)        good

But as a general rule, and just to be safe, the number of a subroutine call should always be followed by a space.^[8]

^[8] Stylistically, it is easier to read rules that have spaces between all patterns that are expected to match separate tokens. For example, use $+ @ $* $=m instead of $+@$*$=m. This style handles subroutine calls automatically.

18.7.5 Return a Selection: $#

The $# operator in the RHS is copied as is into the workspace and functions as a flag advising sendmail that an action has been selected. The $# must be the first token copied into the rewritten workspace for it to have this special meaning. If it occupies any other position in the workspace, it loses its special meaning:

$# local         selects delivery agent in the parse rule set 0 
$# OK            accepts a message in the Local_check_mail rule set 
xxx $# local     no special meaning

When it is used in the parse rule set 0 (Section 19.5) and localaddr rule set 5 (Section 19.6) (and occupies the first position in the rewritten workspace), the $# operator tells sendmail that the second token in the workspace is the name of a delivery agent (here, local). When used in the check_ rule sets (Section 7.3 and Section 7.1), subsequent tokens in the workspace (here, OK) say how a message should be handled.
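
The position test can be expressed as a short check. This is a sketch assuming a token-list workspace; selected_action is a hypothetical helper name.

```python
def selected_action(workspace):
    """Return the token after a leading $#, or None if $# is not first."""
    if len(workspace) >= 2 and workspace[0] == "$#":
        return workspace[1]
    return None

print(selected_action(["$#", "local"]))         # prints local
print(selected_action(["$#", "OK"]))            # prints OK
print(selected_action(["xxx", "$#", "local"]))  # prints None
```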

Note that the $# operator can be prefixed with a $@ or a $: without losing its special meaning because those prefix operators are not copied to the workspace:

$@ $# local     rewritten as $# local

However, those prefix operators are not necessary because the $# acts just like a $@ prefix. It prevents the LHS from attempting to match again after the RHS rewrite, and it causes any following rules (in that rule set) to be skipped. When used in non-prefix roles in the parse rule set 0 and localaddr rule set 5, $@ and $: also act like flags, conveying host and address information to sendmail (Section 19.5).

18.7.6 Canonicalize Hostname: $[ and $]

Tokens that appear between a $[ and $] pair of operators in the RHS are considered to be the name of a host. That hostname is looked up by using DNS^[9] and replaced with the full canonical form of that name. If found, it is then copied to the workspace, and the $[ and $] are discarded.

^[9] Or other means, depending on the settings in your service-switch file, if you have one, or the state of the ServiceSwitchFile option.

For example, consider a rule that looks for a hostname in angle brackets and (if found) rewrites it in canonical form:

R < $* >     $@ < $[ $1 $] >     canonicalize hostname

Such canonicalization is useful at sites where users frequently send mail to machines using the short version of a machine's name. The $[ tells sendmail to view all the tokens that follow (up to the $]) as a single hostname.

If the name cannot be canonicalized (perhaps because there is no such host), the name is copied as is into the workspace. For configuration file versions lower than 2, no indication is given that it could not be canonicalized (more about this soon).

Note that if the $[ is omitted and the $] is included, the $] loses its special meaning and is copied as is into the workspace.

The hostname between the $[ and $] can also be an IP address. By surrounding the hostname with square brackets ([ and ]), you are telling sendmail that it is really an IP address:

wash.dc.gov                   a hostname
[123.45.67.8]                 an IPv4 address
[IPv6:2002:c0a8:51d2::23f4]   an IPv6 address

When the IP address between the square brackets corresponds to a known host, the address and the square brackets are replaced with that host's canonical name. Note that when handling IPv6 addresses, the IPv6: prefix must be present. After the successful lookup of a known host, the entire expression between $[ and $] will be replaced with the new information.

If the version of the configuration file is 2 or greater (as set with the V configuration command, Section 17.5), a successful canonicalization has a dot appended to the result:

myhost       becomes  myhost . domain .    success 
nohost       becomes  nohost                 failure

Note that a trailing dot is not legal in an address specification, so subsequent rules (such as rule set 4) must remove these added trailing dots.^[10]

^[10] Under DNS the trailing dot signifies the root (topmost) domain. Therefore, under DNS, a trailing dot is legal. For mail, however, RFC1123 specifically states that no address is to be propagated that contains a trailing dot.

Also, the K configuration command (Section 23.2) can be used to redefine (or eliminate) the dot as the added character. For example:

Khost host -a.found

This causes sendmail to add the text .found to a successfully canonicalized hostname instead of the dot.

One difference between V8 sendmail and other versions is the way it looks up names from between the $[ and $] operators. The rules for V8 sendmail are as follows:

First: If the name contains at least one dot (.) anywhere within it, it is looked up as is; for example, host.com.
Second: If that fails, it appends the default domain to the name (as defined in /etc/resolv.conf) and tries to look up the result; for example, host.com.foo.edu.
Third: If that fails, each entry in the domain search path (as defined in /etc/resolv.conf) is appended to the original host; for example, host.com.edu.
Fourth: If the original name did not have a dot in it, it is looked up as is; for example, host.
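
The four steps can be turned into a function that lists the names in the order they would be tried. It performs no DNS queries, and the domain arguments are passed in by hand instead of being read from /etc/resolv.conf; lookup_order is a name invented for this sketch.

```python
def lookup_order(name, default_domain, search_path):
    """List the names tried, in order, following the four steps above."""
    has_dot = "." in name
    order = []
    if has_dot:
        order.append(name)                    # first: dotted names as is
    order.append(name + "." + default_domain)  # second: default domain
    order.extend(name + "." + domain for domain in search_path)  # third
    if not has_dot:
        order.append(name)                    # fourth: dotless names as is
    return order

print(lookup_order("host.com", "foo.edu", ["edu"]))
print(lookup_order("host", "foo.edu", ["edu"]))
```

A real lookup would stop at the first name in the list that resolves.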

This approach allows names such as host.com to first match an actual site, such as sendmail.com (if that was intended), instead of wrongly matching a host in a local department of your school. This is particularly important if you have wildcard MX records for your site.

18.7.6.1 An example of canonicalization

The following three-line configuration file can be used to observe how sendmail canonicalizes hostnames:

V10
SCanon
R $*        $@ $[ $1 $]

If this file were called test.cf, sendmail could be run in rule-testing mode with a command such as the following:

% /usr/sbin/sendmail -Ctest.cf -bt

Thereafter, hostname canonicalization can be observed by specifying the Canon rule set and a hostname. One such run of tests might appear as follows:

ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> Canon wash
canon              input: wash
canon            returns: wash . dc . gov .
> Canon nohost
canon              input: nohost
canon            returns: nohost
>

Note that the known host named wash is rewritten in canonicalized form (with a dot appended because the version of this miniconfiguration file, the V10, is greater than 2). The unknown host named nohost is unchanged and has no dot appended.

18.7.6.2 Default in canonicalization: $:

IDA and V8 sendmail both offer an alternative to leaving the hostname unchanged when canonicalization fails with $[ and $]. A default can be used instead of the failed hostname by prefixing that default with a $: operator:

$[ host $:  default  $]

The $: default must follow the host (or square-brace-enclosed address) and precede the $]. To illustrate its use, consider the following rule:

R $*    $: $[ $1 $: $1.notfound $]

If the hostname $1 can be canonicalized, the workspace becomes that canonicalized name. If it cannot, the workspace becomes the original hostname with a .notfound appended to it. If the default part of the $:default is omitted, a failed canonicalization is rewritten as zero tokens.
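
The three outcomes can be summarized in a small model. The KNOWN_HOSTS table is a hypothetical stand-in for a real lookup, and the function assumes a version 2 or later configuration file, so that a dot is appended on success.

```python
KNOWN_HOSTS = {"wash": "wash.dc.gov"}  # hypothetical stand-in for a DNS lookup

def canonicalize(name, default=None, suffix="."):
    """Model of $[ name $: default $]; default=None means no $: was given."""
    if name in KNOWN_HOSTS:
        return KNOWN_HOSTS[name] + suffix  # success: canonical name plus the dot
    if default is None:
        return name                        # no $: given, name left unchanged
    return default                         # $: given, default used (may be empty)

print(canonicalize("wash"))                               # prints wash.dc.gov.
print(canonicalize("nohost"))                             # prints nohost
print(canonicalize("nohost", default="nohost.notfound"))  # prints nohost.notfound
```

Passing an empty string as the default models a $: with nothing after it, which yields an empty result.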

Because the $[ and $] operators are implemented using the host dbtype (Section 23.4.3), you can modify the behavior of that dbtype by adding a -T to it:

Khost host -T.tmp

Thereafter, whenever $[ and $] find a temporary lookup failure, the suffix .tmp is returned, and .notfound, in this example, is returned only if the host truly does not exist.

18.7.7 Other Operators

Many other operators (depending on your version of sendmail) can also be used in rules. Because of their individual complexity, all of the following are detailed in other chapters. We outline them here, however, for completeness.

Class macros: Class macros are described in Section 22.2.1 and Section 22.2.2 of Chapter 22. Class macros can appear only in the LHS. They begin with the prefix $= to match a token in the workspace to one of many items in a class. The alternative prefix $~ causes a single token in the workspace to match if it does not appear in the list of items that are in the class.
Conditionals: The conditional macro operator $? is rarely used in rules (Section 21.6). When it is used in rules, the result is often not what was intended. Its else part, the $| conditional operator, is used by the various rule sets (Section 7.1.4) to separate two differing pieces of information in the workspace.
Database Maps: The database-map operators, $( and $), are used to look up tokens in various types of database files, plain files, and network services. They also provide access to internal services, such as dequoting or storing a value in a macro (see Chapter 23).