Rewrite rules are the heart of the sendmail.cf file. Rulesets are groups of individual rewrite rules used to parse email addresses from user mail programs and rewrite them into the form required by the mail delivery programs. Each rewrite rule is defined by an R command. The syntax of the R command is:
Rpattern transformation comment
The fields in an R command are separated by tab characters. The comment field is ignored by the system, but good comments are vital if you want to have any hope of understanding what's going on. The pattern and transformation fields are the heart of this command.
Rewrite rules match the input address against the pattern, and if a match is found, rewrite the address in a new format using the rules defined in the transformation. A rewrite rule may process the same address several times because, after being rewritten, the address is again compared against the pattern. If it still matches, it is rewritten again. The cycle of pattern matching and rewriting continues until the address no longer matches the pattern.
The pattern is defined using macros, classes, literals, and special metasymbols. The macros, classes, and literals provide the values against which the input is compared, and the metasymbols define the rules used in matching the pattern. Table 10.4 shows the metasymbols used for pattern matching.
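All of the metasymbols request a match for some number of tokens. A token is a string of characters in an email address delimited by an operator. The operators are the characters defined in the OperatorChars option. Operators are also counted as tokens when an address is parsed. For example:
becky@peanut.nuts.com
This email address contains seven tokens: becky, @, peanut, ., nuts, ., and com. This address would match the pattern:
$-@$+
The address matches the pattern because it has exactly one token before the @, which satisfies the $- symbol; it contains an @, which matches the pattern's literal @; and it has one or more tokens after the @, which satisfies the $+ symbol.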
Many other addresses with a single token before the @ match this pattern, but some addresses do not. For example, an address whose user part is rebecca.hunt does not match because it has three tokens (rebecca, ., and hunt) before the @. Therefore, it fails to meet the requirement of exactly one token specified by the $- symbol. Using the metasymbols, macros, and literals, patterns can be constructed to match any type of email address.
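The token-splitting step described above can be sketched in a few lines of Python. This is an illustration, not sendmail's own parser: the OPERATORS string is an assumed value standing in for the OperatorChars option (check your own sendmail.cf), and the rebecca.hunt address uses a hypothetical domain.

```python
# Assumed operator set; the real one comes from the OperatorChars option.
OPERATORS = ".:%@!^/[]+"


def tokenize(address):
    """Split an address into tokens; every operator is a token of its own."""
    tokens, word = [], ""
    for ch in address:
        if ch in OPERATORS:
            if word:              # close the word collected so far
                tokens.append(word)
                word = ""
            tokens.append(ch)     # the operator itself counts as a token
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(tokenize("becky@peanut.nuts.com"))
    # Three tokens precede the @ here, so $- (exactly one token) cannot match.
    print(tokenize("rebecca.hunt@peanut.nuts.com"))  # hypothetical domain
```

Counting the tokens in front of the @ in each result shows why the first address satisfies $- and the second does not.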
When an address matches a pattern, the strings from the address that match the metasymbols are assigned to indefinite tokens. The matching strings are called indefinite tokens because they may contain more than one token value. The indefinite tokens are identified numerically according to the relative position in the pattern of the metasymbol that the string matched. In other words, the indefinite token produced by the match of the first metasymbol is called $1; the match of the second symbol is called $2; the third is $3; and so on. When the address becky@peanut.nuts.com matched the pattern $-@$+, two indefinite tokens were created. The first is identified as $1 and contains the single token, becky, that matched the $- symbol. The second indefinite token is $2 and contains the five tokens (peanut, ., nuts, ., and com) that matched the $+ symbol. The indefinite tokens created by the pattern matching can then be referenced by name ($1, $2, etc.) when rewriting the address.
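How indefinite tokens come out of a match can be illustrated with a small backtracking matcher. This is a simplified sketch under stated assumptions: it handles only the $*, $+, and $- metasymbols plus literals, the operator set is an assumed stand-in for OperatorChars, and the function names are invented for the example.

```python
import re

# Assumed operator characters (stand-in for OperatorChars, plus < and >).
OPS = r".:%@!^/\[\]+<>"
# A token is a metasymbol ($*, $+, $-), one operator, or a run of other characters.
TOKEN_RE = re.compile(r"\$[*+\-]|[" + OPS + r"]|[^" + OPS + r"$]+")


def split(text):
    """Tokenize an address or a pattern."""
    return TOKEN_RE.findall(text)


def match(pattern, tokens, captures=()):
    """Return the list of indefinite tokens, or None if there is no match."""
    if not pattern:
        return list(captures) if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$*", "$+", "$-"):
        shortest = 0 if head == "$*" else 1
        longest = 1 if head == "$-" else len(tokens)
        # Try the shortest candidate first and lengthen it on failure.
        for n in range(shortest, min(longest, len(tokens)) + 1):
            result = match(rest, tokens[n:], captures + (tokens[:n],))
            if result is not None:
                return result
        return None
    if tokens and tokens[0] == head:      # literal token must match exactly
        return match(rest, tokens[1:], captures)
    return None


if __name__ == "__main__":
    found = match(split("$-@$+"), split("becky@peanut.nuts.com"))
    for number, value in enumerate(found, start=1):
        print(f"${number} = {value}")
```

Each entry in the returned list corresponds to one metasymbol, in pattern order, which is the same numbering that $1 and $2 use in the transformation.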
A few of the symbols in Table 10.4 are used only in special cases. The $@ symbol is normally used by itself to test for an empty, or null, address. The symbols that test against NIS maps can be used only on Sun systems that run the sendmail program that Sun provides with the operating system. We'll see in the next section that systems running sendmail V8 can use NIS maps, but only for transformation, not for pattern matching.
The transformation field, from the right-hand side of the rewrite rule, defines the format used for rewriting the address. It is defined with the same things used to define the pattern: literals, macros, and special metasymbols. Literals in the transformation are written into the new address exactly as shown. Macros are expanded and then written. The metasymbols perform special functions. The transformation metasymbols and their functions are shown in Table 10.5.
Assume the current rewrite rule is:
R$+<@$-> $1<@$2.$D> user@host -> user@host.domain
Assume also that the input address is kathy.mccafferty<@peanut>. The address matches the pattern because it contains one or more tokens before the literal <@, exactly one token after the <@, and then the literal >. The pattern match produces two indefinite tokens that are used in the transformation to rewrite the address.
The transformation contains the indefinite token $1, a literal <@, indefinite token $2, a literal dot (.), the macro D, and the literal >. After the pattern matching, $1 contains kathy.mccafferty and $2 contains peanut. Assume that the macro D was defined elsewhere in the sendmail.cf file as nuts.com. In this case the input address is rewritten as:
kathy.mccafferty<@peanut.nuts.com>
Figure 10.3 illustrates this specific address rewrite. It shows the tokens derived from the input address, and how those tokens are matched against the pattern. It also shows the indefinite tokens produced by the pattern matching, and how the indefinite tokens, and other values from the transformation, are used to produce the rewritten address. After rewriting, the address is again compared to the pattern. This time it fails to match the pattern because it no longer contains exactly one token between the literal <@ and the literal >. So, no further processing is done by this rewrite rule and the address is passed to the next rule in line. Rules in a ruleset are processed sequentially, though a few metasymbols can be used to modify this flow.
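The match-rewrite-rematch cycle for this rule can be sketched as follows. It is a simplified model rather than sendmail's implementation: only $*, $+, and $- are supported in patterns, only $digit and single-letter macros in transformations, and the operator set, function names, and iteration limit are assumptions made for the example.

```python
import re

# Assumed operator characters (stand-in for OperatorChars, plus < and >).
OPS = r".:%@!^/\[\]+<>"
TOKEN_RE = re.compile(r"\$[*+\-]|[" + OPS + r"]|[^" + OPS + r"$]+")


def match(pattern, tokens, captures=()):
    """Backtracking match; returns the indefinite tokens or None."""
    if not pattern:
        return list(captures) if not tokens else None
    head, rest = pattern[0], pattern[1:]
    if head in ("$*", "$+", "$-"):
        shortest = 0 if head == "$*" else 1
        longest = 1 if head == "$-" else len(tokens)
        for n in range(shortest, min(longest, len(tokens)) + 1):
            result = match(rest, tokens[n:], captures + (tokens[:n],))
            if result is not None:
                return result
        return None
    if tokens and tokens[0] == head:
        return match(rest, tokens[1:], captures)
    return None


def rewrite(pattern, transformation, address, macros, limit=20):
    """Apply one rule repeatedly until the address stops matching."""
    pattern_tokens = TOKEN_RE.findall(pattern)
    tokens = TOKEN_RE.findall(address)
    for _ in range(limit):
        captures = match(pattern_tokens, tokens)
        if captures is None:
            return "".join(tokens)        # hand the address to the next rule

        def expand(m):
            name = m.group(1)
            if name.isdigit():            # $1, $2, ... indefinite tokens
                return "".join(captures[int(name) - 1])
            return macros[name]           # $D and other one-letter macros

        rewritten = re.sub(r"\$(\d|[A-Za-z])", expand, transformation)
        tokens = TOKEN_RE.findall(rewritten)
    raise RuntimeError("rule still matches after %d rewrites" % limit)


if __name__ == "__main__":
    print(rewrite("$+<@$->", "$1<@$2.$D>",
                  "kathy.mccafferty<@peanut>", {"D": "nuts.com"}))
```

The loop ends after one rewrite for this address because the rewritten form has five tokens between <@ and >, so $- can no longer match.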
The $> n symbol calls ruleset n and passes the address defined by the remainder of the transformation to ruleset n for processing. For example:
$>9 $1 % $2
This transformation calls ruleset 9 ($>9), and passes the contents of $1, a literal %, and the contents of $2 to ruleset 9 for processing. When ruleset 9 finishes processing, it returns a rewritten address to the calling rule. The returned email address is then compared again to the pattern in the calling rule. If it still matches, ruleset 9 is called again.
The recursion built into rewrite rules creates the possibility for infinite loops. sendmail does its best to detect possible loops, but you should take responsibility for writing rules that don't loop. The $@ and the $: symbols are used to control processing and to prevent loops. If the transformation begins with the $@ symbol, the entire ruleset is terminated and the remainder of the transformation is the value returned by the ruleset. If the transformation begins with the $: symbol, the individual rule is executed only once. Use $: to prevent recursion and to prevent loops when calling other rulesets. Use $@ to exit a ruleset at a specific rule.
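The effect of the two control prefixes on rule flow can be modeled in a short sketch. To keep it brief, Python regular expressions stand in for sendmail patterns and transformations here, so the rules below are not sendmail.cf syntax; the function name, the rule tuples, and the iteration limit are all assumptions made for the illustration.

```python
import re


def run_ruleset(rules, address, limit=50):
    """Run rules in order; each rule is (regex, replacement, control).

    control is "$@" (rewrite, then exit the ruleset), "$:" (rewrite once,
    then move to the next rule), or "" (rewrite until no longer matching).
    """
    for pattern, replacement, control in rules:
        for _ in range(limit):
            m = re.fullmatch(pattern, address)
            if m is None:
                break                      # no match: go to the next rule
            address = m.expand(replacement)
            if control == "$@":
                return address             # terminate the whole ruleset
            if control == "$:":
                break                      # this rule runs only once
        else:
            raise RuntimeError("rule appears to loop: " + pattern)
    return address


if __name__ == "__main__":
    # Wrapping an address in <> always matches again, so it needs "$:".
    print(run_ruleset([(r"(.+)", r"<\1>", "$:")], "becky@peanut"))
    # Without a control prefix, a rule repeats until it stops matching.
    print(run_ruleset([(r"(.+)\.", r"\1", "")], "peanut.."))
```

The first rule in the usage example would rewrite forever without the $: prefix, which is the kind of loop the prefixes exist to prevent.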
The $[ name $] symbol converts a host's nickname or its IP address to its canonical name by passing the value name to the name server for resolution. For example, using the nuts.com name servers, $[goober$] returns peanut.nuts.com and $[[172.16.12.1]$] returns almond.nuts.com.
In the same way that a hostname or address is used to look up a canonical name in the name server database, the $( map key $) syntax uses the key to retrieve information from the database identified by map. This is a more generalized database retrieval syntax than the one that returns canonical hostnames, and it is more complex to use. Before we get into the details of setting up and using databases from within sendmail, let's finish describing the rest of the syntax of rewrite rules.
There is a special rewrite rule syntax that is used in ruleset 0. Ruleset 0 defines the triple ( mailer, host, user ) that specifies the mail delivery program, the recipient host, and the recipient user.
The special transformation syntax used to do this is:
$#mailer$@host$:user
An example of this syntax taken from the linux.smtp.cf sample file is:
R$*<@$*>$* $#smtp$@$2$:$1<@$2>$3 email@example.com
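Assume the email address david<@filbert.nuts.com> is processed by this rule. The address matches the pattern $*<@$*>$* because it has zero or more tokens (david) that match the first $* symbol, the literal <@, zero or more tokens (the five tokens of filbert.nuts.com) that match the second $* symbol, and the literal >, with zero tokens left over for the final $* symbol.
This pattern match produces two indefinite tokens. Indefinite token $1 contains david and $2 contains filbert.nuts.com. No other matches occurred, so $3 is null. These indefinite tokens are used to rewrite the address into the following triple:
$#smtp$@filbert.nuts.com$:david<@filbert.nuts.com>
The components of this triple are:
$#smtp, which names smtp as the mailer that delivers the message
$@filbert.nuts.com, which names filbert.nuts.com as the recipient host
$:david<@filbert.nuts.com>, which gives david<@filbert.nuts.com> as the recipient user address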
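To make the three parts of the triple concrete, the sketch below builds the resolved string for the example address from its indefinite tokens and then splits it back into mailer, host, and user. It is only an illustration of the $#mailer$@host$:user layout; the function and variable names are hypothetical, and sendmail does not expose such a function.

```python
import re

TRIPLE_RE = re.compile(r"\$#(?P<mailer>.+?)\$@(?P<host>.+?)\$:(?P<user>.*)")


def parse_triple(resolved):
    """Split a $#mailer$@host$:user string into its three components."""
    m = TRIPLE_RE.fullmatch(resolved)
    if m is None:
        raise ValueError("not a mailer/host/user triple: " + resolved)
    return m.group("mailer"), m.group("host"), m.group("user")


if __name__ == "__main__":
    # Indefinite tokens from matching david<@filbert.nuts.com>; $3 is empty.
    tok1, tok2, tok3 = "david", "filbert.nuts.com", ""
    # Same shape as the transformation $#smtp$@$2$:$1<@$2>$3.
    resolved = f"$#smtp$@{tok2}$:{tok1}<@{tok2}>{tok3}"
    print(resolved)
    print(parse_triple(resolved))
```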
R<@$+> $#firstname.lastname@example.org$:"user address required"
This rule returns the message "user address required" if the address matches the pattern.
An example will make the use of arguments clear. Assume the following input address:
tom.martin<@sugar>
Further, assume the following database with the internal sendmail name of "relays":
oil %1<@relay.fats.com>
sugar %1<@relay.calories.com>
salt %1<@server.sodium.org>
Finally, assume the following rewrite rule:
R$+<@$-> $(relays $2 $@ $1 $:$1<@$2> $)
The input address tom.martin<@sugar> matches the pattern because it has one or more tokens (tom.martin) before the literal <@ and exactly one token (sugar) after it. The pattern matching creates two indefinite tokens and passes them to the transformation. The transformation calls the database (relays) and passes it token $2 (sugar) as the key and token $1 (tom.martin) as the argument. If the key is not found in the database, the default ($1<@$2>) is used. In this case, the key is found in the database. The database program uses the key to retrieve "%1<@relay.calories.com>", expands the %1 argument, and returns "tom.martin<@relay.calories.com>" to sendmail, which uses the returned value to replace the input address.
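The lookup just described (key, argument, and default) can be mimicked with an ordinary dictionary. This is a sketch of the behavior only, assuming a single %1 argument; the function name is invented, and a real sendmail map is a database file rather than a Python dict.

```python
# The "relays" database from the example, held in memory for illustration.
RELAYS = {
    "oil": "%1<@relay.fats.com>",
    "sugar": "%1<@relay.calories.com>",
    "salt": "%1<@server.sodium.org>",
}


def map_lookup(table, key, argument, default):
    """Model $(map key $@ argument $: default $)."""
    value = table.get(key)
    if value is None:
        return default                      # key not found: use the default
    return value.replace("%1", argument)    # expand the %1 argument


if __name__ == "__main__":
    user, host = "tom.martin", "sugar"      # $1 and $2 from the pattern match
    print(map_lookup(RELAYS, host, user, f"{user}<@{host}>"))
    # A host missing from the database falls back to the default.
    print(map_lookup(RELAYS, "pepper", user, f"{user}<@pepper>"))
```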
To define the "relays" database file used in the example above, we might enter the following command in the sendmail.cf file:
Krelays dbm /usr/local/relays
The name relays is simply a name you chose because it is descriptive. The database type dbm is a type supported by your version of sendmail and was used by you when you built the database file. Finally, the argument /usr/local/relays is the location of the database file you created.
Don't worry if you're confused about how to build and use database files within sendmail. We will revisit this topic later in the chapter and the examples will make the practical use of database files clear.
Rulesets are groups of associated rewrite rules that can be referenced by a number. The S command marks the beginning of a ruleset and identifies it with a number. In the S n command syntax, n is the number that identifies the ruleset. Numbers in the range of 0 to 99 are used.
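Rulesets can be thought of as subroutines, or functions, designed to process email addresses. They are called from mailer definitions, from individual rewrite rules, or directly by sendmail. Six rulesets have special functions and are called directly by sendmail. These are:
Ruleset 3, which is the first ruleset applied to every address; it converts the address to canonical form.
Ruleset 0, which is applied to the recipient addresses used for delivery; it resolves each address to a (mailer, host, user) triple.
Rulesets 1 and 2, which are applied to all sender addresses and all recipient addresses, respectively.
Ruleset 4, which is applied last to all addresses in the message; it translates internal address formats into external ones.
Ruleset 5, which is applied to local addresses that have no alias, after the alias file has been processed.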
Figure 10.4 shows the flow of the message and addresses through these rulesets. The D box does not symbolize a ruleset. It is the internal sendmail process described above. The S and R symbols do stand for rulesets. They have numeric names just like all normal rulesets, but the numbers are not fixed as is the case with rulesets 0, 1, 2, 3, 4, and 5. The S and R ruleset numbers are defined in the S and R fields of the mailer definition. Each mailer may specify its own S and R rulesets for mailer-specific cleanup of the sender and recipient addresses just before the message is delivered.
There are, of course, many more rulesets in most sendmail.cf files. The other rulesets provide additional address processing and are called by existing rulesets using the $> n construct. The rulesets provided in any sample sendmail.cf file will be adequate for delivering SMTP mail. It's unlikely you'll have to add to these rulesets, unless you want to add new features to your mailer.