10.6 Rewriting the Mail Address

Rewrite rules are the heart of the sendmail.cf file. Rulesets are groups of individual rewrite rules used to parse email addresses from user mail programs and rewrite them into the form required by the mail delivery programs. Each rewrite rule is defined by an R command. The syntax of the R command is:


Rpattern    transformation    comment

The fields in an R command are separated by tab characters. The comment field is ignored by the system, but good comments are vital if you want to have any hope of understanding what's going on. The pattern and transformation fields are the heart of this command.

10.6.1 Pattern Matching

Rewrite rules match the input address against the pattern, and if a match is found, rewrite the address in a new format using the rules defined in the transformation. A rewrite rule may process the same address several times because, after being rewritten, the address is again compared against the pattern. If it still matches, it is rewritten again. The cycle of pattern matching and rewriting continues until the address no longer matches the pattern.

The pattern is defined using macros, classes, literals, and special metasymbols. The macros, classes, and literals provide the values against which the input is compared, and the metasymbols define the rules used in matching the pattern. Table 10.4 shows the metasymbols used for pattern matching.

Table 10.4: Pattern Matching Symbols
Symbol	Meaning
$@	Match exactly zero tokens.
$*	Match zero or more tokens.
$+	Match one or more tokens.
$-	Match exactly one token.
$=x	Match any token in class x.
$~x	Match any token not in class x.
$x	Match all tokens in macro x.
$%x	Match any token in the NIS map named in macro x. [17]
$!x	Match any token not in the NIS map named in macro x. [17]
$%y	Match any token in the NIS hosts.byname map. [17]

[17] This symbol is specific to Sun operating systems.

All of the metasymbols request a match for some number of tokens. A token is a string of characters in an email address delimited by an operator. The operators are the characters defined in the OperatorChars option. [18] Operators are also counted as tokens when an address is parsed. For example:

[18] On older systems, they are defined in the o macro. See Appendix E.

becky@peanut.nuts.com

This email address contains seven tokens: becky, @, peanut, ., nuts, ., and com. This address would match the pattern:

$-@$+

The address matches the pattern because:

It has exactly one token before the @ that matches the requirement of the $- symbol.
It has an @ that matches the pattern's literal @.
It has one or more tokens after the @ that match the requirement of the $+ symbol.

Many addresses, such as hostmaster@rs.internic.net and craigh@ora.com, match this pattern, but others do not. For example, rebecca.hunt@nuts.com does not match because it has three tokens (rebecca, ., and hunt) before the @. Therefore, it fails to meet the requirement of exactly one token specified by the $- symbol. Using the metasymbols, macros, and literals, patterns can be constructed to match any type of email address.
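The tokenizing described above can be sketched in a few lines of Python. This is only a model, not sendmail's own parser: the operator set below is an assumption (the usual V8 OperatorChars default plus the characters sendmail always treats as operators), and the function names tokenize and tokens_before_at are made up for this illustration.

```python
# Assumed operator set: the usual V8 OperatorChars default plus the
# characters sendmail always treats as operators. Your sendmail.cf may differ.
OPERATORS = set(".:%@!^/[]+" + "()<>,;")


def tokenize(address):
    """Split an address into tokens; every operator is a token of its own."""
    tokens, word = [], ""
    for ch in address:
        if ch in OPERATORS:
            if word:
                tokens.append(word)
                word = ""
            tokens.append(ch)
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def tokens_before_at(address):
    """Return the tokens that come before the first @ in the address."""
    tokens = tokenize(address)
    return tokens[:tokens.index("@")]


for addr in ("becky@peanut.nuts.com", "rebecca.hunt@nuts.com"):
    print(addr, tokenize(addr), len(tokens_before_at(addr)))
```

In this model the first address yields seven tokens with one token before the @, so it can satisfy $-. The second address has three tokens before the @, which is why $- cannot match it.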

When an address matches a pattern, the strings from the address that match the metasymbols are assigned to indefinite tokens. The matching strings are called indefinite tokens because they may contain more than one token value. The indefinite tokens are identified numerically according to the relative position in the pattern of the metasymbol that the string matched. In other words, the indefinite token produced by the match of the first metasymbol is called $1; the match of the second symbol is called $2; the third is $3; and so on. When the address becky@peanut.nuts.com matched the pattern $-@$+, two indefinite tokens were created. The first is identified as $1 and contains the single token, becky, that matched the $- symbol. The second indefinite token is $2 and contains the five tokens (peanut, ., nuts, ., and com) that matched the $+ symbol. The indefinite tokens created by the pattern matching can then be referenced by name ($1, $2, etc.) when rewriting the address.
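To make the numbering of indefinite tokens concrete, here is a minimal Python sketch of a matcher that handles only the $*, $+, and $- metasymbols and literal tokens. It is a simplified model under stated assumptions: the operator set is assumed, classes and macros are not handled, and the helper names tokenize and match are hypothetical.

```python
OPERATORS = set(".:%@!^/[]+" + "()<>,;")   # assumed operator set
META = ("$*", "$+", "$-")                   # metasymbols handled by this sketch


def tokenize(text):
    """Split an address or pattern into tokens; metasymbols stay whole."""
    tokens, word, i = [], "", 0
    while i < len(text):
        two = text[i:i + 2]
        if two in META or text[i] in OPERATORS:
            if word:
                tokens.append(word)
                word = ""
            piece = two if two in META else text[i]
            tokens.append(piece)
            i += len(piece)
        else:
            word += text[i]
            i += 1
    if word:
        tokens.append(word)
    return tokens


def match(pattern, address):
    """Return the indefinite tokens [$1, $2, ...] or None if no match."""
    def walk(pat, toks):
        if not pat:
            return [] if not toks else None
        head, rest = pat[0], pat[1:]
        if head in META:
            low = 0 if head == "$*" else 1
            high = 1 if head == "$-" else len(toks)
            # Try the shortest candidate first, then back off to longer ones.
            for n in range(low, min(high, len(toks)) + 1):
                tail = walk(rest, toks[n:])
                if tail is not None:
                    return [toks[:n]] + tail
            return None
        if toks and toks[0].lower() == head.lower():
            return walk(rest, toks[1:])
        return None

    return walk(tokenize(pattern), tokenize(address))


bindings = match("$-@$+", "becky@peanut.nuts.com")
for number, tokens in enumerate(bindings, start=1):
    print(f"${number} = {tokens}")
print(match("$-@$+", "rebecca.hunt@nuts.com"))
```

Each entry in the returned list is itself a list of tokens, which mirrors the point that one indefinite token may hold several token values.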

A few of the symbols in Table 10.4 are used only in special cases. The $@ symbol is normally used by itself to test for an empty, or null, address. The symbols that test against NIS maps can be used only on Sun systems that run the sendmail program that Sun provides with the operating system. We'll see in the next section that systems running sendmail V8 can use NIS maps, but only for transformation, not for pattern matching.

10.6.2 Transforming the Address

The transformation field, the righthand side of the rewrite rule, defines the format used for rewriting the address. It is defined with the same things used to define the pattern: literals, macros, and special metasymbols. Literals in the transformation are written into the new address exactly as shown. Macros are expanded and then written. The metasymbols perform special functions. The transformation metasymbols and their functions are shown in Table 10.5.

Table 10.5: Transformation Metasymbols
Symbol	Meaning
$n	Substitute indefinite token n.
$[name$]	Substitute the canonical form of name.
$(map key $@argument $:default $)	Substitute a value from database map indexed by key.
$>n	Call ruleset n.
$@	Terminate ruleset.
$:	Terminate rewrite rule.

The $n symbol, where n is a number, is used for the indefinite token substitution discussed above. The indefinite token is expanded and written to the "new" address. Indefinite token substitution is essential for flexible address rewriting. Without it, values could not be easily moved from the input address to the rewritten address. The following example demonstrates this.

Addresses are always processed by several rewrite rules. No one rule tries to do everything. Assume the input address kathy.mccafferty@peanut has been through some preliminary processing and now is:

kathy.mccafferty<@peanut>

Assume the current rewrite rule is:

R$+<@$->    $1<@$2.$D>   user@host -> user@host.domain

The address matches the pattern because it contains one or more tokens before the literal <@, exactly one token after the <@, and then the literal >. The pattern match produces two indefinite tokens that are used in the transformation to rewrite the address.

The transformation contains the indefinite token $1, a literal <@, indefinite token $2, a literal dot (.), the macro D, and the literal >. After the pattern matching, $1 contains kathy.mccafferty and $2 contains peanut . Assume that the macro D was defined elsewhere in the sendmail.cf file as nuts.com . In this case the input address is rewritten as:

kathy.mccafferty<@peanut.nuts.com>

Figure 10.3 illustrates this specific address rewrite. It shows the tokens derived from the input address, and how those tokens are matched against the pattern. It also shows the indefinite tokens produced by the pattern matching, and how the indefinite tokens, and other values from the transformation, are used to produce the rewritten address. After rewriting, the address is again compared to the pattern. This time it fails to match the pattern because it no longer contains exactly one token between the literal <@ and the literal >. So, no further processing is done by this rewrite rule and the address is passed to the next rule in line. Rules in a ruleset are processed sequentially, though a few metasymbols can be used to modify this flow.
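The rewrite cycle just described (match, rewrite, compare again, stop when the pattern no longer matches) can be modeled with a short Python sketch. It is an illustration only: the operator set is assumed, only $*, $+, and $- are supported in patterns, macro D is set to nuts.com as in the text, and the function names and the pass limit are inventions of this sketch.

```python
import re

OPERATORS = set(".:%@!^/[]+" + "()<>,;")   # assumed operator set
META = ("$*", "$+", "$-")
MACROS = {"D": "nuts.com"}                  # macro D, as assumed in the text


def tokenize(text):
    """Split an address or pattern into tokens; metasymbols stay whole."""
    tokens, word, i = [], "", 0
    while i < len(text):
        two = text[i:i + 2]
        if two in META or text[i] in OPERATORS:
            if word:
                tokens.append(word)
                word = ""
            piece = two if two in META else text[i]
            tokens.append(piece)
            i += len(piece)
        else:
            word += text[i]
            i += 1
    if word:
        tokens.append(word)
    return tokens


def match(pattern, address):
    """Return the indefinite tokens [$1, $2, ...] or None if no match."""
    def walk(pat, toks):
        if not pat:
            return [] if not toks else None
        head, rest = pat[0], pat[1:]
        if head in META:
            low = 0 if head == "$*" else 1
            high = 1 if head == "$-" else len(toks)
            for n in range(low, min(high, len(toks)) + 1):
                tail = walk(rest, toks[n:])
                if tail is not None:
                    return [toks[:n]] + tail
            return None
        if toks and toks[0] == head:
            return walk(rest, toks[1:])
        return None

    return walk(tokenize(pattern), tokenize(address))


def expand(transformation, bindings):
    """Replace $1, $2, ... with indefinite tokens and $X with macro X."""
    def substitute(m):
        name = m.group(1)
        if name.isdigit():
            return "".join(bindings[int(name) - 1])
        return MACROS[name]

    return re.sub(r"\$([0-9A-Za-z])", substitute, transformation)


def apply_rule(pattern, transformation, address, limit=100):
    """Rewrite the address until the pattern stops matching."""
    passes = 0
    while passes < limit:
        bindings = match(pattern, address)
        if bindings is None:
            break
        address = expand(transformation, bindings)
        passes += 1
    return address, passes


print(apply_rule("$+<@$->", "$1<@$2.$D>", "kathy.mccafferty<@peanut>"))
```

In this model the rule fires once. On the second comparison there are five tokens between <@ and >, so $- fails and the address moves on unchanged.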

Figure 10.3: Rewriting an address

The $>n symbol calls ruleset n and passes the address defined by the remainder of the transformation to ruleset n for processing. For example:

$>9 $1 % $2

This transformation calls ruleset 9 ($>9), and passes the contents of $1, a literal %, and the contents of $2 to ruleset 9 for processing. When ruleset 9 finishes processing, it returns a rewritten address to the calling rule. The returned email address is then compared again to the pattern in the calling rule. If it still matches, ruleset 9 is called again.

The recursion built into rewrite rules creates the possibility for infinite loops. sendmail does its best to detect possible loops, but you should take responsibility for writing rules that don't loop. The $@ and the $: symbols are used to control processing and to prevent loops. If the transformation begins with the $@ symbol, the entire ruleset is terminated and the remainder of the transformation is the value returned by the ruleset. If the transformation begins with the $: symbol, the individual rule is executed only once. Use $: to prevent recursion and to prevent loops when calling other rulesets. Use $@ to exit a ruleset at a specific rule.
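The effect of the $: and $@ prefixes on control flow can be shown with a deliberately simplified Python model. Note the assumptions: Python regular expressions stand in for sendmail patterns, $1 and $2 refer to regex groups, and the iteration limit is a stand-in for sendmail's loop detection rather than a description of how sendmail actually detects loops. The function name run_ruleset is hypothetical.

```python
import re


def run_ruleset(rules, address, limit=50):
    """Apply (regex, transformation) rules in order.

    A transformation starting with "$:" is applied at most once.
    A transformation starting with "$@" ends the whole ruleset.
    Any other rule repeats for as long as its pattern matches.
    """
    for regex, transformation in rules:
        once = transformation.startswith("$:")
        stop = transformation.startswith("$@")
        body = transformation[2:] if (once or stop) else transformation
        for _ in range(limit):
            m = re.fullmatch(regex, address)
            if m is None:
                break
            # Substitute $1, $2, ... with the matching regex groups.
            address = re.sub(r"\$(\d)", lambda d: m.group(int(d.group(1))), body)
            if stop:
                return address
            if once:
                break
        else:
            raise RuntimeError("rule appears to loop: " + regex)
    return address


rules = [
    (r"(.*)", "$:<$1>"),               # wrap once; without $: this never stops matching
    (r"<(.+)@(.+)>", "$@$1<@$2>"),     # rewrite and leave the ruleset
    (r".*", "unreachable"),            # never reached for addresses with an @
]
print(run_ruleset(rules, "becky@peanut"))
```

The first rule shows why $: matters: a pattern that matches anything also matches its own output, so without the prefix the rule would be reapplied indefinitely.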

The $[name$] symbol converts a host's nickname or its IP address to its canonical name by passing the value name to the name server for resolution. For example, using the nuts.com name servers, $[goober$] returns peanut.nuts.com and $[[172.16.12.1]$] returns almond.nuts.com.

In the same way that a hostname or address is used to look up a canonical name in the name server database, the $(map key $) syntax uses the key to retrieve information from the database identified by map. This is a more general database retrieval syntax than the one that returns canonical hostnames, and it is more complex to use. Before we get into the details of setting up and using databases from within sendmail, let's finish describing the rest of the syntax of rewrite rules.

There is a special rewrite rule syntax that is used in ruleset 0. Ruleset 0 defines the triple (mailer, host, user) that specifies the mail delivery program, the recipient host, and the recipient user.

The special transformation syntax used to do this is:


$#mailer$@host$:user

An example of this syntax taken from the linux.smtp.cf sample file is:

R$*<@$*>$*    $#smtp$@$2$:$1<@$2>$3     user@host.domain

Assume the email address david<@filbert.nuts.com> is processed by this rule. The address matches the pattern $*<@$*>$* because:

The address has zero or more tokens (the token david) that match the first $* symbol.
The address has a literal <@.
The address has zero or more tokens (the five tokens filbert.nuts.com) that match the requirement of the second $* symbol.
The address has a literal >.
The address has zero or more, in this case zero, tokens that match the requirement of the last $* symbol.

This pattern match produces two indefinite tokens with content. Indefinite token $1 contains david and $2 contains filbert.nuts.com. The last $* symbol matched zero tokens, so $3 is null. These indefinite tokens are used to rewrite the address into the following triple:

$#smtp$@filbert.nuts.com$:david<@filbert.nuts.com>

The components of this triple are:

$#smtp: smtp is the internal name of the mailer that delivers the message.
$@filbert.nuts.com: filbert.nuts.com is the recipient host.
$:david<@filbert.nuts.com>: david<@filbert.nuts.com> is the recipient user.
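As an illustration of how the three components are laid out, the following Python sketch splits a resolved triple back into its parts. The function name parse_triple is hypothetical, and the regular expression is only a reading aid for the $#mailer$@host$:user layout, not a description of how sendmail itself parses the result of ruleset 0.

```python
import re


def parse_triple(resolved):
    """Split a '$#mailer$@host$:user' string into (mailer, host, user)."""
    m = re.fullmatch(r"\$#(.*?)\$@(.*?)\$:(.*)", resolved)
    if m is None:
        raise ValueError("not a mailer triple: " + resolved)
    return m.group(1), m.group(2), m.group(3)


mailer, host, user = parse_triple("$#smtp$@filbert.nuts.com$:david<@filbert.nuts.com>")
print("mailer:", mailer)
print("host:  ", host)
print("user:  ", user)
```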

There is one special variant of this syntax, also used only in ruleset 0, that passes error messages to the user:



$#error$@comment$:message

The comment field is ignored by sendmail. message is the text of an error message returned to the user, for example:

R<@$+>     $#error$@5.1.1$:"user address required"

This rule returns the message "user address required" if the address matches the pattern.

10.6.2.1 Transforming with a database

External databases can be used to transform addresses in rewrite rules. The database is included in the transformation part of a rule by using the following syntax:



$(map key [$@argument...] [$:default] $)

map is the name assigned to the database within the sendmail.cf file. The name assigned to map is not limited by the rules that govern macro names. Like mailer names, map names are only used inside of the sendmail.cf file and can be any name you choose. Select a simple descriptive name, such as "users" or "mailboxes." The map name is assigned with a K command. (More on the K command in a moment.)

key is the value used to index into the database. The value returned from the database for this key is used to rewrite the input address. If no value is returned, the input address is not changed unless a default value is provided.

An argument is an additional value passed to the database procedure along with the key. Multiple arguments can be used, but each argument must start with $@. The argument can be used by the database procedure to modify the value it returns to sendmail. It is referenced inside the database as %n, where n is a digit that indicates the order in which the argument appears in the rewrite rule (%1, %2, and so on) when multiple arguments are used. (Argument %0 is the key.)

An example will make the use of arguments clear. Assume the following input address:

tom.martin<@sugar>

Further, assume the following database with the internal sendmail name of "relays":

oil     %1<@relay.fats.com>
sugar   %1<@relay.calories.com>
salt    %1<@server.sodium.org>

Finally, assume the following rewrite rule:

R$+<@$->     $(relays $2 $@ $1 $:$1<@$2> $)

The input address tom.martin<@sugar> matches the pattern because it has one or more tokens (tom.martin) before the literal <@ and exactly one token (sugar) after it. The pattern matching creates two indefinite tokens and passes them to the transformation. The transformation calls the database (relays) and passes it token $2 (sugar) as the key and token $1 (tom.martin) as the argument. If the key is not found in the database, the default ($1<@$2>) is used. In this case, the key is found in the database. The database program uses the key to retrieve "%1<@relay.calories.com>", expands the %1 argument, and returns "tom.martin<@relay.calories.com>" to sendmail, which uses the returned value to replace the input address.
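The lookup can be modeled in Python with a dictionary standing in for the relays database. This is a sketch under assumptions: the function name map_lookup is hypothetical, a missing key with no default simply returns the key unchanged, and a %n placeholder with no corresponding argument is left as written. Real database types may behave differently.

```python
import re

# The "relays" database from the example, as key/value pairs.
RELAYS = {
    "oil": "%1<@relay.fats.com>",
    "sugar": "%1<@relay.calories.com>",
    "salt": "%1<@server.sodium.org>",
}


def map_lookup(database, key, arguments=(), default=None):
    """Model of $(map key $@ argument ... $: default $)."""
    value = database.get(key)
    if value is None:
        # Key not found: use the default if one was given.
        return key if default is None else default
    # %0 is the key; %1, %2, ... are the arguments in rule order.
    substitutions = [key, *arguments]

    def substitute(m):
        n = int(m.group(1))
        return substitutions[n] if n < len(substitutions) else m.group(0)

    return re.sub(r"%(\d)", substitute, value)


# Indefinite tokens from matching tom.martin<@sugar> against $+<@$->
dollar1, dollar2 = "tom.martin", "sugar"
print(map_lookup(RELAYS, dollar2, [dollar1], default=f"{dollar1}<@{dollar2}>"))
print(map_lookup(RELAYS, "pepper", [dollar1], default=f"{dollar1}<@pepper>"))
```

The second call shows the default at work: pepper is not a key in the database, so the address built from $1<@$2> is returned instead.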

Before a database can be used within sendmail, it must be defined. This is done with the K command. The syntax of the K command is:



Kname type [arguments]

name is the name used to reference this database within sendmail. In the example above, the name is "relays".

type is the class of database. The type specified in the K command must match the database support compiled into your sendmail. Most sendmail programs do not support all database types, but a few basic types are widely supported. Common types are dbm, hash, btree, and nis. There are many more, all of which are described in Appendix E.

arguments are optional. Generally, the only argument is the path of the database file. Occasionally the arguments include flags that are interpreted by the database program. The full list of K command flags that can be passed in the argument field is given in Appendix E.

To define the "relays" database file used in the example above, we might enter the following command in the sendmail.cf file:

Krelays dbm /usr/local/relays

The name relays is simply a name you chose because it is descriptive. The database type dbm is a type supported by your version of sendmail and was used by you when you built the database file. Finally, the argument /usr/local/relays is the location of the database file you created.
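To show how the fields of a K command line break down, here is a small Python sketch that splits the line into name, type, and arguments. It is an illustration of the field layout only; the function name parse_k_command is hypothetical, and the arguments are kept as one uninterpreted string because their meaning depends on the database type.

```python
def parse_k_command(line):
    """Split a K command line into its name, type, and argument fields."""
    if not line.startswith("K"):
        raise ValueError("not a K command: " + line)
    # The name follows the K directly; the remaining fields are
    # separated by whitespace.
    fields = line[1:].split(None, 2)
    if len(fields) < 2:
        raise ValueError("K command needs a name and a type: " + line)
    return {
        "name": fields[0],
        "type": fields[1],
        "arguments": fields[2] if len(fields) > 2 else "",
    }


print(parse_k_command("Krelays dbm /usr/local/relays"))
```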

Don't worry if you're confused about how to build and use database files within sendmail. We will revisit this topic later in the chapter and the examples will make the practical use of database files clear.

10.6.3 The Set Ruleset Command

Rulesets are groups of associated rewrite rules that can be referenced by a number. The S command marks the beginning of a ruleset and identifies it with a number. In the Sn command syntax, n is the number that identifies the ruleset. Numbers in the range of 0 to 99 are used.

Rulesets can be thought of as subroutines, or functions, designed to process email addresses. They are called from mailer definitions, from individual rewrite rules, or directly by sendmail. Six rulesets have special functions and are called directly by sendmail. These are:

Ruleset 3 is the first ruleset applied to addresses. It converts an address to the canonical form: local-part@host.domain.

In specific circumstances the @host.domain part is added by sendmail after ruleset 3 terminates. This happens only if the mail has been received from a mailer with the C flag set. [19] In our sample configuration file, none of the mailers use this flag. If the C flag is set, the sender's @host.domain is added to all addresses that have only a local-part. This processing is done after ruleset 3 and before rulesets 1 and 2. (This function is represented in Figure 10.4 by the box marked "D.")

[19] See Appendix E for the full set of mailer flags.

Ruleset 0 is applied to the addresses used to deliver the mail. Ruleset 0 is applied after ruleset 3, and only to the recipient addresses actually used for mail delivery. It resolves the address to the triple (mailer, host, user) composed of the name of the mailer that will deliver the mail, the recipient hostname, and the recipient username.
Ruleset 1 is applied to all sender addresses in the message.
Ruleset 2 is applied to all recipient addresses in the message.
Ruleset 4 is applied to all addresses in the message and is used to translate internal address formats into external address formats.
Ruleset 5 is applied to local addresses after sendmail processes the address against the aliases file. Ruleset 5 is only applied to local addresses that do not have an alias.

Figure 10.4 shows the flow of the message and addresses through these rulesets. The D box does not symbolize a ruleset. It is the internal sendmail process described above. The S and R symbols do stand for rulesets. They have numeric names just like all normal rulesets, but the numbers are not fixed as is the case with rulesets 0, 1, 2, 3, 4, and 5. The S and R ruleset numbers are defined in the S and R fields of the mailer definition. Each mailer may specify its own S and R rulesets for mailer-specific cleanup of the sender and recipient addresses just before the message is delivered.
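The ordering of these steps can be summarized in a small Python sketch. Treat it as a reading aid rather than a definition: it assumes the conventional sendmail ordering in which the mailer-specific S or R ruleset runs after ruleset 1 or 2 and before ruleset 4, it leaves out ruleset 5, and the ruleset numbers 11 and 21 in the usage lines are hypothetical values for a mailer's S and R fields.

```python
def ruleset_sequence(role, mailer_ruleset=None, c_flag=False):
    """Return the processing steps applied to an address, in order.

    role is "delivery", "sender", or "recipient". mailer_ruleset is the
    number from the mailer's S field (sender) or R field (recipient).
    c_flag says whether the receiving mailer has the C flag set.
    """
    if role == "delivery":
        # Recipient addresses actually used for delivery: 3, then 0.
        return ["3", "0"]
    if role not in ("sender", "recipient"):
        raise ValueError("unknown role: " + role)
    steps = ["3"]
    if c_flag:
        steps.append("D")  # internal step that adds the sender's @host.domain
    steps.append("1" if role == "sender" else "2")
    field = "S" if role == "sender" else "R"
    steps.append(f"{field}={mailer_ruleset}")
    steps.append("4")
    return steps


print(ruleset_sequence("delivery"))
print(ruleset_sequence("sender", mailer_ruleset=11))
print(ruleset_sequence("recipient", mailer_ruleset=21, c_flag=True))
```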

Figure 10.4: Sequence of rulesets

There are, of course, many more rulesets in most sendmail.cf files. The other rulesets provide additional address processing and are called by existing rulesets using the $>n construct. [20] The rulesets provided in any sample sendmail.cf file will be adequate for delivering SMTP mail. It's unlikely you'll have to add to these rulesets, unless you want to add new features to your mailer.

[20] See Table 10.5.