19.3 The canonify Rule Set 3
The canonify rule set 3 is the first to process
every address. Beginning with V8.10 sendmail,
that rule set is declared like this:
Scanonify=3
The name canonify gives a clue to its role, that
of putting all addresses into focused or canonical form.
The canonify rule set 3 puts each address it gets
into a form that simplifies the tasks of other rule sets. The most
common method is to have the canonify rule set 3
focus an address (place angle brackets around
the host part). Then later rules don't have to
search for the host part because it is already highlighted. For
example, consider trying to spot the recipient host in this mess:
uuhost!user%host1%host2
Here, user is eventually intended to receive the
mail message on the host host1. But where should
sendmail send the message first? As it happens,
sendmail selects uuhost
(unless the local machine is itself uuhost). Focusing on this address
therefore results in the following:
user%host1%host2<@uuhost.uucp>
Note that uuhost was moved to the end, the
! was changed to an @, and
.uucp was appended. The @ is
there so that all focused parts uniformly contain an
@ just before the targeted host. Later, when we
take up post-processing, we'll show how
the final rule set 4 moves
uuhost back to the beginning and restores the
!.
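The focusing step just described can be imitated outside of sendmail. The following Python sketch is only an illustration and is not how sendmail implements its rules. The function name focus_uucp is our own invention, and the sketch assumes a single leading host! hop:

```python
def focus_uucp(address: str) -> str:
    """Illustrative only: rewrite host!rest as rest<@host.uucp>."""
    # Split at the first "!" so that the leading UUCP host is isolated.
    host, sep, rest = address.partition("!")
    if not sep or not host or not rest:
        # No usable bang path: leave the address unchanged (unfocused).
        return address
    # Move the host to the end, swap "!" for "@", and append ".uucp".
    return f"{rest}<@{host}.uucp>"


print(focus_uucp("uuhost!user%host1%host2"))
```

Run as is, this prints user%host1%host2<@uuhost.uucp>, the same focused form shown above.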
In actual practice, the role of the canonify rule
set 3 is much more complex than this example. In addition to
focusing, it must handle list-syntax addresses (see the ColonOkInAddr option), missing and malformed addresses, the
% hack (Section 7.4.2), and more.
See LOCAL_RULE_3 (Section 4.3.3.4) for a way to add
rules to the canonify rule set 3.
19.3.1 A Special Case: From:<>
Among the rules in a typical canonify rule set 3
are those that handle the special
case of an empty or nonexistent address. Empty addresses should be
turned into the address of the pseudo-user that bounces mail,
MAILER-DAEMON:
R $@ $@ < @ > empty becomes special
Here, an empty address is rewritten to be a lone @
surrounded by angle brackets. Other rule sets later turn this special
token into $n (which contains MAILER-DAEMON as its
value).
19.3.2 Basic Textual Canonicalization
Addresses can be legally expressed in a variety of formats:
address
address (full name)
<address>
full name <address>
list:members;
When sendmail preprocesses an address that is in
the third or fourth format, it needs to find the address inside an
arbitrarily deep nesting of angle brackets. For example, where is the
address in all this?
Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>
The rules in a typical canonify rule set 3 will
quickly cut through all this and focus on the actual address:
R $* $: < $1 > housekeeping <>
R $+ < $* > < $2 > strip excess on left
R < $* > $+ < $1 > strip excess on right
Here, the first rule puts angle brackets around everything so that the
next two rules will still work, even if the original address had no
angle brackets. The second rule essentially looks for the leftmost
< character and throws away everything to the
left of that. Because a rule is reapplied for as long as it matches, it does that until there
is only one < left. The third rule completes
the process by looking for the leftmost > and discarding
everything after that.
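To make the effect of these three rules concrete, here is a small Python sketch that mimics them with regular expressions. It is an approximation under stated assumptions: it works on characters rather than on sendmail tokens, the function name strip_nesting is hypothetical, and the non-greedy groups stand in for the minimal matching of $+ and $*:

```python
import re


def strip_nesting(address: str) -> str:
    """Illustrative only: imitate the three angle-bracket rules."""
    # Rule 1 ($* -> $: < $1 >): wrap everything once.
    work = f"<{address}>"

    # Rule 2 ($+ < $* > -> < $2 >): drop everything left of the
    # leftmost "<" that has text before it; repeat while it matches.
    left = re.compile(r"^(.+?)<(.*)>$")
    while (m := left.match(work)):
        work = f"<{m.group(2)}>"

    # Rule 3 (< $* > $+ -> < $1 >): keep the text up to the
    # leftmost ">" and discard whatever follows it.
    right = re.compile(r"^<(.*?)>(.+)$")
    while (m := right.match(work)):
        work = f"<{m.group(1)}>"
    return work


print(strip_nesting("Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>"))
```

For the nested example above, the sketch prints <bob@r.net>. An address that had no angle brackets at all simply comes out wrapped in one pair.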
You can witness this process by running sendmail
in -bt rule-testing mode, using something such as
the following. Note that some of the lines that
sendmail outputs are wrapped to fit the page:
% /usr/sbin/sendmail -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> -d21.12
> canonify Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>
... some other rules here
-----trying rule: $*
-----rule matches: $: < $1 >
rewritten as: < Full Name < x12 < @ zy < alt=bob @ r . com < bob @ your . domain >
relay . domain > #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < x12 < @ zy < alt=bob @ r . com < bob @ your . domain > relay . domain
> #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < @ zy < alt=bob @ r . com < bob @ your . domain > relay . domain > #5
> + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < alt=bob @ r . com < bob @ your . domain > relay . domain > #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < bob @ your . domain > relay . domain > #5 > + > >
-----trying rule: $+ < $* >
----- rule fails
-----trying rule: < $* > $+
-----rule matches: < $1 >
rewritten as: < bob @ your . domain >
Notice that we first put sendmail into debugging
mode so that we can watch the rules at work. Then we feed in the
canonify rule set 3 followed by the address that
was such a mess earlier in this section. The three rules we showed
you do their job and isolate the real address from all the other
nonaddress pieces of information.
19.3.3 Handling Routing Addresses
Beginning with V8.10,
sendmail strips the route from route
addresses by default, unless the
DontPruneRoutes option is set to true.
Route addresses are addresses in the form:
@A,@B:user@C
Here, mail should be sent first to A, then from
A to B, and finally from
B to C.
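As a rough illustration of how such an address breaks down, the following Python sketch separates the list of intermediate hosts from the final address. It is not sendmail's own logic. The function name split_route is hypothetical, and the sketch assumes plain hostnames (no bracketed literals that themselves contain a colon):

```python
def split_route(address: str):
    """Illustrative only: split '@A,@B:user@C' into (hops, final)."""
    # Anything that does not start with "@" or lacks a ":" is not
    # treated as a route address here.
    if not address.startswith("@") or ":" not in address:
        return [], address
    # The route precedes the first colon; the real address follows it.
    route, final = address.split(":", 1)
    hops = [hop.lstrip("@") for hop in route.split(",")]
    return hops, final


hops, final = split_route("@A,@B:user@C")
print(hops, final)
```

Here hops holds A and B in order, and final holds user@C, which is the part left over once the route is discarded.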
19.3.4 Handling Specialty Addresses
A whole book is dedicated to the myriad forms of addressing that
might face a site administrator: !%@:: A Directory of
Electronic Mail Addressing & Networks by
Donnalyn Frey and Rick Adams
(O'Reilly & Associates, 1993). We
won't duplicate that work here. Rather, we point out
that most such addresses are handled nicely by existing configuration
files. Consider the format of a DECnet address:
host::user
The best approach to handling such an address in the
canonify rule set 3 is to convert it into the
Internet user@host.domain form:
R $+ :: $+ $@ $2 @ $1.decnet
Here, we reverse the host and
user and put them into Internet form. The
.decnet can later be used by the
parse rule set 0 to select an appropriate delivery
agent.
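The same reversal can be sketched in Python for readers who want to experiment away from a configuration file. This is a minimal illustration, assuming the address contains a single host::user pair. The function name decnet_to_internet is our own:

```python
def decnet_to_internet(address: str) -> str:
    """Illustrative only: rewrite host::user as user@host.decnet."""
    # Split at the first "::", mirroring the minimal match of the
    # first $+ in the rule.
    host, sep, user = address.partition("::")
    if not sep or not host or not user:
        # Not a DECnet-style address: leave it alone.
        return address
    return f"{user}@{host}.decnet"


print(decnet_to_internet("host::user"))
```

This prints user@host.decnet. The .decnet suffix is only a marker for later rules to act on; the sketch makes no attempt to validate the host name.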
This is a simple example of a special address problem from the many
that can develop. In addition to DECnet, for example, your site might
have to deal with Xerox Grapevine addresses,
X.400 addresses, or UUCP addresses. The best way to handle such
addresses is to copy what others have done.
19.3.5 Focusing for @ Syntax
The
last few rules in our illustration of a typical
canonify rule set 3 are used to process the
Internet-style user@domain address:
# find focus for @ syntax addresses
R $+ @ $+ $: $1 <@ $2> focus on domain
R $+ < $+ @ $+ > $1 $2 <@ $3> move gaze right
R $+ <@ $+ > $@ $1 <@ $2> already focused
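For an address such as something@something, the
first rule focuses on all the tokens following the first
@ as the name of the host. Recall that the
$: prefix to the righthand side (RHS) prevents
potentially infinite recursion. If the focused part still contains an
@, the second rule moves the focus to the right, one
@ at a time. The third rule uses the $@ prefix to
return as soon as the address is focused.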
Assuming that the workspace started with:
user@host
these rules will rewrite that address to focus on the host part and
become:
user<@host>
Any address that has not been handled by the
canonify rule set 3 is unchanged and probably not
focused. Because the parse rule set 0 expects all
addresses to be focused so that it can select appropriate delivery
agents, such unfocused addresses can bounce. Many configuration files
allow local addresses (just a username) to be unfocused.
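The behavior of these focusing rules can be approximated in a few lines of Python. Treat this as a sketch under stated assumptions rather than a reimplementation: it matches characters instead of tokens, the function name focus_at is hypothetical, and the third rule (which merely returns) is represented by the final return statement:

```python
import re


def focus_at(address: str) -> str:
    """Illustrative only: imitate the three @-focusing rules."""
    # Rule 1 ($+ @ $+ -> $: $1 <@ $2>): applied once, focus on
    # everything after the first "@".
    m = re.match(r"^(.+?)@(.+)$", address)
    if not m:
        # No "@" at all, for example a bare username: stay unfocused.
        return address
    work = f"{m.group(1)}<@{m.group(2)}>"

    # Rule 2 ($+ < $+ @ $+ > -> $1 $2 <@ $3>): while the focused
    # part still holds another "@", shift the focus to the right.
    gaze = re.compile(r"^(.+?)<(.+?)@(.+)>$")
    while (m := gaze.match(work)):
        work = f"{m.group(1)}{m.group(2)}<@{m.group(3)}>"

    # Rule 3 ($+ <@ $+ > -> $@ ...): already focused, so return.
    return work


print(focus_at("user@host"))
print(focus_at("a@b@c"))
print(focus_at("user"))
```

The three calls print user<@host>, a@b<@c>, and user. The last case shows a bare username passing through unfocused, as many configuration files allow for local addresses.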