6.3 Deliver to Files
The sendmail program, by means of
aliases or ~/.forward
files, is able to deliver mail directly to files on disk (Section 12.2.2). The form for delivery to files looks like
this:
ident: /path/to/file
Here, mail addressed to the user-id ident at
your site will have each message appended to the file indicated by
the path /path/to/file. This technique is often
used to archive mail messages to mailing lists. A typical application
might look like this:
volleyball: karch, flo, wilt, karolyn, /etc/mail/archives/volleyball.archive
Here, mail sent to the mailing list volleyball
will be delivered to the four individuals shown. The message will
also be appended to the file volleyball.archive.
The ability to append to files is one of the
sendmail program's strengths.
Even so, delivery to files in the aliases file is a
convenience for the system administrator and mailing-list managers,
but it was never designed or intended to be a robust mechanism for
handling high volumes of message delivery to files.
6.3.1 A Bounce-Mail Handler
Ordinarily, the sendmail
program's ability to write to files works just fine.
But at some sites the file-locking part of that writing can lead to
contention problems. To illustrate, consider the following scenario,
one that is more common than you would expect.
When handling many busy mailing lists, you will soon discover that,
over a surprisingly short period of time, about 10% of the addresses
in those lists will become bad. A user can move from one ISP to
another, or be removed without a forwarding address when leaving a
job. Bad addresses cause bounced messages to be returned to the
envelope-sender (Section 1.5.4).
At small sites, the envelope-sender for each list is usually the
owner of that list. Because volume is small, list maintenance is
easy, but at large sites, especially those that host huge numbers of
mailing lists, maintaining list hygiene can become overwhelming
unless the process is automated.
When setting up a list under sendmail, it is
recommended that the owner of the list have the same address as the
person who administers the list:
volleyball: :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: volleyball-request
volleyball-request: bob
Here, email sent to the mailing list volleyball
will be forwarded to all addresses listed in the file
volleyball.list. Because an
owner- alias is specified for this mailing list,
sendmail will automatically set the
envelope-sender for the list to be
volleyball-request, which in turn resolves to
bob (the mailing list administrator).
Because bounces are always delivered to the envelope-sender, many
sites that wish to automate list maintenance set up mailing lists so that bounced
mail is delivered to a file:
volleyball: :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/bounce.archive
volleyball-request: bob
Here, bounced email messages will be delivered (appended) to the file
bounce.archive, instead of being sent to the
user bob.
The scheme here is to write software that can easily parse the returned
bounce messages archived in that file. A simple
perl(1) or sh(1) script can
copy that file, truncate the original, and then parse the
copy for bounce information. Each message in the file is
separated from the others by a line that begins with the
five-character "From " expression. Within each message, the program or
script will find and parse three specific lines:
To: <owner-volleyball@your.host.com>
Final-Recipient: RFC822; nosuchuser@aol.com
Action: failed
Here, the name of the source mailing list is found in the
header-recipient of the bounced message, in the first
To: line. (Naturally, additional parsing will be
needed to extract the string volleyball from
that line.)
The Final-Recipient: line is part of the DSN standard and specifies the
recipient address that bounced. This is the address that needs to be
removed from the file volleyball.list.
The Action: line is also part of the DSN standard and
specifies what the reporting MTA did about that recipient, such as
failed or delayed. If the action is anything other than the
word failed, the address should be retained in
the mailing list rather than removed.
Although we show three lines to parse, a bounced email message
contains many other lines that might be of interest in maintaining
mailing lists, such as the reason for the bounce and the date and time
of the bounce. We leave the design of such a parsing program up to
the reader.
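As a starting point only, here is a minimal Python sketch of the three-line parse described above (Python stands in for the perl(1) or sh(1) script mentioned earlier). It assumes that each message begins with a "From " line, that the first To: line names an owner-LIST alias, and that each bounce reports a single Final-Recipient: line. The function names parse_bounces and removals are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Finds LIST in "<owner-LIST@host>". Assumes the owner-LIST naming shown above.
OWNER_RE = re.compile(r"owner-([^@>\s]+)@", re.IGNORECASE)


def parse_bounces(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"list", "recipient", "action"} for each message in the archive."""
    current: Optional[dict] = None
    seen_to = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("From "):           # separator line: a new message begins
            if current is not None:
                yield current
            current = {"list": None, "recipient": None, "action": None}
            seen_to = False
            continue
        if current is None:                     # ignore text before the first separator
            continue
        lower = line.lower()
        if lower.startswith("to:") and not seen_to:
            seen_to = True                      # only the first To: line is used
            match = OWNER_RE.search(line)
            if match:
                current["list"] = match.group(1)
        elif lower.startswith("final-recipient:") and current["recipient"] is None:
            # "Final-Recipient: RFC822; nosuchuser@aol.com" -> "nosuchuser@aol.com"
            value = line.split(":", 1)[1]
            current["recipient"] = value.split(";", 1)[-1].strip()
        elif lower.startswith("action:") and current["action"] is None:
            current["action"] = line.split(":", 1)[1].strip().lower()
    if current is not None:
        yield current


def removals(records: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Yield (list, address) pairs only for bounces whose Action is "failed"."""
    for rec in records:
        if rec["action"] == "failed" and rec["list"] and rec["recipient"]:
            yield rec["list"], rec["recipient"]
```

Any record with another Action: value (delayed, for example) is skipped by removals, so that address stays on its list.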
The problem is that, although this sort of bounce-handling scheme
is perfectly fine for small to medium-size sites, interesting things
start to happen as the volume of bounce messages increases. Recall
that bounces in our example are all delivered to a single file,
regardless of which mailing list sent the message:
volleyball: :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/bounce.archive
volleyball-request: bob
baseball: :include:/mail/lists/sports/land/baseball.list
owner-baseball: /mail/bounces/bounce.archive
baseball-request: scott
When thousands of bounce messages per hour start to be delivered to
this single bounce.archive file, the
sendmail program begins to suffer from exclusive
file-lock contention. To illustrate, consider the following sequence
of delivery actions:
message 1 opens bounce.archive exclusively
message 2 blocks (sleeps) while waiting for a lock
message 1 writes bounce to file
message 3 blocks (sleeps) while waiting for a lock
message 1 closes bounce.archive
message 2 opens bounce.archive exclusively
message 4 blocks (sleeps) while waiting for a lock
... etc.
Each time a bounce message arrives, the sendmail
program fork(2)s a copy of itself and that copy
attempts to append the bounce message to the file
bounce.archive. To append to a file,
sendmail tries to open that file and exclusively
lock it so that it is the only writer to that file. If
sendmail can't obtain an
exclusive lock, it blocks (goes to sleep) until one
becomes available. Clearly, when there are many messages (and many
sendmail processes) contending for exclusive
access to a single file, many sendmail processes
can fill your process table while they wait. At busy sites,
this file contention can lead to bounce mail timing out. When this
happens, the bounce message is itself bounced.
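The following Python sketch imitates the lock-then-append pattern just described; it is an illustration, not sendmail's own code. It assumes a Unix host and uses flock(2) through Python's fcntl module (sendmail itself may use flock(2) or fcntl(2) locking, depending on how it was built). The names append_locked and simulate_bounces are hypothetical. Each writer opens the file separately, so the writers queue up behind the lock in the same way as the message 1, message 2, message 3 sequence shown earlier.

```python
import fcntl
from concurrent.futures import ThreadPoolExecutor


def append_locked(path: str, message: str) -> None:
    """Append one message to path while holding an exclusive lock."""
    with open(path, "a") as f:
        # Blocks (sleeps) here while another writer holds the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(message)
        f.flush()
    # Closing the file releases the lock, letting the next waiter proceed.


def simulate_bounces(path: str, count: int, writers: int = 8) -> None:
    """Have several concurrent writers append `count` messages to one file."""
    def deliver(n: int) -> None:
        append_locked(path, f"From MAILER-DAEMON bounce {n}\n")

    with ThreadPoolExecutor(max_workers=writers) as pool:
        list(pool.map(deliver, range(count)))
```

Every writer eventually gets its turn, but only one at a time; the time each one spends asleep inside flock() is what piles up when thousands of bounces per hour target a single file.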
When bounce mail bounces, it is called double-bounce mail, and that
second bounce notification is delivered to the address specified in
the DoubleBounceAddress option, which is postmaster by
default. Of course, if postmaster is also a file,
there is risk of triple-bounce email! Disposition of triple-bounce
email is determined by the ErrorMode option.
To avoid file-lock contention, design mailing-list bounce files so
that each list has its own bounce file. Consider the following
rewrite of the prior aliases file entries:
volleyball: :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/sports/land/volleyball.bounce
volleyball-request: bob
baseball: :include:/mail/lists/sports/land/baseball.list
owner-baseball: /mail/bounces/sports/land/baseball.bounce
baseball-request: scott
Although this scheme presents a more difficult parsing problem, it
reduces the likelihood of file locking contention. One way to make
this scheme easier to parse is to put all the bounce files in a
single directory, something like this:
owner-volleyball: /mail/bounces/volleyball.bounce
owner-baseball: /mail/bounces/baseball.bounce
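Keeping every bounce file in one directory also makes the copy-and-truncate step mentioned earlier easy to automate. The Python sketch below is one way to do it, assuming the /mail/bounces/LIST.bounce layout just shown and assuming that sendmail on your host locks files with flock(2). If your sendmail was built to use fcntl(2) locks instead, this lock will not keep sendmail out. The function name rotate_bounce_files and the work directory are hypothetical.

```python
import fcntl
import shutil
from pathlib import Path
from typing import Dict


def rotate_bounce_files(bounce_dir: str, work_dir: str) -> Dict[str, Path]:
    """Copy, then truncate, each LIST.bounce file; return {list_name: copy_path}."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    copies: Dict[str, Path] = {}
    for bounce in sorted(Path(bounce_dir).glob("*.bounce")):
        list_name = bounce.stem                  # "volleyball.bounce" -> "volleyball"
        copy = work / f"{list_name}.work"
        with open(bounce, "rb+") as src:
            fcntl.flock(src, fcntl.LOCK_EX)      # keep writers out while copying
            with open(copy, "ab") as dst:        # append, so an unparsed older copy survives
                shutil.copyfileobj(src, dst)
            src.truncate(0)                      # empty the original before unlocking
        copies[list_name] = copy                 # lock released when src is closed
    return copies
```

Because the list name comes from the filename, the parser no longer needs to dig it out of the To: line.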
This still, however, does not eliminate the locking contention
problem at sites that suffer from many bad addresses and therefore
have unusually high bounce mail delivery rates. For such sites, a
specialized bounce reception machine (or group of such machines) will
prove beneficial.
When a great deal of information about a bounce is not necessary,
consider using the special file /dev/null for
receipt of bounced messages. The sendmail
program recognizes that exclusive locking of
/dev/null is not necessary, so many messages can
be delivered to that special file without lock contention.
Unfortunately, all information about each bounce is lost when using
this strategy, except for the small amount that is normally logged by
sendmail. Specifically, the logs will omit all
information about the original recipient.
One way to preserve the original recipient information, while
delivering bounce messages to /dev/null, is to
encode the recipient address into the envelope-sender address when
the message is originally generated. This works only when each
envelope has a single recipient. One encoding, for example, is to
add a copy of the recipient address to the envelope-sender address.
For example, when your mailing list software generates a message to
carol@newcastle.com, it could also create an
envelope-sender that looks like this:
bounce+carol++newcastle+com@bouncer.your.domain
When such an address is bounced back to your site, the message
would be delivered to /dev/null, and the log
file for sendmail would contain in part an
expression such as this:
to=/dev/null, ctladdr=<bounce+carol++newcastle+com@bouncer.your.domain>
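Here is a minimal Python sketch of that encoding and of recovering the recipient from such a log line. It assumes the convention the example implies: the @ becomes ++ and each dot in the domain becomes +. Because + also acts as a separator, the sketch refuses recipients whose local part already contains a +. The function names and the regular expression are illustrative assumptions, not part of any mailing-list package.

```python
import re
from typing import Optional

BOUNCE_HOST = "bouncer.your.domain"   # placeholder bounce-handling host from the text
CTLADDR_RE = re.compile(r"ctladdr=<bounce\+([^>@]+)@")


def encode_sender(recipient: str, bounce_host: str = BOUNCE_HOST) -> str:
    """carol@newcastle.com -> bounce+carol++newcastle+com@bouncer.your.domain"""
    local, domain = recipient.rsplit("@", 1)
    if "+" in local:
        raise ValueError(f"local part may not contain '+': {recipient!r}")
    return f"bounce+{local}++{domain.replace('.', '+')}@{bounce_host}"


def decode_encoded(encoded: str) -> str:
    """carol++newcastle+com -> carol@newcastle.com"""
    local, domain = encoded.split("++", 1)
    return f"{local}@{domain.replace('+', '.')}"


def recipient_from_log(line: str) -> Optional[str]:
    """Return the original recipient recorded in a ctladdr= log entry, or None."""
    match = CTLADDR_RE.search(line)
    return decode_encoded(match.group(1)) if match else None
```

Dots in the local part (carol.smith, say) pass through unchanged, because only the domain has its dots rewritten.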
We won't go into detail about how to customize your
envelope-sender to include recipient information. There are just too
many forms of mailing list software in existence, so we must leave
such a solution to the reader.
Note that this technique treats every message sent to a bounce+
address as a bounced message. In the real world, temporary-failure
notices and vacation-style messages are also sent to
this address, and there is nothing in the syslog
record to differentiate them from true bounces.
6.3.2 Handle Bounces by Discarding Them
Depending on the purpose of your bounce email, you might wish to
discard that mail rather than deliver it. Consider a business that
routinely sends email to its customers. Because all the email is
generated by programs, it is easy to regenerate any message and
resend it if necessary. For such situations, there is no need to
capture actual bounced mail messages because all that information can
be automatically re-created. Instead, all that needs to be captured
is the email address that bounced and an identifier for
the particular message that bounced.
In the previous section we suggested a way to customize
envelope-sender addresses. Here, we use that same form of addressing
to send the routine business email mentioned earlier. Consider, for
example, the following envelope-sender address:
<bounce+00004561+a5621b@bouncer.your.domain>
Here, the envelope-sender address is in the plussed-user format
supported by sendmail. The first part of the
three plus-separated parts is bounce, which helps
to differentiate bounce from real email. The second of the three
parts is the user identifier (00004561) so that we
can tie this message back to a particular user. The third part is the
message code (the a5621b) which indicates exactly
which message was created for this user. The three parts are followed
by the hostname of the bounce-handling machine.
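A minimal Python sketch of building and taking apart such an address follows. It assumes the user identifier and message code never contain + or @ themselves; the names make_sender, parse_sender, and BounceTag are hypothetical, and bouncer.your.domain is the placeholder host used above.

```python
from typing import NamedTuple


class BounceTag(NamedTuple):
    user_id: str        # e.g. "00004561"
    message_code: str   # e.g. "a5621b"


def make_sender(user_id: str, message_code: str,
                host: str = "bouncer.your.domain") -> str:
    """Build <bounce+USERID+CODE@host> for use as the envelope sender."""
    for part in (user_id, message_code):
        if not part or "+" in part or "@" in part:
            raise ValueError(f"unusable tag component: {part!r}")
    return f"<bounce+{user_id}+{message_code}@{host}>"


def parse_sender(address: str) -> BounceTag:
    """Recover the user identifier and message code from a bounce+ address."""
    local = address.strip().strip("<>").rsplit("@", 1)[0]
    parts = local.split("+")
    if len(parts) != 3 or parts[0] != "bounce":
        raise ValueError(f"not a bounce+ID+CODE address: {address!r}")
    return BounceTag(parts[1], parts[2])
```

The generated address (without the angle brackets) could then be handed to sendmail's -f switch when the message is submitted.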
When email is sent with this envelope-sender address, the recipient
will receive a message, the header portion of which will look in part
like this:
From bounce+00004561+a5621b@bouncer.your.domain Fri Dec 13 14:31:23 2002
From: "Your daily mail service" <yourmail@your.business.com>
To: "Customer Name" <customer@customer.site.com>
The first line, the five-character
"From " line,
shows the envelope sender. The second line, the
From: header, shows the header sender, formatted
so that the recipient can easily read it. The last
line, the To: header, shows the
recipient's address, the address of the customer.
A message such as this can bounce for a number of reasons. Perhaps
the customer moved to a new ISP and the old ISP removed the account.
In such an instance, the message would bounce during the SMTP session
something like this:
550 5.1.1 <customer@customer.site.com>... User unknown
If sendmail was sending this message from the
queue, rather than interactively with someone's mail-reading
program, a bounce message will be sent to the envelope
sender:
bounce+00004561+a5621b@bouncer.your.domain
On the bounce-handling machine, the recipient address will always
contain the original information that was needed to identify or
re-create the original message.
The next step is to modify the configuration file on the
bounce-handling machine so that all received bounce messages are
discarded and logged, rather than delivered to files. The rules
needed by your mc configuration file to
accomplish this feat look like this:
LOCAL_RULESETS
SLocal_check_rcpt
R $*	$: $>canonify $1	focus on host
R $* <@ $=w . > $*	$#discard $:discard
This declares a new rule set called
Local_check_rcpt (Section 7.1.3)
that will be called to check every envelope-recipient address. The
first rule (R line) sends everything (the
$* on the lefthand side) through the
canonify rule set (the
$>canonify) to isolate the host part. That host
part is passed to the second rule enclosed in angle brackets. The
$=w class contains
a list of all the names by which the local host is known. This second
rule essentially says that any mail to any recipient at the local
host will be discarded (the $#discard).
When a message is discarded with the
Local_check_rcpt rule set,
sendmail logs information about the recipient.
Such logs for the hypothetical message described earlier might look
like this:
Mar 5 07:50:42 bounce sendmail[6376]: g25EntTh006376: ruleset=check_rcpt,
arg1=<bounce+00004561+a5621b@bouncer.your.domain>, relay=bounce@localhost, discard
Here, the log contains a line that tells us the
recipient's message was discarded. Because the
recipient address is shown with arg1=, we can
easily determine, from the codes between the plus signs, who the
original recipient was and which message was sent to that original
recipient.
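Assuming each such entry occupies a single line in your syslog file (the example above appears to be wrapped to fit the page), a short Python sketch like the following could pull the user identifier and message code out of every discarded bounce. The regular expression is tied to the exact log format shown above and is an assumption; adjust it to match your own logs. The function name discarded_bounces is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches: ruleset=check_rcpt, arg1=<bounce+USERID+CODE@host>, ... discard
DISCARD_RE = re.compile(
    r"ruleset=check_rcpt,\s*arg1=<bounce\+([^+@>]+)\+([^+@>]+)@[^>]*>.*\bdiscard\b"
)


def discarded_bounces(log_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (user_id, message_code) for each discarded bounce+ recipient."""
    for line in log_lines:
        match = DISCARD_RE.search(line)
        if match:
            yield match.group(1), match.group(2)
```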
Note that the preceding rule set is just a jumping-off point,
because it discards mail for every recipient on the local host and
relays all other mail to other hosts. You might want to modify the
second rule along these lines:
R bounce + $* <@ $=w . > $*	$#discard $:discard
This modification of our prior rule now only discards mail addressed
to bounce+anything on the
local host. Mail to any other local user will be delivered normally.
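To make the effect of the narrowed rule concrete, here is a rough Python model of the decision it makes. It is not how sendmail evaluates rules (it ignores tokenizing, the canonify step, and case handling); LOCAL_NAMES stands in for class $=w, and its contents are hypothetical.

```python
from typing import AbstractSet

# Hypothetical stand-in for class $=w: the names this host is known by.
LOCAL_NAMES = frozenset({"bouncer.your.domain", "localhost"})


def would_discard(address: str, local_names: AbstractSet[str] = LOCAL_NAMES) -> bool:
    """Model of:  R bounce + $* <@ $=w . > $*   $#discard $:discard"""
    addr = address.strip().strip("<>")
    if "@" not in addr:
        return False
    local, host = addr.rsplit("@", 1)
    host = host.rstrip(".").lower()     # the rule expects a trailing dot; drop it to compare
    return local.startswith("bounce+") and host in local_names
```

Under this model, bob@bouncer.your.domain and plain bounce@bouncer.your.domain are both left alone; only bounce+ addresses at a local name are discarded.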
Note that, for this scheme to work, there must actually be a user
named bounce on the machine, or there must be a
valid entry in the aliases database for that
name. If you use an invalid name, the bounce will itself bounce.
Also note that the Timeout.queuewarn option should be set to zero on the sending machine,
to prevent the bounce machine from receiving spurious warning notices that
mail could not be delivered within a few hours.
Finally, note that all information about why the message bounced is
lost. If the reason for the bounce is important, you should not use
this discard scheme. Instead you should either deliver bounced
messages to files for later parsing as we showed in the previous
section, or you should write your own bounce handler so that the
bounce mail can be screened in real time.
6.3.3 An Email Blackhole
The bounce discard handler in the previous section forms the
underpinnings of a useful tool for judging your performance. The idea
is to create a configuration file that causes
sendmail to accept everything thrown at it, and
to discard everything it gets.
With such a blackhole site in place, you can use the
ConnectOnlyTo option to cause the testing machine to send all its
email to the blackhole machine. If that blackhole machine had the IP
address 123.45.67.89, such an option declaration
on the testing machine might look like this:
define(`confCONNECT_ONLY_TO', `123.45.67.89')
You create a blackhole machine by setting up a brief
mc file that performs a few simple tricks.
Assuming the blackhole machine runs FreeBSD,
such an mc file might look like this:
OSTYPE(`freebsd4')
define(`confREFUSE_LA', 1000)
define(`confQUEUE_LA', 1000)
define(`confSAFE_QUEUE',`False')
define(`confDF_BUFFER_SIZE', `1000000')
define(`confLOG_LEVEL', 0)
MAILER(`local')
LOCAL_RULE_0
R $*	$: nobody<@localhost.>
LOCAL_RULESETS
SLocal_check_rcpt
R $*	$#discard $:discard
The first line defines the operating system (FreeBSD 4.x). The next
five lines define values for five key options.
The confREFUSE_LA
line sets the RefuseLA option to 1000, a value
sufficiently high to ensure that mail will never be refused,
regardless of how high the load average is.
The confQUEUE_LA
line sets the QueueLA option to 1000, a value
sufficiently high to ensure that mail will never be queued because of
a high load average.
The confSAFE_QUEUE
line sets the SuperSafe option to false. This
prevents sendmail from committing every message to its queue for
safety before attempting delivery.
The confDF_BUFFER_SIZE line sets the
DataFileBufferSize option to 1,000,000 bytes, so that
any message body smaller than about 1 megabyte is held in memory
rather than written to a data file in the queue.
The confLOG_LEVEL
line sets the LogLevel option to zero, telling
sendmail to do minimal logging.
The last few lines in this small mc file are the
ones that actually create a blackhole machine:
MAILER(`local')
LOCAL_RULE_0
R $*	$: nobody<@localhost.>
LOCAL_RULESETS
SLocal_check_rcpt
R $*	$#discard $:discard
The MAILER() directive ensures that there will be a delivery agent
that can handle the inbound email. We use it to add the
local delivery agent, which will be selected to
deliver the message to nobody.
The LOCAL_RULE_0 directive arranges for a new rule to be added to the
parse rule set 0 (Section 19.5),
which selects delivery agents. The lone rule that we add tells
sendmail to change anything (the
$*) into the address
nobody@localhost. The angle brackets around part
of that address, and the dot following localhost,
are needed because of the way other rules in the
parse rule set 0 work. The address
nobody@localhost is an address that is always
local. Any user that is always local can be used here, such as
root, bin, or a made-up
account such as bob. Also, in place of
localhost, you can use the actual name of the
machine.
The LOCAL_RULESETS directive arranges for a new rule set to be added.
This is the same Local_check_rcpt rule set we
described in the previous section—it screens the address
supplied to the RCPT SMTP command. The single rule in that rule set
tells sendmail to discard (the
$#discard) all recipient addresses (the
$*).
The blackhole machine that results from this configuration file will
accept and discard all email that has valid addresses. It emulates a
real MTA in that it will reject illegal addresses, as the
following shows:
mail from: <bob@no.such.host>
553 5.1.8 <bob@no.such.host>... Domain of sender address bob@no.such.host does not exist
mail from: <@foo.com>
553 5.1.3 <@foo.com>... User address required
This blackhole machine will also reject badly formed recipient
addresses:
rcpt to: <@foo.com>
553 5.1.3 <@foo.com>... User address required
rcpt to: <bob@>
553 5.1.3 <bob@>... Hostname required
rcpt to: <bob@foo.com
553 5.0.0 <bob@foo.com... Unbalanced '<'
If you wish the blackhole machine to be more forgiving about
addresses that it accepts, you can add the following feature to your
mc file:
FEATURE(`accept_unresolvable_domains')
This feature allows sendmail to accept
addresses whose domains do not exist. The address
bob@no.such.host (see the earlier example) is one
such address.
Note that this blackhole machine merely accepts all email. It does not
try to emulate the Internet. All mail is accepted at the same speed
regardless of the addresses it contains: mail to a fast domain and mail
to a slow domain are handled at the same speed. If you want a
blackhole program that models the Internet, you will need to write
your own.
Note that you might also wish to modify the
sendmail binary (and not just the configuration
file as we have done) to operate better as a blackhole. Consider
adding the following to the m4
Build file that you used to build your
sendmail program:
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXBADCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXETRNCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXHELOCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXNOOPCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXTIMEOUT=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXVRFYCOMMANDS=0')
These disable various internal pieces of code that cause
sendmail to slow itself down when too many bad
SMTP commands are received.
Finally, note that, with this blackhole approach, internally
generated mail, such as mail to root, will be
sent to nobody. Clearly, the blackhole
machine should have no user accounts, and
nobody should be aliased to
root so that local mail will go to a real
account.
If it is important for the blackhole machine to also function as an
ordinary machine, consider running the blackhole version of
sendmail so that it listens on a port other than
25. One way to do this is to add two simple
m4 macros to your mc
configuration file:
FEATURE(`no_default_msa')
DAEMON_OPTIONS(`Port=4048, Name=BlackHole')
Here, the no_default_msa feature and the DAEMON_OPTIONS line together
tell the blackhole daemon not to listen on ports 25 or 587, and to
listen on port 4048 instead (the Port=4048). You would then run a
regular sendmail, as usual, to handle nontest
email. Unfortunately, the ConnectOnlyTo option
does not take a port number as an argument, so by itself it cannot
direct mail to a blackhole machine that listens on a port other than 25.
To use a blackhole machine on another port, you must declare the
ConnectOnlyTo option and also change the
A= equate for all the smtp
delivery agents:
define(`confCONNECT_ONLY_TO', `123.45.67.89')
define(`SMTP_MAILER_ARGS', `TCP $h 4048')
define(`SMTP8_MAILER_ARGS', `TCP $h 4048')
define(`ESMTP_MAILER_ARGS', `TCP $h 4048')
define(`DSMTP_MAILER_ARGS', `TCP $h 4048')
define(`RELAY_MAILER_ARGS', `TCP $h 4048')
These lines tell the sending machine that for all SMTP mail, it
should connect to port 4048 on the blackhole machine.
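Before pointing a sending machine at the blackhole, you might confirm that the blackhole daemon answers on port 4048 and accepts mail. The Python sketch below does that with the standard smtplib module, sending a few small messages over one SMTP session and timing them. The addresses and the function name probe_blackhole are hypothetical. Because it bypasses the sending machine's sendmail, it measures only how quickly the blackhole accepts mail, not how quickly your sending machine delivers it.

```python
import smtplib
import time
from email.message import EmailMessage


def probe_blackhole(host: str, port: int = 4048, count: int = 10) -> float:
    """Send `count` small messages in one SMTP session; return elapsed seconds."""
    msg = EmailMessage()
    msg["From"] = "tester@example.com"      # hypothetical sender
    msg["To"] = "anyone@example.com"        # any well-formed recipient is discarded
    msg["Subject"] = "blackhole probe"
    msg.set_content("This message should be discarded.\n")
    start = time.monotonic()
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        for _ in range(count):
            smtp.send_message(msg)           # raises an SMTP exception if rejected
    return time.monotonic() - start
```

For the example address used earlier, the call would be probe_blackhole("123.45.67.89").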