6.3 Deliver to Files

The sendmail program, by means of aliases or ~/.forward files, is able to deliver mail directly to files on disk (Section 12.2.2). The form for delivery to files looks like this:

ident: /path/to/file

Here, mail addressed to the user-id ident at your site will have each message appended to the file indicated by the path /path/to/file. This technique is often used to archive mail messages to mailing lists. A typical application might look like this:

volleyball: karch, flo, wilt, karolyn, /etc/mail/archives/volleyball.archive

Here, mail sent to the mailing list volleyball will be delivered to the four individuals shown. The message will also be appended to the file volleyball.archive.

The ability to append to files is one of the sendmail program's strengths. Delivery to files in the aliases file is a convenience for system administrators and mailing-list managers, but it was never designed or intended to be a robust mechanism for handling high volumes of message delivery to files.

6.3.1 A Bounce-Mail Handler

Ordinarily, the sendmail program's ability to write to files works just fine. But at some sites the file-locking part of that writing can lead to contention problems. To illustrate, consider the following scenario, one that is more common than you would expect.

When handling many busy mailing lists, you will soon discover that, over a surprisingly short period of time, about 10% of the addresses in those lists will become bad. A user can move from one ISP to another, or be removed without a forwarding address when leaving a job. Bad addresses cause bounced messages to be returned to the envelope-sender (Section 1.5.4).

At small sites, the envelope-sender for each list is usually the owner of that list. Because volume is small, list maintenance is easy, but at large sites, especially those that host huge numbers of mailing lists, maintaining list hygiene can become overwhelming unless the process is automated.

When setting up a list under sendmail, it is recommended that the owner of the list have the same address as the person who administers the list:

volleyball:   :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: volleyball-request
volleyball-request: bob

Here, email sent to the mailing list volleyball will be forwarded to all addresses listed in the file volleyball.list. Because an owner- alias is specified for this mailing list, sendmail will automatically set the envelope-sender for the list to be volleyball-request, which resolves to bob (the mailing list administrator).

Because the bounces are always delivered to the envelope-sender, many sites (that wish to automate) set up mailing lists so that bounced mail is delivered to a file:

volleyball:   :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/bounce.archive
volleyball-request: bob

Here, bounced email messages will be delivered (appended) to the file bounce.archive, instead of being sent to the user bob.

The scheme here is to create software that can easily parse returned bounce messages that will be archived into that file. A simple perl(1) or sh(1) script can be written to copy that file, truncate the original, then parse the copy for bounce information. Each message in the file will be separated from the others by a line that begins with the five-character "From " expression. Within each message the program or script will find and parse three specific lines:

To: <owner-volleyball@your.host.com>
Final-Recipient: RFC822; nosuchuser@aol.com
Action: failed

Here, the name of the source mailing list is found in the header-recipient of the bounced message, in the first To: line. (Naturally additional parsing will be needed to extract the string volleyball from that line.)

The Final-Recipient: line is part of the DSN standard and specifies the recipient address that bounced. This is the address that needs to be removed from the file volleyball.list.^[10]

^[10] Note that, if a bad address is a result of a user setting up a defective ~/.forward file, you might have wrongly removed that user's address, when instead you should have found some way to notify that user of the mistake.

The Action: line is also part of the DSN standard and reports what happened to the message for that recipient. If the value is anything other than the word failed (delayed, for example), the address should be retained in the mailing list rather than removed.

Although we show three lines to parse, a bounced email message contains many other lines that might be of interest in maintaining mailing lists, such as the reason for the bounce and the date and time of the bounce. We leave the design of such a parsing program up to the reader.
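As a starting point only, the three-line parse described above might be sketched in Python as follows. The function names are illustrative, and the sketch assumes that every archived message begins with a "From " line and that body lines beginning with "From " have been escaped on delivery. Real bounce messages vary widely, so treat this as a rough outline rather than a finished handler.

```python
import re

# A new message starts at every line that begins with "From " (five characters).
FROM_LINE = re.compile(r"^From ", re.MULTILINE)
# First To: line, e.g. "To: <owner-volleyball@your.host.com>"
TO_RE = re.compile(r"^To:\s*<?owner-([^@>\s]+)@", re.MULTILINE | re.IGNORECASE)
RCPT_RE = re.compile(r"^Final-Recipient:\s*rfc822;\s*(\S+)",
                     re.MULTILINE | re.IGNORECASE)
ACTION_RE = re.compile(r"^Action:\s*(\S+)", re.MULTILINE | re.IGNORECASE)


def split_messages(archive_text):
    """Split the archive text into one string per message."""
    starts = [m.start() for m in FROM_LINE.finditer(archive_text)]
    ends = starts[1:] + [len(archive_text)]
    return [archive_text[s:e] for s, e in zip(starts, ends)]


def parse_bounce(message):
    """Return (list_name, address, action), or None if a line is missing."""
    to = TO_RE.search(message)
    rcpt = RCPT_RE.search(message)
    action = ACTION_RE.search(message)
    if not (to and rcpt and action):
        return None
    return to.group(1), rcpt.group(1), action.group(1).lower()


def addresses_to_remove(archive_text):
    """Map each list name to the set of addresses whose Action was 'failed'."""
    removals = {}
    for message in split_messages(archive_text):
        parsed = parse_bounce(message)
        if parsed and parsed[2] == "failed":
            removals.setdefault(parsed[0], set()).add(parsed[1])
    return removals
```

Messages whose Action: value is anything other than failed are skipped, so those addresses stay on the list.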

The problem is that, although this sort of a bounce-handling scheme is perfectly fine for small to medium-size sites, interesting things start to happen as the volume of bounce messages increases. Recall that bounces in our example are all delivered to a single file, regardless of which mailing list sent the message:

volleyball:   :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/bounce.archive        ← note
volleyball-request: bob

baseball:   :include:/mail/lists/sports/land/baseball.list
owner-baseball: /mail/bounces/bounce.archive          ← note
baseball-request: scott

When thousands of bounce messages per hour start to be delivered to this single bounce.archive file, the sendmail program begins to suffer from exclusive file open contention. To illustrate, consider the following sequence of delivery actions:

message 1 opens bounce.archive exclusively
message 2 blocks (sleeps) while waiting for a lock
message 1 writes bounce to file
message 3 blocks (sleeps) while waiting for a lock
message 1 closes bounce.archive
message 2 opens bounce.archive exclusively
message 4 blocks (sleeps) while waiting for a lock
... etc.

Each time a bounce message arrives, the sendmail program fork(2)s a copy of itself and that copy attempts to append the bounce message to the file bounce.archive. To append to a file, sendmail tries to open that file and exclusively lock it so that it is the only writer to that file. If sendmail can't perform an exclusive lock, it blocks (goes to sleep) until an exclusive lock becomes available. Clearly, when there are many messages (and many sendmail programs) contending for exclusive access to a single file, many sendmail processes can begin to fill your process table in wait states. At busy sites, this file contention can lead to bounce mail timing out. When this happens, the bounce message is itself bounced.
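The blocking behavior described above can be illustrated with a short Python sketch. This is not sendmail's own code (sendmail is written in C, and whether it locks with flock(2) or fcntl(2) depends on how it was built). The function name is hypothetical, and the sketch assumes a Unix-like system where Python's fcntl module is available.

```python
import fcntl


def append_with_lock(path, message):
    """Append one message to path while holding an exclusive lock."""
    with open(path, "a") as archive:
        # Blocks (sleeps) here until no other process holds the lock.
        fcntl.flock(archive, fcntl.LOCK_EX)
        try:
            archive.write(message)
            archive.flush()
        finally:
            # Release the lock so the next waiting writer can proceed.
            fcntl.flock(archive, fcntl.LOCK_UN)
```

Every process that calls such a function on the same file is forced to take its turn, which is why a single shared archive becomes a bottleneck as the number of simultaneous writers grows.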

When bounce mail bounces, it is called double-bounce mail and that second bounce notification is delivered to the address specified in the DoubleBounceAddress option, usually postmaster by default. Of course, if postmaster is also a file, there is a risk of triple-bounce email! Disposition of triple-bounce email is determined by the ErrorMode option.

To avoid file lock contention, design mailing-list bounce files so that there is one for each list of members. Consider the following rewrite of the prior aliases file entries:

volleyball:   :include:/mail/lists/sports/land/volleyball.list
owner-volleyball: /mail/bounces/sports/land/volleyball.bounce       ← note
volleyball-request: bob

baseball:   :include:/mail/lists/sports/land/baseball.list
owner-baseball: /mail/bounces/sports/land/baseball.bounce           ← note
baseball-request: scott

Although this scheme presents a more difficult parsing problem, it reduces the likelihood of file locking contention. One way to make this scheme easier to parse is to put all the bounce files in a single directory, something like this:

owner-volleyball: /mail/bounces/volleyball.bounce
owner-baseball:   /mail/bounces/baseball.bounce

This still, however, does not eliminate the locking contention problem at sites that suffer from many bad addresses and therefore have unusually high bounce mail delivery rates. For such sites, a specialized bounce reception machine (or group of such machines) will prove beneficial.
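With the single-directory layout shown earlier, a parsing script can take the list name from each filename instead of from the To: line of every message. A minimal Python sketch, assuming the .bounce suffix from the example and a hypothetical helper name:

```python
from pathlib import Path


def bounce_files(directory):
    """Map each mailing-list name to the path of its *.bounce file."""
    return {p.stem: p for p in sorted(Path(directory).glob("*.bounce"))}
```

Calling bounce_files("/mail/bounces") would then return one entry per list, such as volleyball and baseball, each of which can be copied, truncated, and parsed on its own.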

When a great deal of information about a bounce is not necessary, consider using the special file /dev/null for receipt of bounced messages. The sendmail program recognizes that exclusive locking of /dev/null is not necessary, so many messages can be delivered to that special file without lock contention. Unfortunately, all information about each bounce is lost when using this strategy, except for the small amount that is normally logged by sendmail. Specifically, the logs will omit all information about the original recipient.

One way to preserve the original recipient information, while delivering bounce messages to /dev/null, is to encode the recipient address into the envelope-sender address when the message is originally generated. This is possible only when there is one recipient per envelope. A simple encoding is to embed a copy of the recipient address in the envelope-sender address. For example, when your mailing list software generates a message to carol@newcastle.com, it could also create an envelope-sender that looks like this:

bounce+carol++newcastle+com@bouncer.your.domain

When such an address^[11] is bounced back to your site, the message would be delivered to /dev/null, and the log file for sendmail would contain in part an expression such as this:

to=/dev/null, ctladdr=<bounce+carol++newcastle+com@bouncer.your.domain>

^[11] In 1997, Dan Bernstein coined the acronym VERP (Variable Envelope Return Paths) to describe this type of envelope address (http://cr.yp.to/proto/verp.txt). One of the authors instituted this form of envelope address in 1995, but gave it no name.

We won't go into detail about how to customize your envelope-sender to include recipient information. There are just too many forms of mailing list software in existence, so we must leave such a solution to the reader.
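Independent of any particular mailing list software, the encoding itself is simple enough to sketch. The Python below assumes the convention suggested by the single example above: the @ becomes ++ and each dot becomes +. That convention, the function names, and the default bounce host are assumptions made for illustration. Addresses that already contain a plus sign or consecutive dots cannot be reversed unambiguously under this scheme, so the sketch rejects them.

```python
def encode_sender(recipient, bounce_host="bouncer.your.domain"):
    """Build an envelope-sender that embeds the recipient address."""
    local, _, domain = recipient.rpartition("@")
    if not local or not domain:
        raise ValueError("not a user@domain address: %r" % recipient)
    if "+" in recipient or ".." in recipient:
        raise ValueError("cannot encode unambiguously: %r" % recipient)
    return "bounce+%s++%s@%s" % (
        local.replace(".", "+"), domain.replace(".", "+"), bounce_host)


def decode_sender(envelope_sender):
    """Recover the original recipient, or return None if not encoded."""
    local, _, _host = envelope_sender.strip("<>").rpartition("@")
    prefix = "bounce+"
    if not local.startswith(prefix):
        return None
    # The last "++" separates the encoded user from the encoded domain.
    user, sep, domain = local[len(prefix):].rpartition("++")
    if not sep or not user or not domain:
        return None
    return user.replace("+", ".") + "@" + domain.replace("+", ".")
```

The decoder can be applied directly to the ctladdr= value found in the log.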

Note that this technique treats every message sent to a bounce+ address as a bounce. In the real world, temporary-failure notices and vacation-style messages are also sent to this address, and there is nothing in the syslog record to differentiate them from true bounces.

6.3.2 Handle Bounces by Discarding Them

Depending on the purpose of your bounce email, you might wish to discard that mail rather than deliver it. Consider a business that routinely sends email to its customers. Because all the email is generated by programs, it is easy to regenerate any message and resend it if necessary. For such situations, there is no need to capture actual bounced mail messages because all that information can be automatically re-created. Instead, all that needs to be captured is the email address that bounced, and an identifier for the particular message that bounced.

In the previous section we suggested a way to customize envelope-sender addresses. Here, we use that same form of addressing to send the routine business email mentioned earlier. Consider, for example, the following envelope-sender address:

<bounce+00004561+a5621b@bouncer.your.domain>

Here, the envelope-sender address is in the plussed-user format supported by sendmail. The first part of the three plus-separated parts is bounce, which helps to differentiate bounce from real email. The second of the three parts is the user identifier (00004561) so that we can tie this message back to a particular user. The third part is the message code (the a5621b) which indicates exactly which message was created for this user. The three parts are followed by the hostname of the bounce-handling machine.
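The three-part format is easy to build and to take apart again. The following Python sketch uses hypothetical function names and treats both the user identifier and the message code as opaque strings that contain no plus signs.

```python
from typing import NamedTuple, Optional


class BounceTag(NamedTuple):
    user_id: str
    message_code: str


def make_envelope_sender(user_id, message_code, host="bouncer.your.domain"):
    """Join the three plus-separated parts and append the bounce host."""
    return "bounce+%s+%s@%s" % (user_id, message_code, host)


def parse_envelope_sender(address) -> Optional[BounceTag]:
    """Return the user id and message code, or None for other addresses."""
    local, _, _host = address.strip("<>").rpartition("@")
    parts = local.split("+")
    if len(parts) != 3 or parts[0] != "bounce":
        return None
    return BounceTag(parts[1], parts[2])
```

Keeping the identifiers as strings preserves leading zeros, as in 00004561.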

When email is sent with this envelope-sender address, the recipient will receive a message, the header portion of which will look in part like this:

From bounce+00004561+a5621b@bouncer.your.domain Fri Dec 13 14:31:23 2002
From: "Your daily mail service" <yourmail@your.business.com>
To: "Customer Name" <customer@customer.site.com>

The first line, the five-character "From " header, shows the envelope sender. The second line, the From: header, shows the header sender, formatted in such a fashion as to be easily read by the recipient. The last line, the To: header, shows the recipient's address, the address of the customer.

A message such as this can bounce for a number of reasons. Perhaps the customer moved to a new ISP and the old ISP removed the account. In such an instance, the message would bounce during the SMTP session something like this:

550 5.1.1 <customer@customer.site.com>... User unknown

If sendmail is sending this message from the queue, rather than interactively with someone's mail reading program, it will send a bounce message to the envelope sender:

bounce+00004561+a5621b@bouncer.your.domain

On the bounce-handling machine, the recipient address will always contain the original information that was needed to identify or re-create the original message.

The next step is to modify the configuration file on the bounce-handling machine so that all received bounce messages are discarded and logged, rather than delivered to files. The rules needed by your mc configuration file to accomplish this feat look like this:

LOCAL_RULESETS
SLocal_check_rcpt
R $*                     $: $>canonify $1        focus on host
R $* <@ $=w . > $*       $#discard $:discard

This declares a new rule set called Local_check_rcpt (Section 7.1.3) that will be called to check every envelope-recipient address. The first rule (R line) causes everything (the $* on the lefthand side) to be sent through the canonify rule set (the $>canonify) to isolate the host part. That host part is passed to the second rule isolated inside angle brackets. The $=w class contains a list of all the names that the local host is known by. This second rule essentially says that any mail to any recipient at the local host will be discarded (the $#discard).

When a message is discarded with the Local_check_rcpt rule set, sendmail logs information about the recipient. Such logs for the hypothetical message described earlier, might look like this:

Mar  5 07:50:42 bounce sendmail[6376]: g25EntTh006376: ruleset=check_rcpt, 
arg1=<bounce+00004561+a5621b@bouncer.your.domain>, relay=bounce@localhost, discard

Here, the log contains a line that tells us the message to this recipient was discarded. Because the recipient address is shown with arg1=, we can easily determine from the codes separated by plus signs who the original recipient was, and which message was sent to that original recipient.
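Extracting those codes from the log can be automated. The Python sketch below assumes that each syslog record is a single line shaped like the example above (shown there wrapped across two lines) and that the address has exactly three plus-separated parts. The function name is illustrative.

```python
import re

# Matches discard records written for the check_rcpt rule set.
DISCARD_RE = re.compile(r"ruleset=check_rcpt,\s*arg1=<([^>]+)>.*\bdiscard\b")


def discarded_bounces(log_lines):
    """Return (user_id, message_code) for every discarded bounce+ address."""
    results = []
    for line in log_lines:
        match = DISCARD_RE.search(line)
        if not match:
            continue
        local = match.group(1).rpartition("@")[0]
        parts = local.split("+")
        if len(parts) == 3 and parts[0] == "bounce":
            results.append((parts[1], parts[2]))
    return results
```

Lines that do not record a discard, or whose address is not in the bounce+ form, are ignored.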

Note that the aforementioned rule set is just a jumping-off point because it only discards mail to the local host, and relays all other mail to other hosts. You might want to modify the second rule something like the following:

R bounce + $* <@ $=w . > $*       $#discard $:discard

This modification of our prior rule now only discards mail addressed to bounce+anything on the local host. Mail to any other local user will be delivered normally.

Note that, for this scheme to work, there must actually be a user named bounce on the machine, or there must be a valid entry in the aliases database for that name. If you use an invalid name, the bounce will itself bounce.

Also note that the Timeout.queuewarn option should be set to zero on the sending machine, to prevent the bounce machine from receiving spurious notices that mail could not be delivered within a few hours.

Finally, note that all information about why the message bounced is lost. If the reason for the bounce is important, you should not use this discard scheme. Instead you should either deliver bounced messages to files for later parsing as we showed in the previous section, or you should write your own bounce handler so that the bounce mail can be screened in real time.

6.3.3 An Email Blackhole

The bounce discard handler in the previous section forms the underpinnings of a useful tool for judging your performance. The idea is to create a configuration file that causes sendmail to accept everything thrown at it, and to discard everything it gets.

With such a blackhole site in place, you can use the ConnectOnlyTo option to cause the testing machine to send all its email to the blackhole machine. If that blackhole machine had the IP address 123.45.67.89, such an option declaration on the testing machine might look like this:

define(`confCONNECT_ONLY_TO', `123.45.67.89')

You create a blackhole machine by setting up a brief mc file that performs a few simple tricks. Assuming the blackhole machine is running the FreeBSD version of Unix, such an mc file might look like this:

OSTYPE(`freebsd4')
define(`confREFUSE_LA',  1000)
define(`confQUEUE_LA',   1000)
define(`confSAFE_QUEUE',`False')
define(`confDF_BUFFER_SIZE', `1000000')
define(`confLOG_LEVEL',  0)
MAILER(`local')
LOCAL_RULE_0
R $*            $: nobody<@localhost.>
LOCAL_RULESETS
SLocal_check_rcpt
R $*            $#discard $:discard

The first line defines the operating system (FreeBSD 4.x). The next five lines define values for five key options.

The confREFUSE_LA (RefuseLA) line sets the RefuseLA option to 1000, a value sufficiently high to ensure that mail will never be refused, regardless of how high the load average is.

The confQUEUE_LA (QueueLA) line sets the QueueLA option to 1000, a value sufficiently high to ensure that mail will never be queued because of a high load average.

The confSAFE_QUEUE (SuperSafe) line sets the SuperSafe option to false. This prevents sendmail from queuing all messages for safety.

The confDF_BUFFER_SIZE (DataFileBufferSize) line sets the DataFileBufferSize option to a value of 1 million, preventing messages that are smaller than 1 megabyte from being queued.

The confLOG_LEVEL (LogLevel) line sets the LogLevel option to zero, telling sendmail to do minimal logging.

The last few lines in this small mc file are the ones that actually create a blackhole machine:

MAILER(`local')
LOCAL_RULE_0
R $*            $: nobody<@localhost.>
LOCAL_RULESETS
SLocal_check_rcpt
R $*            $#discard $:discard

The MAILER( ) directive ensures that there will be a delivery agent that can handle the inbound email. We use it to add the local delivery agent which will be selected to deliver the message to nobody.

The LOCAL_RULE_0 directive arranges for a new rule to be added to the parse rule set 0 (Section 19.5), which selects delivery agents. The lone rule that we add tells sendmail to change anything (the $*) into the address nobody@localhost. The angle brackets around part of that address, and the dot following localhost, are needed because of the way other rules in the parse rule set 0 work. The address nobody@localhost is an address that is always local. Any user that is always local can be used here, such as root, bin, or a made-up account such as bob. Also, in place of localhost, you can use the actual name of the machine.

The LOCAL_RULESETS directive arranges for a new rule set to be added. This is the same Local_check_rcpt rule set we described in the previous section—it screens the address supplied to the RCPT SMTP command. The single rule in that rule set tells sendmail to discard (the $#discard) all recipient addresses (the $*).

The blackhole machine that results from this configuration file will accept and discard all email that has valid addresses. It emulates a real MTA in that it will reject and bounce illegal addresses, as the following shows:

mail from: <bob@no.such.host>
553 5.1.8 <bob@no.such.host>... Domain of sender address bob@no.such.host does not exist
mail from: <@foo.com>
553 5.1.3 <@foo.com>... User address required

This blackhole machine will also reject badly formed recipient addresses:

rcpt to: <@foo.com>
553 5.1.3 <@foo.com>... User address required
rcpt to: <bob@>
553 5.1.3 <bob@>... Hostname required
rcpt to: <bob@foo.com
553 5.0.0 <bob@foo.com... Unbalanced '<'

If you wish the blackhole machine to be more forgiving about addresses that it accepts, you can add the following feature to your mc file:

FEATURE(`accept_unresolvable_domains')

This feature allows sendmail to accept addresses whose domains do not exist. The address bob@no.such.host (see earlier example) is one example of such an address.

Note that this blackhole machine only accepts all email. It does not try to emulate the Internet. All mail is accepted at the same speed regardless of the addresses contained. Mail to a fast domain and mail to a slow domain are all handled with the same speed. If you want a blackhole program that models the Internet, you will need to write your own.

Note that you might also wish to modify the sendmail binary (and not just the configuration file as we have done) to operate better as a blackhole. Consider adding the following to the m4 Build file that you used to build your sendmail program:

APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXBADCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXETRNCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXHELOCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXNOOPCOMMANDS=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXTIMEOUT=0')
APPENDDEF(`conf_sendmail_ENVDEF', `-DMAXVRFYCOMMANDS=0')

These disable various internal pieces of code that cause sendmail to slow itself down when too many bad SMTP commands are received.

Finally, note that, with this blackhole approach, internally generated addresses such as mail to root will be sent to nobody. Clearly, on the blackhole machine, there should be no user accounts, and nobody should be aliased to root so that local mail will go to a real account.

If it is important for the blackhole machine to function as a usable machine too, consider running the blackhole version of sendmail so that it listens on a port other than 25. One way to do this is by adding two simple m4 macros to your mc configuration file:

FEATURE(`no_default_msa')
DAEMON_OPTIONS(`Port=4048, Name=BlackHole')

Here, we tell the blackhole daemon not to listen on ports 25 or 587 (the no_default_msa feature), and to listen on port 4048 (the Port=4048). You would then run a regular sendmail, as usual, to handle nontest email. Unfortunately, the ConnectOnlyTo option does not take a port number as an argument, so it cannot be used by itself when the blackhole machine listens on a port other than 25. To use the blackhole machine on another port, you must declare the ConnectOnlyTo option and also change the A= equate for all the smtp delivery agents:

define(`confCONNECT_ONLY_TO', `123.45.67.89')
define(`SMTP_MAILER_ARGS', `TCP $h 4048')
define(`SMTP8_MAILER_ARGS', `TCP $h 4048')
define(`ESMTP_MAILER_ARGS', `TCP $h 4048')
define(`DSMTP_MAILER_ARGS', `TCP $h 4048')
define(`RELAY_MAILER_ARGS', `TCP $h 4048')

These lines tell the sending machine that for all SMTP mail, it should connect to port 4048 on the blackhole machine.
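To exercise such a setup, something has to generate test traffic. The Python sketch below is one rough way to do that, using the standard smtplib module. The function names, the sender and recipient addresses, and the message contents are all made up for illustration. It sends a batch of small messages over a single SMTP connection and reports a crude messages-per-second figure. It is not a substitute for a real benchmarking tool.

```python
import smtplib
import time
from email.message import EmailMessage


def build_message(n, sender="tester@your.domain", recipient="sink@example.com"):
    """Create a small, uniquely numbered test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "blackhole test %d" % n
    msg.set_content("test message %d\n" % n)
    return msg


def send_batch(host, port, count):
    """Send count messages over one connection; return messages per second."""
    start = time.monotonic()
    with smtplib.SMTP(host, port) as smtp:
        for n in range(count):
            smtp.send_message(build_message(n))
    elapsed = time.monotonic() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Calling send_batch("123.45.67.89", 4048, 1000) would measure the blackhole machine by itself. Pointing the same call at the machine under test instead (on whatever port its daemon listens) measures that machine as it relays to the blackhole.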