6.2 Sidestep Slow Hosts
A slow host is one that requires more than a few seconds to accept
delivery of a modestly sized email message. To illustrate, consider
the following verbose output from an attempt to send email to such a
slow host:
% /usr/sbin/sendmail -v -Rslowhost.com -q
Running /var/spool/mqueues/q.2/df/f0DHnvO02567 (sequence 1 of 1)
bob@slowhost.com... Connecting to mx.slowhost.com. via esmtp...
220 mx.slowhost.com ESMTP Sendmail 8.10.1/8.10.1; Fri, 13 Dec 2002 10:50:20 -0700 (MST)
>>> EHLO mx.slowhost.com
250-mx.slowhost.com Hello you@yourhost.com [123.45.678.9], pleased to meet you
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-SIZE
250-DSN
250-ONEX
250-ETRN
250-XUSR
250 HELP
>>> MAIL From:<you@yourhost.com> SIZE=16
...
You wait 2 minutes for slowhost to look you up.
...
This situation can get worse, especially if the slow site runs slow
antispam software, because that site can take 9 or 10 minutes to
validate you. In that case sendmail seems to hang for 9 or 10
minutes, then suddenly continues with:
250 2.1.0 <you@yourhost.com>... Sender ok
>>> RCPT To:<bob@slowhost.com>
250 2.1.5 <bob@slowhost.com>... Recipient ok
>>> DATA
354 Enter mail, end with "." on a line by itself
>>> .
Furthermore, some mail transfer agents (MTAs) start to place a
message on disk only after all the data has been received, so writing
to an NFS-mounted disk can appear to hang for several seconds:
250 2.0.0 f0DHoNh91321 Message accepted for delivery
bob@slowhost.com... Sent (f0DHoNh91321 Message accepted for delivery)
Closing connection to mx.slowhost.com.
>>> QUIT
221 2.0.0 mx.slowhost.com closing connection
With all of this to contend with, a simple email message to a slow
host might not be delivered for many seconds, or even many minutes.
In actual practice, 99% of all hosts are very swift to accept mail.
But it takes only one message to a slow host to badly degrade your
overall delivery performance.
As distributed, the default timeouts for sending messages are
generous. So generous, in fact, that the following
defaults (as found in your .cf file) will never
prevent delivery to such slow hosts:
% grep Timeout /etc/mail/sendmail.cf
O ConnectionCacheTimeout=5m
#O Timeout.initial=5m
#O Timeout.connect=5m
#O Timeout.aconnect=0s
#O Timeout.iconnect=5m
#O Timeout.helo=5m
#O Timeout.mail=10m
#O Timeout.rcpt=1h
#O Timeout.datainit=5m
#O Timeout.datablock=1h
#O Timeout.datafinal=1h
#O Timeout.rset=5m
#O Timeout.quit=2m
#O Timeout.misc=2m
... etc.
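To compare these settings on a common scale, it can help to convert them to seconds. The following is a minimal sketch, not part of sendmail. The function name timeouts_in_seconds is hypothetical, and it assumes that each value is a single number followed by one of the unit letters s, m, h, d, or w, as in the listing above. Lines in any other form are skipped.

```shell
# Read .cf lines such as "#O Timeout.mail=10m" on standard input and
# print each Timeout name with its value converted to seconds.
# Assumption: each value is one number plus one unit letter (s, m, h, d, w).
timeouts_in_seconds() {
    awk -F= '
        /^#?O Timeout\./ && $2 ~ /^[0-9]+[smhdw]$/ {
            name = $1
            sub(/^#?O /, "", name)          # strip the leading "O " or "#O "
            unit = substr($2, length($2))   # last character is the unit
            if      (unit == "s") mult = 1
            else if (unit == "m") mult = 60
            else if (unit == "h") mult = 3600
            else if (unit == "d") mult = 86400
            else                  mult = 604800
            # Adding 0 converts the leading digits of the value to a number.
            printf "%s %d\n", name, ($2 + 0) * mult
        }
    '
}
```

It could be fed from the same grep shown above, for example:

    grep Timeout /etc/mail/sendmail.cf | timeouts_in_seconds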
The Timeout.mail=10m, for example, says that
sendmail will wait up to 10 minutes for the
receiving site to reply to its MAIL FROM: command.
During the nine or so minutes that such a reply can take, that
particular queue-processing daemon does nothing else but wait for a
reply. If you deliver many messages to such a slow host, you might
find many queue-processing daemons blocked in parallel, waiting for
replies. If you were to do a process listing, you would find many
sendmail daemons in the client
MAIL state. For example:
% /usr/ucb/ps axw | grep sendmail | grep -v grep
2600 ? IW 0:00 sendmail: f0DHoNh91321 slowhost.com [123.45.67.89] client MAIL
2608 ? IW 0:00 sendmail: f0DIofY02647 slowhost.com [123.45.67.89] client MAIL
2642 ? IW 0:00 sendmail: f0DIorg02649 slowhost.com [123.45.67.89] client MAIL
Here, three queue-processing daemons wait for a reply to the
MAIL FROM: command. None has accumulated much CPU time
(the 0:00) because all are spending most of their time blocked on
input.
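A quick way to see which hosts are tying up daemons is to count the processes in the client MAIL state for each host. The following is a minimal sketch, assuming that process titles look like those in the listing above (the word sendmail: followed by the queue ID and then the hostname). The function name count_client_mail is hypothetical.

```shell
# Read a process listing on standard input and print, for each host,
# how many sendmail processes are waiting in the "client MAIL" state.
# Assumption: titles have the form
#   sendmail: <queue-id> <host> [<address>] client MAIL
count_client_mail() {
    awk '
        /sendmail: .* client MAIL/ {
            # The hostname is the second field after "sendmail:".
            for (i = 1; i < NF; i++)
                if ($i == "sendmail:") { n[$(i + 2)]++; break }
        }
        END { for (h in n) print n[h], h }
    ' | sort -k1,1nr -k2,2
}
```

With the listing above as input, for example from

    /usr/ucb/ps axw | count_client_mail

the busiest hosts are printed first, one count and hostname per line.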
Normally, slow hosts are not a problem. However, if your site needs
to send high volumes of email rapidly, such slow hosts can prove a
serious impediment to performance. Such high-volume sending sites can
include those that:
- Handle delivery for many mailing lists
- Deliver solicited advertising and announcements on behalf of
  commercial customers
- Send large numbers of notices that need to be delivered in a narrow
  window of time
- Function as an ISP in support of a huge number of outbound mailing
  clients
6.2.1 Run Separate Fast and Slow sendmail Daemons
One way to handle slow hosts is to take advantage of
sendmail's tenacity in its
continual attempts to send email messages. When
sendmail cannot send a message, and when that
message times out during the sending process,
sendmail queues or re-queues that message so
that its delivery can be tried again later. One reason
sendmail sets such generous timeouts by default
is that it prefers to deliver all messages on the first try.
Real-world experience has consistently demonstrated that most
email is delivered by sendmail in less
than two seconds per message per recipient. You can demonstrate this
to yourself by looking at
sendmail's log files, and
examining the xdelay= equates. This tendency to deliver most email quickly
suggests employing a strategy that will allow fast messages to be
delivered by a "fast"
sendmail daemon, and slow messages to be handled
by separate "slow" queue
processors.
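One way to check this against your own logs is to tally the xdelay= values. The following is a minimal sketch, assuming that xdelay= is logged in the form HH:MM:SS; entries in any other form are ignored. The function name xdelay_summary is hypothetical, and the location of the log file varies from system to system.

```shell
# Read sendmail log lines on standard input and count deliveries by
# their xdelay= value. Prints "fast=N slow=M", where "slow" means the
# delivery took longer than the limit (in seconds, default 2).
# Assumption: xdelay= is logged as HH:MM:SS.
xdelay_summary() {
    limit="${1:-2}"
    awk -v limit="$limit" '
        match($0, /xdelay=[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            # Skip the 7 characters of "xdelay=" and split HH:MM:SS.
            split(substr($0, RSTART + 7, 8), t, ":")
            secs = t[1] * 3600 + t[2] * 60 + t[3]
            if (secs > limit) slow++; else fast++
        }
        END { printf "fast=%d slow=%d\n", fast, slow }
    '
}
```

For example, if your system logs mail to /var/log/maillog (an assumption), you might run:

    xdelay_summary 2 < /var/log/maillog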
Consider configuring your main sendmail process
to be less tolerant of slow hosts by including the following lines in
your mc configuration file:
define(`TO', `2s')
define(`confTO_ICONNECT', `TO')
define(`confTO_CONNECT', `TO')
define(`confTO_COMMAND', `TO')
define(`confTO_DATAINIT', `TO')
define(`confTO_HELO', `TO')
define(`confTO_HOSTSTATUS', `TO')
define(`confTO_INITIAL', `TO')
define(`confTO_MAIL', `TO')
define(`confTO_QUIT', `TO')
define(`confTO_RCPT', `TO')
define(`confTO_RESOLVER_RETRANS', `TO')
define(`confTO_RESOLVER_RETRY', `TO')
define(`confTO_RSET', `TO')
define(`confTO_DATABLOCK', `1m')
define(`confTO_DATAFINAL', `1m')
The first line defines the m4 macro
TO with the value 2s, for two
seconds, the value used for all the critical outbound timeouts. A
macro is used so that you can easily modify this timeout based on
your actual needs. Note that the meaning of each timeout is explained
in Chapter 24.
To create a configuration file for the "fast" daemon, the one that
makes the first delivery attempt, add the preceding lines to a copy
of your normal mc file. Then use that copy to create a
cf file with a custom name, such as
/etc/mail/fast.cf.
To install the "fast"
sendmail, edit whatever system
startup script starts sendmail on your machine.
It might, for example, be /etc/rc.local, or
/etc/init.d/sendmail, or
/etc/rc, or some other file based on your
operating system, and will likely contain an invocation such as this:
/usr/sbin/sendmail -bd -q30m
This line runs a listening daemon (the -bd) and a queue processor (the
-q30m, Section 11.8.1) all at
once.
Make a backup copy of that file, then change the preceding invocation
into a new two-line invocation, such as this:
/usr/sbin/sendmail -L sendmail-fast -C /etc/mail/fast.cf -bd
/usr/sbin/sendmail -L sendmail-slow -C /etc/mail/slow.cf -q5m
These two lines replace the original one-line listening daemon and
queue-processor invocation. The first creates a listening daemon for
acceptance of inbound mail. The second creates a queue processor that
processes the queue once every five minutes. The
-L command-line switch defines how sendmail
will label itself in syslog records.
The first line uses the fast.cf configuration
file we created earlier that had short timeouts and is intolerant of
slow hosts. Any mail that cannot be sent on the first try will be
queued for a later try.
In the second line, the queue processor labeled
sendmail-slow picks up mail for slow hosts once every five
minutes. Its configuration file is called
slow.cf and contains generous timeouts to ensure
that all queued mail will eventually be delivered.
To illustrate, consider a queued message destined for
bob@slowhost.com. First
sendmail-fast attempts to deliver the message.
You can simulate this yourself from the command line like this:
# /usr/sbin/sendmail -C /etc/mail/fast.cf -v -Rslowhost.com -q
Running /var/spool/mqueues/q.2/df/f0DHnvO02567 (sequence 1 of 1)
bob@slowhost.com... Connecting to mx.slowhost.com. via esmtp...
220 mx.slowhost.com ESMTP Sendmail 8.10.1/8.10.1; Fri, 13 Dec 2002 11:23:42 -0700 (MST)
>>> EHLO mx.slowhost.com
250-mx.slowhost.com Hello you@yourhost.com [123.45.678.9], pleased to meet you
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-SIZE
250-DSN
250-ONEX
250-ETRN
250-XUSR
250 HELP
>>> MAIL From:<you@yourhost.com> SIZE=16
again a wait of two minutes for slowhost to look you up, but this time the wait times
out after two seconds
The message fails to be sent (but does so swiftly, because of the
short timeouts), so sendmail-fast queues it for
a later delivery attempt.
Once every five minutes, the "slow"
queue-processing daemon will attempt to deliver the message. Again
you can simulate this for yourself from the command line like this:
% /usr/sbin/sendmail -C /etc/mail/slow.cf -v -Rslowhost.com -q
Running /var/spool/mqueues/q.2/df/f0DHnvO02567 (sequence 1 of 1)
bob@slowhost.com... Connecting to mx.slowhost.com. via esmtp...
220 mx.slowhost.com ESMTP Sendmail 8.10.1/8.10.1; Fri, 13 Dec 2002 11:35:10 -0700 (MST)
>>> EHLO mx.slowhost.com
250-mx.slowhost.com Hello you@yourhost.com [123.45.678.9], pleased to meet you
250-ENHANCEDSTATUSCODES
250-8BITMIME
250-SIZE
250-DSN
250-ONEX
250-ETRN
250-XUSR
250 HELP
>>> MAIL From:<you@yourhost.com> SIZE=16
again a wait of two minutes
In this instance, you again wait two minutes for
slowhost to look up your site. Even if all the
waits combine to 15 minutes, the message will eventually be delivered
because the "slow" queue processor
has generous timeouts.
By combining short-timeout with normal-timeout queue processors, slow
hosts can be prevented from bogging down the normal outflow of email.
Note that the timeouts we show in this section are not intended to be
authoritative for all sites, and that we have simplified this example
for clarity. Many other settings, both inside and outside
sendmail, contribute to a successful outflow of
email. In addition to understanding the properties of timeouts, you
should also apply the information in
Chapter 9, and combine it with an understanding of
the Timeout.resolver.retry option.
Beginning with V8.12 sendmail, you can use queue
groups (Section 11.4) to divide mail into separate
groups of queues. If you know beforehand, for example, that the
domain slowhost.com is always slow, you can use
queue groups to have all its mail queued onto inexpensive slow disks.
Mail for all other domains would then be queued onto expensive fast disks.
Queue groups, however, cannot be used to set different timeouts per
group. Instead, you must use separate configuration files as we have
illustrated.
6.2.2 Run a Fallback Host
Another alternative for handling slow email, if you can spare the
extra machine, is to set up a separate host with generous timeouts.
This "fallback" host is given all
mail that fails to be delivered on the first try by other hosts on
your network.
You cause failed messages to be sent to that machine by using the
FallbackMXhost option on your fast mail machine. In addition to the
short timeouts that we showed in the previous section, you could also
add one of the following two declarations to the mc
configuration for your fast.cf file:
define(`confFALLBACK_MX', `IP-number')
define(`confFALLBACK_MX', `hostname')
The first form declares this option with the IP number of the
fallback host, the second with the hostname of the
fallback host. Use one form or the other, not both.
This causes all failed mail to be forwarded to the
fallback host, which then attempts to deliver
all the problem messages that the fast hosts could not. Because most
email is fast, you can expect the fallback host
to handle only about 5% to 10% of your total mail volume. But,
because unexpected failures are a way of life with email, you should
also plan for the fallback host to get half or
more of your outbound email in a pinch, and size its disks
accordingly.
In theory you could extend this fallback host idea to a series of
fallback hosts, where each is given progressively longer timeouts. In
actual practice, however, a single fallback host tends to be
sufficient because email is generally very fast or very slow. There
is rarely any middle ground.
Instead of a series of hosts, consider using different timeouts for
initial and subsequent attempts. When a message is first forwarded to
a fallback host, the fallback host immediately tries to deliver it.
That first, immediate attempt is called the initial
attempt. If a message fails to be delivered on the
initial attempt, it remains queued on the fallback host for
subsequent attempts.
V8.8 and later versions of sendmail allow you to set
different timeouts for the initial connection and for subsequent
connections. These are timeouts for establishing a TCP/IP or other
network connection. Here is a way to set up part of your
mc file on the fallback host:
define(`ITO', `20s')dnl initial timeout
define(`TO', `5m')dnl subsequent timeout
define(`confTO_ICONNECT', `ITO')
define(`confTO_CONNECT', `TO')
define(`confTO_COMMAND', `TO')
define(`confTO_DATAINIT', `TO')
define(`confTO_HELO', `TO')
define(`confTO_HOSTSTATUS', `TO')
define(`confTO_INITIAL', `TO')
define(`confTO_MAIL', `TO')
define(`confTO_QUIT', `TO')
define(`confTO_RCPT', `TO')
define(`confTO_RESOLVER_RETRANS', `TO')
define(`confTO_RESOLVER_RETRY', `TO')
define(`confTO_RSET', `TO')
define(`confTO_DATABLOCK', `1m')
define(`confTO_DATAFINAL', `1m')
The initial connection will time out after 20 seconds. Thereafter, the
connection will time out after five minutes. None of the other
timeouts shares this idea of initial versus subsequent timeouts. If
two sets of distinctly different timeouts are important to you, you
can employ that strategy by running two different daemons as shown in
the previous section, but this time running them on the fallback host
with much longer timeouts. One daemon would accept network
connections and have medium timeouts. A separate queue-processing
daemon (using a separate configuration file) would have longer
timeouts to ensure delivery of all remaining mail.
On the fallback host, note that the message failed twice before it
was turned over to the queue processing daemon. It failed once on the
fast server, and so was punted to the fallback host. It failed again
when it was immediately retried on the fallback host, and was then
left in the queue. Because failure is likely, the queue interval on
the queue-processing daemon on the fallback host should be long. We
suggest something in the range of one to several hours.
If you are running a very large site, you might need to run multiple
fallback hosts. To do this you need to run V8.12 or above
sendmail because only those versions look up MX
records (Section 9.3) for the
fallbackhost, and add those records to the list
of fallback MX host addresses. If the DNS zone file for these
fallback MX hosts lists MX records with equal costs, the additional
MX records will be added in random order. For example, one way to set
up part of such a zone file might look like this:
fallback1    IN  A   123.45.67.81
fallback2    IN  A   123.45.67.82
fallback3    IN  A   123.45.67.83
fallback4    IN  A   123.45.67.84
fallback5    IN  A   123.45.67.85
fallback     IN  MX  10 fallback1
             IN  MX  10 fallback2
             IN  MX  10 fallback3
             IN  MX  10 fallback4
             IN  MX  10 fallback5
Here the costs are all equal (the 10s), so any of
the five hosts, fallback1 through fallback5, is
equally likely to receive a failed message.
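The effect of equal costs can be illustrated with a small simulation. This is only a sketch of the general idea (sort by cost, break ties randomly), not sendmail's actual implementation. The function name order_mx and its optional seed argument are hypothetical.

```shell
# Read "cost host" pairs on standard input and print the hosts in the
# order they would be tried: lowest cost first, ties in random order.
# An optional first argument seeds the random number generator so that
# a run can be repeated; without it the current time is used.
order_mx() {
    awk -v seed="${1:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        # Tag each host with a random number to break ties between costs.
        NF >= 2 { printf "%d %.6f %s\n", $1, rand(), $2 }
    ' | LC_ALL=C sort -k1,1n -k2,2n | awk '{ print $3 }'
}
```

Feeding it five lines of the form "10 fallback1" through "10 fallback5" prints the five hostnames in an order that varies from run to run, which is why each host can expect a similar share of the failed mail.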
Finally, consider using V8.12
sendmail's queue groups (Section 11.4) on the fallback host. With queue groups you
can dedicate a separate disk or disks to each of the several
well-known large ISPs. If you run only a few queue processors in each
queue, the impact will be low while a large site is down, yet
delivery will still be mildly parallel and reasonably fast when the
large site comes back up.