6.6 Condition the Network

Sending messages over the Internet is a surprisingly complex undertaking. In addition to the simple SMTP transaction, hostnames need to be looked up with DNS, your address needs to be looked up by the receiving site, and your site can be probed with ident. In this section, we explore ways that you can improve performance over the network.

Avoid spam behavior. If you try to send too many messages to a single site too quickly, that site can artificially slow you down.
Use a well-exercised name server. MX records time out and need to be looked up periodically. A busy name server will keep them fresh, whereas a special bulk-email sending name server will find that many records have timed out when it is asked to run.
Reverse lookups of your address must be swift. Every time you connect to another site to send email, that site looks up your address before accepting email.
Don't let the identd(8) port hang. Many sites will use the ident protocol to see who at your site is sending the email. If you want to prevent that lookup, do so by rejecting the connection, instead of letting the connection hang.
You are limited by the smallest piece of the pipe. Installing a gigabit network in-house will not speed outbound email if you are connected to the Internet with DSL.

6.6.1 Don't Appear to Spam

The sendmail program is written to send email in a fast but well-behaved manner. For example, sendmail will try, whenever possible, to send all messages to a single site sequentially, reusing a single connection. This is made possible by defining the ConnectionCacheSize option with a suitable value.

When sending huge amounts of email, it is tempting to run many sendmail daemons in parallel on the presumption that greater parallelism generates greater throughput. To a modest extent, this presumption is correct, but when taken to excess, you run the risk of appearing to spam.

To illustrate, consider the following line excerpted from an /etc/init.d script:

/usr/sbin/sendmail -O QueueDir=/queues/q.* -odq -bd

The intention here is to queue all outbound email (the -odq switch) and then to run a separate sendmail daemon in each queue directory so that the mail in all directories can go out with maximum parallelism.

Once the queues are full, a shell script that looks something like the following is run. This script is flawed, for reasons we discuss next:

#!/bin/sh
/usr/sbin/sendmail -O QueueDir=/queues/q.1 -q &
/usr/sbin/sendmail -O QueueDir=/queues/q.2 -q &
/usr/sbin/sendmail -O QueueDir=/queues/q.3 -q &
... etc.

There is little risk in running a script such as this when the number of queues is small, but when the number of queues rises into the dozens or hundreds, several forms of spamming behavior begin to appear.

First, there is a risk of making too many connections to a single host in too short an interval. Next, even if the connections are not made too rapidly, the total number of parallel connections might appear excessive. Some major mail-hosting sites will reject additional connections from a sending site when a certain number of parallel connections have been established.

Finally, and even worse, directing hundreds of simultaneous connections to a receiving site can cause your site to be labeled by them as a spamming engine and barred from access.
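One way to see how many parallel connections your site currently holds open to each receiving host is to count them. The following is a minimal sketch, not part of sendmail: the function name count_smtp_peers is hypothetical, and it assumes the column layout printed by the Linux netstat -tn command (remote address and port in the fifth column, connection state in the sixth). Other systems format this output differently.

```shell
#!/bin/sh
# Count established outbound SMTP connections per remote address.
# Reads netstat-style lines on standard input. Assumes the layout:
#   proto recv-q send-q local-address foreign-address state
count_smtp_peers() {
    awk '$6 == "ESTABLISHED" && $5 ~ /:25$/ {
             peer = $5
             sub(/:25$/, "", peer)      # strip the port, keep the address
             count[peer]++
         }
         END { for (p in count) print count[p], p }' | sort -rn
}
```

Under those assumptions it could be run as netstat -tn | count_smtp_peers while a queue run is in progress. The busiest destinations appear first, each preceded by its connection count.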

The correct way to achieve parallelism is to leave that parallelism up to sendmail. In the earlier scheme of filling the queues and then sending the email, the sending script should look like this:

#!/bin/sh
/usr/sbin/sendmail -O QueueDir=/queues/q.* -q

Note that the QueueDir option should normally be specified in your configuration file and that we use the command line here only as an example. Also note that the -q will cause the queues to be processed once and only once. Any messages that cannot be delivered will be left in the queue. This might be intentional, for example, at sites that want to clear the queue between sends or move messages that could not be sent to a secondary queue for later delivery.

Beginning with V8.12 sendmail, queue groups (Section 11.4) allow you to establish persistent queue processors that wake up periodically to process a group. The number of parallel processes for each group can be tuned to your needs with different wakeup intervals. Even with queue groups, however, beware of too much parallelism.

6.6.2 Use an Active Name Server

Name servers are generally configured to cache the information they find when looking up a host's name. Associated with each item of information is a time-to-live, called a TTL. The TTL determines how long an item will remain in the cache until it expires and the information needs to be looked up again.

To see this behavior, run the following command:

% dig mx sendmail.org

These four lines are part of the screenful of output produced by dig(1):

;; ANSWERS:
sendmail.org.   912     MX      10 smtp.neophilic.com.
sendmail.org.   912     MX      20 smtp.gshapiro.net.
sendmail.org.   912     MX      100 playground.sun.com.

The ANSWERS section lists three MX records for the site sendmail.org. The 10, 20, and 100 are the costs of sending to each MX site (Section 9.2.5). The sendmail program sends to the lowest-cost site first, then to the next higher-cost site, and so on.
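The ordering by cost can be illustrated with a short filter. This is only a sketch for reading dig(1) output, not a description of how sendmail itself sorts records. The function name mx_by_cost is hypothetical, and it assumes each answer line contains the word MX followed by the cost and the host.

```shell
#!/bin/sh
# Print "cost host" for each MX answer line on standard input,
# lowest cost first (the order in which delivery is attempted).
mx_by_cost() {
    awk '{ for (i = 1; i + 2 <= NF; i++)
               if ($i == "MX") { print $(i + 1), $(i + 2); break } }' | sort -n
}
```

It could be used as dig mx sendmail.org | grep -v '^;' | mx_by_cost to list a site's mail exchangers in order of cost.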

The 912 in each line is the TTL for each record, the number of seconds (or 15.2 minutes) that these records should remain in the name server's cache before they are discarded. Clearly, if you support mailing lists that are sent out hourly, your site will have to look up sendmail.org's MX records afresh each time your lists are emailed.

The extra time needed to look up sendmail.org is tiny, but if you are looking up thousands of sites each time your lists are processed, that tiny amount of extra time can grow to minutes and has the potential to seriously impact performance. If your site sends mailing-list email hourly or daily, you will probably be better off using a name server that is constantly being refreshed, rather than one that is dedicated to sending email.

To test this, make up a file that contains a list of a few hundred domains. Such a file might begin with these few lines:

sendmail.org
aol.com
irs.gov
... etc.

Then, set up the following shell script and call it mx.sh:

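#!/bin/sh
# Replace the placeholder below with the name or address of your name server.
SERVER="your-name-server-here"
LIST_FILE=list
# Look up the MX records for each domain named in the list file.
for host in $(cat "$LIST_FILE")
do
       # Drop dig's comment lines and keep only the MX answers.
       dig mx "$host" "@$SERVER" | grep -v '^;' | grep -w MX
done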

Then, run the script, which reads its domains from the file named list:

% sh mx.sh > /dev/null

Now, run this command again. It should run more quickly this time. On the first run, any MX records that had timed out had to be looked up afresh, whereas on the second run the MX records are already in your name server's cache.
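To compare the two runs with numbers instead of impressions, you can time each one. The following is a small sketch. The function name elapsed_seconds is hypothetical, and it assumes a date(1) that understands the +%s format, which is common but not universal.

```shell
#!/bin/sh
# Run a command and print how many whole seconds it took.
# The command's own output is discarded so that only the timing shows.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

Running elapsed_seconds sh mx.sh twice in a row should print a smaller number the second time if the first run refreshed expired records. The same function can be reused after changing the SERVER= line to compare name servers.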

Change the name following the SERVER= to that of a well-exercised name server—for example, the name server provided by your ISP. You may find that name server running faster than your own name server.

Run the preceding command once without the > /dev/null to gather a list of the results. Prune from that list any MX records with timeouts less than your sending interval. You might be surprised to find that many large sites time out their MX records in minutes or seconds. A run on the earlier list with its three entries might look like this with the latest version of dig(1):

sendmail.org.           6H IN MX        100 playground.sun.com.
sendmail.org.           6H IN MX        10 smtp.neophilic.com.
sendmail.org.           6H IN MX        20 smtp.gshapiro.net.
aol.com.                1H IN MX        15 mailin-03.mx.aol.com.
aol.com.                1H IN MX        15 mailin-04.mx.aol.com.
aol.com.                1H IN MX        15 mailin-01.mx.aol.com.
aol.com.                1H IN MX        15 mailin-02.mx.aol.com.
irs.gov.                10M IN MX       5 MX-RELAY2.treas.gov.
irs.gov.                10M IN MX       5 MX-RELAY1.treas.gov.

Older versions of dig(1) display the timeout interval in seconds, so 10 minutes (the 10M) in this example will display as 600.

Note that aol.com times out its MX records in an hour. If your mailing list goes out less often than hourly, those records will have expired by the next run, so you should exclude aol.com from your list file. Similarly, irs.gov times out its MX records in 10 minutes. As a consequence, those records will almost always need to be refreshed, so irs.gov should be removed from your list file, too.
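The pruning rule described above (drop any domain whose MX records time out sooner than your sending interval) can be sketched as a filter over the saved dig(1) output. The function name keep_cacheable is hypothetical. The sketch assumes the domain is in the first column and the TTL in the second, written either as plain seconds or with the S, M, H, D, and W suffixes shown in the sample output.

```shell
#!/bin/sh
# Usage: keep_cacheable INTERVAL_SECONDS < saved-dig-output
# Prints each domain (without its trailing dot) whose MX records all
# have a TTL of at least INTERVAL_SECONDS.
keep_cacheable() {
    awk -v interval="$1" '
    # Convert a TTL such as 600, 10M, 6H or 1H30M into seconds.
    function ttl_seconds(ttl,    total, tok, n, unit) {
        total = 0
        ttl = toupper(ttl)
        while (match(ttl, /^[0-9]+[SMHDW]?/)) {
            tok = substr(ttl, 1, RLENGTH)
            ttl = substr(ttl, RLENGTH + 1)
            n = tok + 0                      # numeric part of the token
            unit = substr(tok, length(tok))  # last character, maybe a suffix
            if      (unit == "W") n *= 604800
            else if (unit == "D") n *= 86400
            else if (unit == "H") n *= 3600
            else if (unit == "M") n *= 60
            total += n
        }
        return total
    }
    NF < 2 { next }                          # skip blank or short lines
    {
        domain = $1
        sub(/\.$/, "", domain)
        if (ttl_seconds($2) < interval) short[domain] = 1
        if (!(domain in seen)) { seen[domain] = 1; order[++count] = domain }
    }
    END {
        for (i = 1; i <= count; i++)
            if (!(order[i] in short)) print order[i]
    }'
}
```

For a list that goes out every two hours, sh mx.sh | keep_cacheable 7200 > list.new would produce a candidate replacement for the list file. A record whose TTL exactly equals the interval is kept here, which is a judgment call you may want to reverse.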

In general, unless there is a compelling reason to do so, you should always give your own MX records a long TTL. Six hours, as in the sendmail.org lines shown earlier, is a good interval, as would be any longer interval.

6.6.3 Make Reverse Lookups Swift

When your site's sendmail connects to another site to send email, that other site only knows your IP number.^[12] Because the other site wants to know your hostname, it performs a reverse lookup on that number, and turns it into a canonical hostname. You can try this yourself with the dig(1) program. Just run it like we show here, but substitute the IP number shown with your own:

^[12] This is before your site sends the HELO or EHLO greeting with your host's name.

% dig -x 123.45.67.89
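For the curious, a reverse lookup of an IPv4 number is an ordinary DNS query for a PTR record under the in-addr.arpa domain, with the four parts of the number reversed. The following sketch builds that name. The function name reverse_name is hypothetical, and the sketch handles only dotted-quad IPv4 numbers.

```shell
#!/bin/sh
# Print the in-addr.arpa name that a reverse lookup of an IPv4
# number queries. Prints nothing if the argument is not a dotted quad.
reverse_name() {
    echo "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}
```

Running dig ptr "$(reverse_name 123.45.67.89)" should then ask the same question that the -x form asks.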

Because your IP number is looked up each and every time you send an email message, you need to be sure that the lookup is very swift. One way to do that is to ask someone at your site to remotely log into a different site where that person has an account. Then ask that person to run the same dig(1) as you did, and to time the command to see how swift or slow it really is.

Reverse-lookup records have the same TTL fields that hostname records do. Look at the TTL displayed when you looked up your site's IP number. If your IP number rarely changes, as is usually the case, you should consider giving that record a long TTL, perhaps a week.^[13] This will ensure that reverse lookups of your IP number will be answered from name server caches, instead of needing to be looked up afresh over the Internet.

^[13] When you need to change IP numbers, as when you change from one ISP to another, decrease the TTL to an hour or so well before the move. If your TTL is, say, a week, perform this change more than a week ahead so that the shorter TTL will be in effect at the time of the move. After the move, you can increase the TTL again.

6.6.4 Don't Let identd Hang

Many sites will try to use the ident protocol (see the $_ macro) to look up the user and host at your site that made the outgoing email connection to their site. In fact, the default setup for sendmail is to do that lookup whenever someone connects to your site.

For reasons of privacy, many sites are turning off identd(8).^[14] Yours might be among such sites. There are several effective ways to turn off an identd(8) at your site:

^[14] If you run identd(8) at your site, you should consider upgrading to pidentd(8). It is faster and more secure than the identd(8) normally supplied with operating systems.

Comment it out of inetd.conf and restart the inetd daemon.
Configure your firewall to reject connections to port 113.
Configure your router to reject connections to port 113.

All of these have the benefit of causing the connection to be rejected immediately. This allows the other site to continue without having to wait for information or for the connection to time out.

Another way to turn off identd(8) is to configure your router or firewall to drop packets destined for port 113, but this can be dangerous. When you drop packets, the other side will wait for a reply. When the reply does not come, it will retransmit the packet. This can delay the realization that a reply will not come for as long as 90 seconds. If every host you send email to has to wait 90 seconds or more to find out that you won't supply ident information, the result will be a huge slowdown in how fast you can send email.
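As one illustration of rejecting instead of dropping, here is a sketch for a Linux host that filters with iptables(8). That choice of firewall is our assumption, and routers and other firewalls have their own equivalents. The function name reject_ident is hypothetical, and the rule must be installed as root.

```shell
#!/bin/sh
# Refuse ident (port 113) connections with an immediate TCP reset,
# so the remote site learns at once that no answer is coming.
# The pattern to avoid is the same rule ending in "-j DROP", which
# leaves the remote site retransmitting until it gives up.
reject_ident() {
    iptables -A INPUT -p tcp --dport 113 -j REJECT --reject-with tcp-reset
}
```

Calling reject_ident once appends the rule to the INPUT chain. Whether the rule persists across reboots depends on how your system saves its firewall configuration.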