6.1 Handle Deep Queues
To understand the potential problems with deep queues, first consider
how sendmail processes a single queue when its
QueueSortOrder option is set to the default of
priority. When
sendmail is instructed to process a queue, it
opens the queue directory for reading and scans that directory to
gather a list of qf files to process. Each
qf file sendmail finds is
opened for reading and scanned for important pieces of information.
The N line in each qf file,
for example, holds the number of times the message has been tried.
The P line holds each message's
current priority.
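For illustration only, a fragment of a qf file might contain lines such
as these (the values shown here are invented):
P30792
N3
The P30792 line records that message's current priority, and the N3
line shows that delivery has already been attempted three times.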
After all messages have been opened, read, and closed, and after the
information from each has been saved internally,
sendmail sorts that information. The purpose of
the sort is to ensure that new mail is tried before old, and that
high-priority mail is tried before low-priority mail.
Under normal circumstances this process occurs quickly. But when
queues get abnormally deep, things can go wrong in unusual ways. In
the following, we show one way that sendmail
could be run on a major mail-sending machine:
/usr/sbin/sendmail -bd
/usr/sbin/sendmail -q10m
The idea here is to create two mail-handling daemons. The first
handles inbound mail and, because this is a mail-sending machine, we
expect that this inbound daemon will perform little work. The second
daemon sends all mail it finds in its queue. It will
fork(2) a copy of itself once every 10 minutes,
and that copy will process all the messages in the queue. As
described earlier, each queued message is opened and read so that all
the messages can be sorted before delivery begins.
Because this hypothetical site is a major mail-sending site, we
expect it to send messages at a high rate. For the sake of
argument, let's say 30,000 messages need to be sent
per hour.
Now suppose that one day a backhoe, a power failure, clumsy fingers,
or any of a thousand possible events causes this
site's only connection to the Internet to fail for
an hour. The site can neither look up host information with DNS, nor
can it connect to any remote sites. All the mail it tries to send
that hour fails, and instead of being removed from the queue, this
mail is left there so that it can be tried again later (presumably
after the problem is fixed).
An hour later, service is restored. First, the default:
/usr/sbin/sendmail -q10m
causes a forked copy of sendmail to start
processing the queue. This time, however, the processing is not
swift. When a queue fills to 30,000 or more messages, the amount of
time it takes to preread the queue (to open and read every message)
increases to more than 20 minutes. And that 20 minutes is only for the
preread. During that 20 minutes no mail will be sent.
After that, things get worse. Ten minutes later a second
sendmail daemon is forked, and it, too, starts
to preread the queue. Now, instead of one
sendmail daemon opening and reading all messages
in a queue, we have two sendmail daemons doing
the same thing in parallel.
Contrary to what you might think, twice as much I/O on a disk is not
twice as fast. Disks are finite devices that perform a limited number
of disk-head moves per second
and can transmit only a fixed number of bytes per second. Because the
two sendmail daemons are 10 minutes out of step
with each other, each is reading and processing separate files.
Depending on the size of your in-memory disk cache, neither is
likely to benefit from the efficiencies of such caching.
In short, two sendmail daemons processing a deep
queue in parallel is worse than a single
sendmail daemon processing that same queue
alone.
And if that weren't enough, another 10 minutes later
a third sendmail daemon starts to process the
queue.
By now the first sendmail daemon might have
finished its preread of the queue and might have actually begun to
send messages. But even if it has, now three
sendmail daemons are processing that single deep
queue and a curious thing starts to happen. Because the disk that
holds the queue is finite, the addition of a third
sendmail daemon slows the operation of the first
two. The second one, instead of taking 20 minutes to preread the
queue, will now take 30 minutes.
This means that every 10 minutes another
sendmail queue-processing daemon is added to the
mix. As each is added, it slows all the others already
running, and it isn't long before the load on the
machine starts to climb and the rate of message delivery
falls alarmingly. In fact, when this sort of behavior hits a
very large-volume site, a sendmail
queue-processing daemon can start and seem never to finish.
Depending on the speed of your disk system, even limiting the number
of queue processors per queue might not save you from this sluggish
performance. Under V8.12 sendmail, for example,
you can limit the number of queue runners per queue with a queue
group (Section 11.4) definition such as this in your
mc configuration file:
QUEUE_GROUP(`fastq', `P=/q/fastq*, I=10m, R=10')
Here, the fastq group uses the queue disks mounted
as /q/fastq*, processes those disks once per 10
minutes (the I=10m), and limits itself to 10 queue
runners maximum (the R=10) across all the disks.
If there are few fastq* queue disks, and if they
fill to more than 30,000 messages each, they too can become sluggish,
even with only 10 runners processing them. In fact, with sufficient
filled queue depth, as few as two simultaneous queue runners can
seriously affect performance.
In extreme situations such as this, one alternative is to use
persistent queue runners (Section 11.8.3). With
persistent queue runners, you maintain a single queue runner that
alone reads the queue. After that single queue runner has read the
queue, it forks multiple child queue runners to process the queue,
with each child sharing the parent's queue
information:
/usr/sbin/sendmail -qp10m
Here, the -qp causes one or more persistent queue
runners to launch. One is launched for each queue group, and each
persists, sleeping 10 minutes between readings of the
queue. When it wakes, it gathers a list of queue files and launches
multiple child processes to handle that list. After the last child
has finished delivery and exited, the parent sleeps again.
Even with queue groups and persistent queue runners, you are
encouraged to spread queues across many directories and across many
disks and controllers. This increases parallelism and dramatically
lessens the likelihood that any given queue will overfill.
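As a sketch of such a layout, an mc file might define two queue groups,
each on its own disk (the group names and paths here are illustrative,
not a recommendation):
QUEUE_GROUP(`bulkq', `P=/disk1/queues/bulkq*, I=15m, R=10')
QUEUE_GROUP(`normq', `P=/disk2/queues/normq*, I=5m, R=10')
With the groups on separate spindles, each set of queue runners can
work without competing with the others for the same disk heads.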
6.1.1 Recover from a Full Queue
When a queue directory is exceptionally full, you will likely notice
the problem only when performance on your queue-handling machine
becomes unusually sluggish. By that time, however, a drastic measure,
such as rebooting the server, might be the only cure. Clearly, early
detection is desirable.
Early signs that a queue is filling can be seen in the logging
messages that sendmail produces. You can develop
scripts that watch for lines such as these:
Dec 13 10:27:53 your.domain sendmail[642]: grew WorkList for /var/spool/mqueue to 2000
Dec 13 10:29:05 your.domain sendmail[642]: grew WorkList for /var/spool/mqueue to 3000
Dec 13 10:34:31 your.domain sendmail[642]: grew WorkList for /var/spool/mqueue to 4000
... etc., to:
Dec 13 12:40:22 your.domain sendmail[642]: grew WorkList for /var/spool/mqueue to 29000
Dec 13 12:42:50 your.domain sendmail[642]: grew WorkList for /var/spool/mqueue to 30000
Here, the WorkList refers to the number of
messages preread so far. By searching for unusual sizes, you can
determine when a queue is about to overfill.
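One minimal sketch of such a script follows. It assumes that syslog
writes sendmail's messages to /var/log/maillog and that a WorkList
deeper than 20,000 entries deserves a warning; the path, threshold, and
alert address are all assumptions to adjust for your site:
#!/bin/sh
# Sketch only: warn when sendmail's logged WorkList exceeds a threshold.
LOG=/var/log/maillog
THRESHOLD=20000
# The size is the last field of each "grew WorkList" line; keep the latest.
DEPTH=`awk '/grew WorkList/ { depth = $NF } END { print depth + 0 }' "$LOG"`
if [ "$DEPTH" -gt "$THRESHOLD" ]; then
    echo "WorkList for the queue has grown to $DEPTH entries" |
        mail -s "mail queue growing dangerously deep" postmaster
fi
Run from cron every few minutes, a script like this gives you warning
long before the queue becomes unmanageable.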
Another technique is to run the mailq command
(Section 15.1.2) to observe the total number of
messages queued across all queues:
% mailq -OMaxQueueRunSize=0 | tail -1 V8.7 through V8.11
Total Requests: 34190
% mailq -bP V8.12 and above
/var/spool/mqueues/q.1/df: entries=34190
Total requests: 34190
For V8.7 through V8.11, the MaxQueueRunSize=0
allows mailq to run swiftly, regardless of how
deep the queue or queues might be. Without that option, and with deep
queues, mailq would be just as slow as the
sluggish queue runs themselves. Beginning with V8.12, the
-bP command-line switch provides the same
information even more quickly.
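The V8.12 form is fast enough to run from cron as a simple depth check.
The following sketch assumes a warning threshold of 25,000 messages and
an alert address of postmaster; both are placeholders:
#!/bin/sh
# Sketch only: complain when the total queued-message count is too high.
THRESHOLD=25000
COUNT=`mailq -bP | awk '/Total requests:/ { print $3 }'`
if [ "${COUNT:-0}" -gt "$THRESHOLD" ]; then
    echo "The mail queue currently holds $COUNT messages" |
        mail -s "mail queue depth warning" postmaster
fi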
No matter how you detect the problem, the solution will be the same.
First, you need to kill all the competing
sendmail queue-processing daemons. There are
many ways to do this. The most common is to use
ps(1) to gather PID numbers and then kill each
queue-processing daemon individually. No matter how you kill the
queue-processing daemons, be sure to kill them all. If you
don't, you might find the problem surfacing again
before you have had a chance to fix it.
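One common sequence looks something like the following. The ps options
shown are BSD-style and the PIDs are, of course, hypothetical; take
them from your own listing, and make sure only queue-processing daemons
are among them:
# ps axww | grep '[s]endmail'
# kill 711 723 738
The bracketed [s] in the pattern keeps grep from matching its own
command line, and the ww options widen the output so that each
daemon's full command line (or process title) is visible.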
The best way to flush a full queue is with a command line something
like this:
# /usr/sbin/sendmail -OQueueSortOrder=filename -q10m -d99.100
# /usr/sbin/sendmail -OQueueSortOrder=random -q10m -d99.100 V8.12 and above
Here, the -d99.100 tells
sendmail to run in the foreground (so that you
can kill it easily when done). The -q10m causes a
queue-processing daemon to be launched once every 10 minutes (just
like before). You need this because one daemon can seem to hang when
delivering mail to a slow host. By running parallel daemons, you
avoid this pitfall.
Sorting by filename or random
(Section 11.7) causes sendmail
to skip the opening and reading of each queued message. Instead, it
looks only at the filename to determine its sort or randomization order. On
the downside, this prevents sendmail from
grouping messages for optimum delivery. On the upside, this reduces
the time to preread a huge queue from 20 or so minutes to less than 2
seconds.
The QueueSortOrder=random (Section 11.7) works just like the
QueueSortOrder=filename shown earlier, except that
it randomizes the list before beginning delivery. This method is
preferred, but it is available only beginning with V8.12
sendmail.
After draining the full queue to a more manageable level, you can
discontinue this special process and rerun
sendmail in its normal manner.
If the machine must remain in service while the full queue is being
drained, you can use the techniques in Section 11.9.1 to move that full queue out of the way so that
it can be processed in the background.
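A sketch of that approach, assuming the queue lives in
/var/spool/mqueue and that all the daemons have already been killed,
might look like this:
# mv /var/spool/mqueue /var/spool/omqueue
# mkdir -m 700 /var/spool/mqueue
# /usr/sbin/sendmail -bd
# /usr/sbin/sendmail -q10m
# /usr/sbin/sendmail -OQueueDirectory=/var/spool/omqueue -q &
The new, empty directory takes over normal service at once, while the
last command drains the saved backlog in the background at whatever
pace the disk allows.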