D.2 Reject Bad Body Lines

One form of virus that spread rapidly during the writing of this book looked, in part, like this:

--K342Sj044MoQ0E0dh90A9n2Md066lL7
Content-Type: audio/x-wav;
        name=na tla.exe
Content-Transfer-Encoding: base64
Content-ID: <GxPtp514A04SX3089G>

TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAA2AAAAA4fug4AtAnNIbgBTM0hVGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4g
RE9TIG1vZGUuDQ0KJAAAAAAAAAAYmX3gXPgTs1z4E7Nc+BOzJ+Qfs1j4E7Pf5B2zT/gTs7Tn
GbNm+BOzPucAs1X4E7Nc+BKzJfgTs7TnGLNO+BOz5P4Vs134E7NSaWNoXPgTswAAAAAAAAAA
 etc. for many lines

This message body could be easily screened and rejected using the MILTER interface (Section 7.6) supplied with sendmail. Some sites, however, do not run versions of Unix that support POSIX threads (pthreads). At such sites, the MILTER interface is not available, so instead such screening must be done inside the checkcompat( ) routine.

The method we chose to illustrate here is based on the idea that parts of a message are separated from the headers, and from each other, by one or more blank lines:

Content-ID: <GxPtp514A04SX3089G>
                                   a blank line
TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

By looking at just the first line of each part, we should be able to determine whether the message should be rejected. For this examination, we arbitrarily decided to look at only the first 15 characters of that line.^[B]

^[B] A 15-character limit was chosen to keep the example program listing simple. Code to allow for variable lengths would, of course, be better, and would reduce the chance of false positives. We leave such code improvements up to you.

TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
^^^^^^^^^^^^^^^
from here to here (the first 15 characters)

Because we are going to screen those 15 characters with a rule set, we decided to replace any space characters found among those first 15 characters with underbar characters:

"Free sex. Yes XXX!"    becomes   "Free_sex._Yes_X"

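The truncation and substitution just described can be sketched as a small standalone function. This is only an illustration, not part of sendmail: the names make_body_key and BODY_KEY_LEN are hypothetical, and the sketch assumes the line terminator has already been stripped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define BODY_KEY_LEN 15   /* hypothetical name for the 15-character limit */

/*
 * Copy at most BODY_KEY_LEN characters of line into key, replacing
 * each whitespace character with an underbar.  The caller must supply
 * a key buffer of at least BODY_KEY_LEN + 1 bytes.
 */
static void
make_body_key(const char *line, char *key)
{
        size_t i;

        for (i = 0; i < BODY_KEY_LEN && line[i] != '\0'; i++)
        {
                unsigned char c = (unsigned char) line[i];

                key[i] = isspace(c) ? '_' : (char) c;
        }
        key[i] = '\0';
}
```

Called with "Free sex. Yes XXX!", this fills key with "Free_sex._Yes_X", which is the form the rule set will see.
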
Our rule set, which we will call checkcompat_c, looks like this:

LOCAL_RULESETS
Scheckcompat_c
R $*            $: $(access Body:$1 $) 
R REJECT        $#error $@ 5.7.1 $: "554 Body of message cannot be accepted."
R $*            $: OK

The idea here is to use the access database (Section 7.5) to see if the line found in the message should be rejected. The first rule in this example prefixes that line (the workspace passed to checkcompat_c, the $1 in the righthand side, or RHS) with a literal Body: just prior to the lookup. If the access database contains that prefixed key with a value of REJECT, the lookup returns a literal REJECT, and the second rule causes the message to be rejected. Anything else is rewritten to OK by the third rule.

Some possible access database entries might look like this:

Body:TVqQAAMAAAAEAAA    REJECT
Body:Absolutely_Free    REJECT
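
To make the lookup logic concrete, here is a minimal sketch in C of what the rule set and access database do together. It is not sendmail's map interface: the in-memory table, the function name body_key_rejected, and the exact-match comparison are all assumptions made for illustration. How the real database treats case depends on how it is built.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the two REJECT entries above. */
static const char *const reject_keys[] = {
        "Body:TVqQAAMAAAAEAAA",
        "Body:Absolutely_Free",
};

/* Return 1 if "Body:" followed by key matches a REJECT entry, else 0. */
static int
body_key_rejected(const char *key)
{
        char lookup[64];
        size_t i;
        int n;

        /* Mirror the first rule: prefix the workspace with a literal Body: */
        n = snprintf(lookup, sizeof(lookup), "Body:%s", key);
        if (n < 0 || (size_t) n >= sizeof(lookup))
                return 0;       /* too long to be one of our short keys */

        for (i = 0; i < sizeof(reject_keys) / sizeof(reject_keys[0]); i++)
        {
                if (strcmp(lookup, reject_keys[i]) == 0)
                        return 1;       /* corresponds to the REJECT rule */
        }
        return 0;                       /* corresponds to the final OK rule */
}
```
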

In the following, we show you the C-language code that does all this. The numbers to the left are for reference only and are not part of the code.

1  int
2  checkcompat(to, e)
3           register ADDRESS *to;
4           register ENVELOPE *e;
5   {
6          char buf[BUFSIZ];
7           bool foundblankline;
8           int olderrs;
9           char *cp;
10  
11          if (tTd(49, 1))
12                  sm_dprintf("checkcompat(to=%s, from=%s)\n",
13                              to->q_paddr, e->e_from.q_paddr);
14  
15         if (!bitnset(M_LOCALMAILER, to->q_mailer->m_flags))
16                  return EX_OK;
17  
18         if (e->e_dfp == NULL)
19          {
20                 if (bitset(EF_HAS_DF, e->e_flags))
21                 {
22                         /*
23                         **  Open the message body (df file) to read
24                         */
25                         char *df = queuename(e, DATAFL_LETTER);
26 
27                         e->e_dfp = sm_io_open(SmFtStdio, SM_TIME_DEFAULT, df,
28                                               SM_IO_RDONLY, NULL);
29                         if (e->e_dfp == NULL)
30                        {
31                                 /* can't open df file, so tempfail it */
32                                 return EX_TEMPFAIL;
33                         }
34                 }
35                 else
36                         return EX_OK;
37         }
38 
39         (void) bfrewind(e->e_dfp);
40         foundblankline = false;
41         Errors = 0;
42 
43         /*
44         **  Scan all the lines in the file, looking for a virus
45         **  identifier line following a blank line.
46         */
47         while (sm_io_fgets(e->e_dfp, SM_TIME_DEFAULT, buf, sizeof(buf)) != NULL)
48         {
49                 if ((cp = strpbrk(buf, "\r\n")) != NULL)
50                         *cp = '\0';
51 
52                 if (buf[0] == '\0')
53                 {
54                         /* found a blank line */
55                         foundblankline = true;
56                         continue;
57                 }
58                 if (!foundblankline)
59                         continue;
60                 foundblankline = false;
61 
62                 if (strlen(buf) > 15)
63                         buf[15] = '\0';
64 
65                 for (cp = buf; *cp != '\0'; ++cp)
66                 {
67                         if (isascii(*cp) && isspace(*cp))
68                                 *cp = '_';
69                 }
70 
71                 if (tTd(49, 1))
72                         sm_dprintf("checkcompat: check \"%s\"\n", buf);
73 
74                 olderrs = Errors;
75                 if (rscheck("checkcompat_c", buf, NULL, e, RSF_RMCOMM|RSF_COUNT,
76                              3, NULL, e->e_id) != EX_OK || Errors > olderrs)
77                 {
78                         e->e_flags |= EF_NO_BODY_RETN;
79                         to->q_status = "5.7.1";
80                         return EX_UNAVAILABLE;
81                 }
82         }
83 
84         if (sm_io_error(e->e_dfp))
85         {
86                 syserr("checkcompat: %s/%cf%s: read error",
87                         qid_printqueue(e->e_dfqgrp, e->e_dfqdir),
88                         DATAFL_LETTER, e->e_id);
89                 return EX_IOERR;
90         }
91         return EX_OK;
92 }

Although this routine is long, it is actually fairly simple. We begin at line 2, which shows the checkcompat( ) routine declared just the same as it is in conf.c. We already explained that the two arguments passed to checkcompat( ) are pointers to the to and e structures. The local variables are declared next (line 6), and we need only four to perform our check:

char buf[BUFSIZ];
bool foundblankline;
int olderrs;
char *cp;

The buf is the buffer into which we will read each line of the datafile for checking. The foundblankline is a semaphore. Because we will check only the first line following a blank line, we need to keep track of whether a preceding blank line was found. The cp is a pointer used to strip the line terminator from each line and to find spaces inside each truncated line. And the olderrs is used to store the current value of the Errors global variable before calling the rule set.

Before we check lines, we first need to make sure the message is being delivered locally by checking (line 15) to see if the M_LOCALMAILER flag (the F=l delivery agent flag, a lowercase L) was set, and returning EX_OK if it was not. Note that we don't want to screen outbound or relayed mail.

Next, we open the datafile (the df file) for reading, assuming it is not already open (line 18) and that it actually exists (line 20). We construct the file's name using the queuename( ) routine (line 25), and open the file using the sm_io_open( ) routine (line 27). We don't call fopen(3) ourselves because sendmail might have the file open in RAM and not on disk.

We rewind the file to its beginning (line 39), in case it was already open. We also preset our boolean foundblankline to false (line 40) and clear the Errors global variable (line 41) before entering our main loop.

In a loop, we call sm_io_fgets( ) to read each line of the message (line 47). We don't use fgets(3) because sendmail might have the message in RAM.

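The first thing we do inside the loop is strip any trailing carriage return or newline (line 49) and then look for a blank line. If the current line is blank, we set foundblankline to true (line 55) and go on to the next line. If the current line is not blank and no blank line preceded it (line 58), we skip it and continue looking for a blank line.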
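
The blank-line logic can be tried in isolation. The following sketch works on an array of already-stripped lines rather than on the df file, and simply records which lines would be handed to the rule set. The function name lines_to_check and its parameters are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Store in out[] the indexes of the lines that directly follow one or
 * more blank lines, and return how many were stored (at most max).
 */
static size_t
lines_to_check(const char *const lines[], size_t nlines,
               size_t out[], size_t max)
{
        bool foundblankline = false;
        size_t i, n = 0;

        for (i = 0; i < nlines && n < max; i++)
        {
                if (lines[i][0] == '\0')
                {
                        /* found a blank line */
                        foundblankline = true;
                        continue;
                }
                if (!foundblankline)
                        continue;       /* not the first line of a part */
                foundblankline = false;
                out[n++] = i;           /* this line would be screened */
        }
        return n;
}
```

Note that a run of several blank lines behaves the same as a single one: only the first nonblank line after the run is selected.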

Once a line following a blank line has been found, we check it. First, we truncate it to 15 characters (line 62). Then we replace any space characters found in it with underbar characters (line 65).

Lastly, we pass that line to the checkcompat_c rule set (line 75),^[C] and if that rule set rejects the message,^[D] we do so on the next three lines:

^[C] Note that the syntax of the rscheck( ) subroutine changed between V8.12.5 and V8.12.6.

^[D] Note that we reject if the result is not EX_OK. For completeness, you should also accept a return of EX_TEMPFAIL.

78                        e->e_flags |= EF_NO_BODY_RETN;
79                        to->q_status = "5.7.1";
80                        return EX_UNAVAILABLE;

First, we set the EF_NO_BODY_RETN flag to prevent the body from being included in the bounce. Then we set the DSN status to 5.7.1, and finally return EX_UNAVAILABLE to cause sendmail to bounce the message.
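
One way to read footnote D is that a temporary failure from the rule set should defer the message rather than bounce it. The helper below is a hypothetical sketch of that decision, separated from sendmail so it can be compiled on its own. The name body_check_result and its parameters are assumptions, and the exit values are repeated from sysexits.h only so the sketch is self-contained.

```c
/* Values as defined in <sysexits.h>, repeated so the sketch stands alone. */
#ifndef EX_OK
# define EX_OK          0
#endif
#ifndef EX_UNAVAILABLE
# define EX_UNAVAILABLE 69
#endif
#ifndef EX_TEMPFAIL
# define EX_TEMPFAIL    75
#endif

/*
 * Hypothetical helper: given the rule set's result and the value of the
 * error counter before and after the call, decide what checkcompat( )
 * should return.
 */
static int
body_check_result(int rsresult, int errors_before, int errors_after)
{
        if (rsresult == EX_TEMPFAIL)
                return EX_TEMPFAIL;     /* transient failure: try again later */
        if (rsresult != EX_OK || errors_after > errors_before)
                return EX_UNAVAILABLE;  /* the rule set rejected the line */
        return EX_OK;
}
```

In the listing, the EF_NO_BODY_RETN flag and the 5.7.1 status would still be set only in the EX_UNAVAILABLE case.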

If no offending lines were found in the message, we check for I/O errors (line 84), and if any were found, we report them and return EX_IOERR. Otherwise, we return EX_OK to tell sendmail the message is OK to deliver.

Because every line of every message will be read by this routine, you should not use it on a site that handles large amounts of email. Instead, you should upgrade to an operating system that supports pthreads so that you can use the MILTER interface.

You should also avoid the temptation to pass the start of every line to the checkcompat_c rule set. Such overuse of a rule set can slow sendmail too much on even a lightly used site, thus increasing the load on your machine. Instead, try to be clever in what you search for to minimize the impact of this routine.