D.2 Reject Bad Body Lines
One form of virus that spread rapidly during the writing of this book
looked, in part, like this:
--K342Sj044MoQ0E0dh90A9n2Md066lL7
Content-Type: audio/x-wav;
name=na tla.exe
Content-Transfer-Encoding: base64
Content-ID: <GxPtp514A04SX3089G>
TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAA2AAAAA4fug4AtAnNIbgBTM0hVGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4g
RE9TIG1vZGUuDQ0KJAAAAAAAAAAYmX3gXPgTs1z4E7Nc+BOzJ+Qfs1j4E7Pf5B2zT/gTs7Tn
GbNm+BOzPucAs1X4E7Nc+BKzJfgTs7TnGLNO+BOz5P4Vs134E7NSaWNoXPgTswAAAAAAAAAA
etc. for many lines
This message body could be easily screened and rejected using the
MILTER interface (Section 7.6) supplied with
sendmail. Some sites, however, do not run
versions of Unix that support POSIX threads
(pthreads). At such sites, the MILTER interface
is not available, so instead such screening must be done inside the
checkcompat( ) routine.
The method we chose to illustrate here is based on the idea that
parts of a message are separated from the headers, and from each
other, by one or more blank lines:
Content-ID: <GxPtp514A04SX3089G>
a blank line
TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
By looking at just the first line of each part, we should be able to
determine if the message should be rejected. To perform this
examination, we decided to arbitrarily limit the length of the line
we examine to the first 15 characters.
TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
to here
Because we are going to screen those 15 characters with a rule set,
we decided to replace any space characters found among those first 15
characters with underbar characters:
"Free sex. Yes XXX!" becomes "Free_sex._Yes_X"
Our rule set, which we will call checkcompat_c,
should be designed to look like this:
LOCAL_RULESETS
Scheckcompat_c
R $* $: $(access Body:$1 $)
R REJECT $#error $@ 5.7.1 $: "554 Body of message cannot be accepted."
R $* $: OK
The idea here is to use the access database
(Section 7.5) to see if the line found in the
message should be rejected. The first rule in this example prefixes
that line (the workspace passed to checkcompat_c,
the $1 in the righthand side, or RHS) with a
literal Body: just prior to the lookup. If the
access database contains an entry for those first 15 characters,
the lookup returns that entry's value, here a literal REJECT, which
the second rule turns into an error that causes the message to be
rejected. Any other result is rewritten to OK by the third rule.
An example of some possible access database
entries might look like this:
Body:TVqQAAMAAAAEAAA REJECT
Body:Absolutely_Free REJECT
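To make the effect of the three rules concrete, the following is a
minimal sketch in C of the same decision. It is not how sendmail
performs the lookup. The access_table array and the
body_is_rejected( ) function are hypothetical stand-ins for the
access database and the rule set, and the sketch uses an exact string
comparison, so it does not model how a real database map treats
matters such as letter case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the access database: key/value pairs. */
struct access_entry
{
    const char *key;
    const char *value;
};

static const struct access_entry access_table[] =
{
    { "Body:TVqQAAMAAAAEAAA", "REJECT" },
    { "Body:Absolutely_Free", "REJECT" },
};

/*
** Mimic the three rules: prefix the text with "Body:", look the
** result up, and return 1 (reject) only when the value found is
** REJECT. Anything else, including no match, returns 0 (OK).
*/
int
body_is_rejected(const char *first15)
{
    char key[64];
    size_t i;

    (void) snprintf(key, sizeof(key), "Body:%s", first15);
    for (i = 0; i < sizeof(access_table) / sizeof(access_table[0]); i++)
    {
        if (strcmp(key, access_table[i].key) == 0)
            return strcmp(access_table[i].value, "REJECT") == 0;
    }
    return 0;
}
```

With the two entries shown, body_is_rejected("TVqQAAMAAAAEAAA")
reports a rejection, whereas a string with no matching entry is
accepted.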
In the following, we show you the C-language code that does all this.
The numbers to the left are for reference only and are not part of
the code.
1 int
2 checkcompat(to, e)
3 register ADDRESS *to;
4 register ENVELOPE *e;
5 {
6 char buf[BUFSIZ];
7 bool foundblankline;
8 int olderrs;
9 char *cp;
10
11 if (tTd(49, 1))
12 sm_dprintf("checkcompat(to=%s, from=%s)\n",
13 to->q_paddr, e->e_from.q_paddr);
14
15 if (!bitnset(M_LOCALMAILER, to->q_mailer->m_flags))
16 return EX_OK;
17
18 if (e->e_dfp == NULL)
19 {
20 if (bitset(EF_HAS_DF, e->e_flags))
21 {
22 /*
23 ** Open the message body (df file) to read
24 */
25 char *df = queuename(e, DATAFL_LETTER);
26
27 e->e_dfp = sm_io_open(SmFtStdio, SM_TIME_DEFAULT, df,
28 SM_IO_RDONLY, NULL);
29 if (e->e_dfp == NULL)
30 {
31 /* can't open df file, so tempfail it */
32 return EX_TEMPFAIL;
33 }
34 }
35 else
36 return EX_OK;
37 }
38
39 (void) bfrewind(e->e_dfp);
40 foundblankline = false;
41 Errors = 0;
42
43 /*
44 ** Scan all the lines in the file, looking for a virus
45 ** identifier line following a blank line.
46 */
47 while (sm_io_fgets(e->e_dfp, SM_TIME_DEFAULT, buf, sizeof(buf)) != NULL)
48 {
49 if ((cp = strpbrk(buf, "\r\n")) != NULL)
50 *cp = '\0';
51
52 if (buf[0] == '\0')
53 {
54 /* found a blank line */
55 foundblankline = true;
56 continue;
57 }
58 if (!foundblankline)
59 continue;
60 foundblankline = false;
61
62 if (strlen(buf) > 15)
63 buf[15] = '\0';
64
65 for (cp = buf; *cp != '\0'; ++cp)
66 {
67 if (isascii(*cp) && isspace(*cp))
68 *cp = '_';
69 }
70
71 if (tTd(49, 1))
72 sm_dprintf("checkcompat: check \"%s\"\n", buf);
73
74 olderrs = Errors;
75 if (rscheck("checkcompat_c", buf, NULL, e, RSF_RMCOMM|RSF_COUNT,
76 3, NULL, e->e_id) != EX_OK || Errors > olderrs)
77 {
78 e->e_flags |= EF_NO_BODY_RETN;
79 to->q_status = "5.7.1";
80 return EX_UNAVAILABLE;
81 }
82 }
83
84 if (sm_io_error(e->e_dfp))
85 {
86 syserr("checkcompat: %s/%cf%s: read error",
87 qid_printqueue(e->e_dfqgrp, e->e_dfqdir),
88 DATAFL_LETTER, e->e_id);
89 return EX_IOERR;
90 }
91 return EX_OK;
92 }
Although this routine is long, it is actually fairly simple. We begin
at line 2, which shows the
checkcompat( ) routine declared just the same as
it is in conf.c. We already explained that the
two arguments passed to checkcompat( ) are
pointers to the to and e
structures. The local variables are declared next (line 6), and we need only four to perform our check:
char buf[BUFSIZ];
bool foundblankline;
int olderrs;
char *cp;
The buf is the buffer into which we will read each
line of the datafile for checking. The
foundblankline is a flag. Because we will
check only the first line following a blank line, we need to keep
track of whether a preceding blank line was found. The
cp is a pointer used to strip the line terminator and to find
spaces inside each truncated line. And the olderrs is used to store
the current value of the Errors global variable
before calling the rule set.
Before we check lines, we first need to make sure the message is
being delivered locally by checking (line 15) to
see if the M_LOCALMAILER flag (the F=l delivery
agent flag, a lowercase L) was set, and returning
EX_OK if it was not. We do this because we don't want to
screen outbound or relayed mail.
Next, we open the datafile (the df file) for
reading, assuming it is not already open (line 18) and that it actually exists (line 20). We construct the file's
name using the queuename( ) routine (line 25), and open the file using the
sm_io_open( ) routine (line 27). We don't call
fopen(3) ourselves because
sendmail might have the file open in RAM and not
on disk.
We rewind the file to its beginning (line 39),
in case it was already open. We also preset our boolean
foundblankline to false before
entering our main loop.
In a loop, we call sm_io_fgets( ) to read each
line of the message (line 47). We
don't use fgets(3) because
sendmail might have the message in RAM.
The first thing we do inside the loop is look for a blank line. If
the current line is such a line we set
foundblankline to true (line 55). If we have not found a blank line, we
continue looking for one.
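The blank-line logic can be tried outside of sendmail. The following
is a minimal sketch, assuming an ordinary stdio stream in place of
sendmail's sm_io routines. The collect_candidates( ) function and
the CAND_LEN limit are hypothetical names used only for this
illustration. It stores each line that directly follows a blank line
and ignores all others.

```c
#include <stdio.h>
#include <string.h>

#define CAND_LEN 80    /* assumed size for each stored candidate line */

/*
** Read fp line by line and copy each line that directly follows one
** or more blank lines into out[], up to max entries. Returns the
** number of candidate lines stored.
*/
int
collect_candidates(FILE *fp, char out[][CAND_LEN], int max)
{
    char buf[BUFSIZ];
    char *cp;
    int foundblankline = 0;
    int n = 0;

    while (n < max && fgets(buf, sizeof(buf), fp) != NULL)
    {
        /* strip the line terminator, if any */
        if ((cp = strpbrk(buf, "\r\n")) != NULL)
            *cp = '\0';

        if (buf[0] == '\0')
        {
            /* found a blank line, so the next line is a candidate */
            foundblankline = 1;
            continue;
        }
        if (!foundblankline)
            continue;
        foundblankline = 0;

        /* store the candidate, truncated to fit if necessary */
        (void) snprintf(out[n], CAND_LEN, "%s", buf);
        n++;
    }
    return n;
}
```

Resetting the flag as soon as a candidate is found is what limits
the check to one line per blank-line boundary, no matter how many
lines the part contains.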
Once a line following a blank line has been found, we check it.
First, we truncate it to 15 characters (line 62). Then we replace any space characters found
in it with underbar characters (line 65).
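The truncation and substitution steps can also be exercised on their
own. The following is a minimal sketch, not part of the
checkcompat( ) listing. The make_body_key( ) function and the
KEY_LEN constant are hypothetical names, and the sketch writes into
a separate buffer instead of modifying the line in place.

```c
#include <ctype.h>
#include <stddef.h>

#define KEY_LEN 15    /* assumed: the 15-character limit chosen earlier */

/*
** Copy at most KEY_LEN characters of line into key, which must hold
** at least KEY_LEN + 1 bytes. Copying stops at a CR or LF, and each
** whitespace character is replaced with an underbar.
*/
void
make_body_key(const char *line, char *key)
{
    size_t i;

    for (i = 0; i < KEY_LEN && line[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) line[i];

        if (c == '\r' || c == '\n')
            break;
        key[i] = isspace(c) ? '_' : (char) c;
    }
    key[i] = '\0';
}
```

Given the earlier example text "Free sex. Yes XXX!", this produces
"Free_sex._Yes_X", which is the form used for the keys in the access
database.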
Lastly, we pass that line to the checkcompat_c
rule set (line 75), and if that rule set
rejects the message, we do so on the next three
lines:
78 e->e_flags |= EF_NO_BODY_RETN;
79 to->q_status = "5.7.1";
80 return EX_UNAVAILABLE;
First, we set the EF_NO_BODY_RETN flag to prevent the body from being
included in the bounce. Then we set the DSN status to 5.7.1, and
finally return EX_UNAVAILABLE to cause
sendmail to bounce the message.
If no offending lines were found in the message, we check for I/O
errors (line 84), and if any were found, we
report them and return EX_IOERR. Otherwise, we return EX_OK to tell
sendmail the message is OK to deliver.
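The four possible results of the routine can be summarized as
follows. This is a minimal sketch, assuming a system that provides
<sysexits.h>. The scan_outcome enumeration and the
outcome_to_exit( ) function are hypothetical and appear nowhere in
sendmail. They only restate which exit value the listing returns in
each case.

```c
#include <sysexits.h>

/* Hypothetical names for the ways the body scan can end. */
enum scan_outcome
{
    SCAN_CLEAN,          /* no offending line was found */
    SCAN_REJECTED,       /* the rule set rejected a line */
    SCAN_OPEN_FAILED,    /* the df file could not be opened */
    SCAN_READ_ERROR      /* an I/O error occurred while reading */
};

/* Map each outcome to the exit value returned by the listing. */
int
outcome_to_exit(enum scan_outcome outcome)
{
    switch (outcome)
    {
      case SCAN_OPEN_FAILED:
        return EX_TEMPFAIL;      /* try again later */
      case SCAN_REJECTED:
        return EX_UNAVAILABLE;   /* bounce the message */
      case SCAN_READ_ERROR:
        return EX_IOERR;         /* report the read error */
      case SCAN_CLEAN:
      default:
        return EX_OK;            /* OK to deliver */
    }
}
```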
Because every line of every message will be read by this routine, you
should not use it on a site that handles large amounts of email.
Instead, you should upgrade to an operating system that supports
pthreads so that you can use the MILTER
interface.
You should also avoid the temptation to pass the start of every line
to the checkcompat_c rule set. Such overuse of a
rule set can slow sendmail too much on even a
lightly used site, thus increasing the load on your machine. Instead,
try to be clever in what you search for to minimize the impact of
this routine.