Published on Perl.com http://www.perl.com/pub/a/2002/03/06/spam.html
Stopping Spam with SpamAssassin
By Simon Cozens
I receive a lot of spam; an absolutely massive bucket load of spam. I received more than 100 pieces of spam in the first three days of this month. I receive so much spam that Hormel Foods sends trucks to take it away. And I'm convinced that things are getting worse. We're all being bombarded with junk mail more than ever these days.
Well, a couple of days ago, I reached my breaking point, and decided that the simple mail filtering I had in place up until now just wasn't up to the job. It was time to call in an assassin.
SpamAssassin
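SpamAssassin is a rule-based spam identification tool. It's written in Perl, and there are several ways of using it: you can call a client program, spamassassin; you can run the spamd daemon and talk to it with spamc; or you can use it from Perl code such as a Mail::Audit filter.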
SpamAssassin is extremely configurable; you can select which rules you want to use, change the way the rules contribute to a piece of mail's "spam score," and add your own rules. We'll look at some of these features later in the article. First, how do we get SpamAssassin installed and start using it?
If you're using Debian Linux or one of the BSDs, then this couldn't be easier: just install the appropriate package. Those less fortunate will have to download the latest version of SpamAssassin, and install it themselves.
Vipul's Razor
SpamAssassin uses a variety of tests to decide whether an e-mail is spam, ranging from simple textual checks on the headers or body and detecting missing or misleading headers to network-based checks and an interesting distributed system called Vipul's Razor.
Vipul's Razor takes advantage of the fact that spam is, by its nature, distributed in bulk. Hence, a lot of the spam that you see, I'm also going to see at some point. If there were a big clearing-house where you could report spam and I could see if my incoming mail matches what you've already reported, then I could have a guaranteed way of determining whether a given mail is spam. Vipul's Razor is that clearing-house. Why is it a Razor?
Because it's a collaborative system, its strength is directly derived from the quality of its database, which comes back to the way it's used by the likes of you and me. If end-users report lots of real spam, the Razor gets better; if the database gets "poisoned" by lots of false or misleading reports, then the efficiency of the whole system drops.
Just like any other spam detection mechanism, Razor isn't perfect. There are two points particularly worth noting. First, while it tries to completely avoid false positives (saying something's spam when it isn't) by requiring that spam be reported, it doesn't do anything about false negatives (saying something's not spam when it is) because it only knows about the mail in its database. Second, spammers, like all other primitive organisms, are constantly evolving. Vipul's Razor only works for spam that is delivered in bulk without modification. Spam that is "personalized" by the addition of random spaces, letters or the name of the recipient will produce a different signature that won't match similar spam messages in the Razor database.
Nevertheless, the Razor is an excellent addition to the spam fighter's
arsenal, since when it marks something as spam, you can be almost
positive it's correct. And just like SpamAssassin, it's all pure Perl.
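The clearing-house idea, and its weakness against "personalized" spam, can be illustrated with a small sketch. This is not Razor's actual signature scheme or protocol: it assumes a plain SHA-1 digest of the message body, and the `%reported` hash and the `signature`, `report_spam` and `known_spam` helpers are hypothetical stand-ins for the shared database.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical stand-in for the shared database: digests of message
# bodies that users have reported as spam. Not Razor's real protocol.
my %reported;

# Assumption: a signature is simply the SHA-1 digest of the body.
sub signature {
    my ($body) = @_;
    return sha1_hex($body);
}

sub report_spam {
    my ($body) = @_;
    $reported{ signature($body) } = 1;
    return;
}

sub known_spam {
    my ($body) = @_;
    return exists $reported{ signature($body) } ? 1 : 0;
}

my $bulk = "Make money fast! Visit our site today.\n";
report_spam($bulk);

# An identical copy sent to someone else matches the reported signature...
print "identical copy: ",
    (known_spam($bulk) ? "known spam" : "unknown"), "\n";

# ...but a copy with the recipient's name added produces a new digest.
my $personalized = "Dear Simon, " . $bulk;
print "personalized copy: ",
    (known_spam($personalized) ? "known spam" : "unknown"), "\n";
```

Because a lookup only ever matches mail that somebody has already reported, a hit is strong evidence of spam, while a miss tells you nothing.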
Installing Vipul's Razor is similar to installing SpamAssassin. Debian and BSD users have packages called "razor" and "razor-clients," respectively; and the rest of the world can download and install from the home page. SpamAssassin will detect whether Razor is available and, by default, use it if so.
Assassinating Spam With Mail::Audit: The Easy Way
So this is the part you've all been waiting for. How do we use these things to trap spam? For those of you who aren't familiar with Mail::Audit, it's a Perl module for constructing mail filters. For more details on how to construct mail filters with Mail::Audit, see my previous article.
Plugging SpamAssassin into your filters couldn't be simpler. First of all, you absolutely need the latest version of Mail::Audit.
The important calls in a SpamAssassin-enabled filter are check and is_spam. check produces a "status object" that we can query and use to manipulate the e-mail. is_spam tells us whether the mail has exceeded the number of "spam points" required to flag an e-mail as spam.
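A minimal sketch of such a filter follows. It takes the two objects as arguments so that the logic can be read and exercised on its own. The `deliver_with_spam_check` name and the spam folder path are hypothetical, and the `rewrite_mail` call is assumed from the SpamAssassin interface of this era, so check it against the documentation for your installed version.

```perl
use strict;
use warnings;

# $mail is expected to be a Mail::Audit object and $spamtest a
# Mail::SpamAssassin object; $spam_folder is a hypothetical mailbox path.
sub deliver_with_spam_check {
    my ($mail, $spamtest, $spam_folder) = @_;

    # check() returns the "status object" for this message.
    my $status = $spamtest->check($mail);

    if ($status->is_spam) {
        # Let SpamAssassin annotate the message, then file it away.
        # (Mail::Audit's accept normally ends the program after delivery.)
        $status->rewrite_mail;
        $mail->accept($spam_folder);
        return 'spam';
    }

    # Not spam: deliver to the default inbox.
    $mail->accept;
    return 'inbox';
}

# In a real filter, with both modules installed from CPAN:
#   use Mail::Audit;
#   use Mail::SpamAssassin;
#   deliver_with_spam_check(Mail::Audit->new, Mail::SpamAssassin->new,
#                           "$ENV{HOME}/mail/spam");
```

Keeping the spam in its own folder rather than discarding it means a false positive can still be rescued later.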
In one message that SpamAssassin flagged, the triggered tests were a question mark in the subject, an empty reply-to, and a subject ending in a question mark. The mail wasn't actually spam, which goes to prove that the technique isn't perfect. Nevertheless, since installing the spam filter, I've only seen about 10 false positives, and zero false negatives. I'm happy enough with this solution.
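To see how several individually harmless tests can add up to a false positive, here is a rough sketch of additive rule scoring. The rule names, the scores and the threshold of 5 are all made up for illustration; SpamAssassin's real rules and weights differ.

```perl
use strict;
use warnings;

# Hypothetical rules and scores, loosely modelled on the tests named above.
# Real SpamAssassin rule names and scores differ.
my @rules = (
    { name  => 'subject_has_question_mark',
      score => 1.5,
      test  => sub { $_[0]{subject} =~ /\?/ } },
    { name  => 'subject_ends_in_question_mark',
      score => 1.5,
      test  => sub { $_[0]{subject} =~ /\?\s*$/ } },
    { name  => 'empty_reply_to',
      score => 2.5,
      test  => sub { defined $_[0]{reply_to} && $_[0]{reply_to} eq '' } },
);

my $required = 5;   # points needed before a message is flagged (assumed)

# Add up the scores of every rule the message triggers.
sub score_message {
    my ($msg) = @_;
    my $total = 0;
    my @hits;
    for my $rule (@rules) {
        next unless $rule->{test}->($msg);
        $total += $rule->{score};
        push @hits, $rule->{name};
    }
    return ($total, @hits);
}

# A perfectly legitimate message that happens to trip all three rules.
my %legit = (subject => 'Are we still on for lunch?', reply_to => '');
my ($score, @hits) = score_message(\%legit);
my $verdict = $score >= $required ? 'flagged as spam' : 'not spam';
printf "score %.1f (required %d): %s\n", $score, $required, $verdict;
print "rules hit: @hits\n";
```

Lowering a rule's score, or raising the threshold, trades false positives for false negatives, which is why the weights are worth tuning to your own mail.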
One important point to remember, however, is where in the course of your filtering you should call SpamAssassin's checks. For instance, you want to do so after your mailing list filtering, because mail sent to mailing lists may have munged headers that might confuse SpamAssassin. However, this means that spam sent to mailing lists might slip through the net. Experiment, and find the best solution for your own e-mail patterns.
Assassinating Spam Without Mail::Audit
Of course, there are times when it might not be suitable to use Mail::Audit. For instance, a procmail recipe can call out to the spamassassin program instead.
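A recipe along the following lines is one way to do that. Treat it as a sketch rather than a drop-in configuration: the "caughtspam" mailbox name is made up, and depending on your SpamAssassin release the spamassassin command may need an extra option to work as a pipe filter, so check its manual page.

```procmail
# Pipe every incoming message through spamassassin, which adds its
# verdict to the headers (f = filter, w = wait for it to finish).
:0fw
| spamassassin

# File anything it flagged into a separate mailbox.
# "caughtspam" is a hypothetical mailbox name.
:0:
* ^X-Spam-Status: Yes
caughtspam
```

The second recipe relies on the X-Spam-Status header that SpamAssassin adds to messages it has examined.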
For the speed-conscious, you can run the spamd daemon and
replace calls to spamassassin with spamc; be
aware that this is a TCP/IP daemon that you may want to firewall
from the rest of the world.
Another approach is to call
Assassinating Spam With Mail::Audit: More Complex Operations
Once you have confirmed that a message really is spam, you can also have SpamAssassin report it to Vipul's Razor. (Take note of this: As we've mentioned above, the efficiency of the Razor database comes from the fact that e-mails in it are confirmed as spam by a human. Adding false positives to the database would degrade its usefulness for everyone. Only submit mail that you've confirmed personally.)
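One way to keep that rule hard to break is to put the human confirmation into the code path itself. The sketch below assumes the `report_as_spam` method of Mail::SpamAssassin; the `report_if_confirmed` helper and its `$confirmed_by_human` flag are hypothetical and would have to be wired into your own workflow.

```perl
use strict;
use warnings;

# $spamtest is expected to be a Mail::SpamAssassin object and $mail the
# message object it was given to check. $confirmed_by_human is a
# hypothetical flag that your own workflow has to set.
sub report_if_confirmed {
    my ($spamtest, $mail, $confirmed_by_human) = @_;

    # Never report on the filter's say-so alone.
    return 0 unless $confirmed_by_human;

    $spamtest->report_as_spam($mail);
    return 1;
}

# With the modules installed, after reading the message yourself:
#   report_if_confirmed(Mail::SpamAssassin->new, Mail::Audit->new, 1);
```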
If you're finding that mail checking is taking too long because SpamAssassin is having to contact the various network-based blacklists and databases, then you can instruct it to perform only "local" checks.
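As a sketch, local-only checking is requested through the `local_tests_only` option when constructing the Mail::SpamAssassin object. The module is loaded at run time here purely so that the example still runs on a machine without SpamAssassin installed; in a real filter a plain `use Mail::SpamAssassin;` would be more usual.

```perl
use strict;
use warnings;

# local_tests_only asks SpamAssassin to skip checks that need the network.
my %options = (local_tests_only => 1);

# Load the module at run time so this sketch degrades gracefully when
# Mail::SpamAssassin is not installed.
my $spamtest = eval {
    require Mail::SpamAssassin;
    Mail::SpamAssassin->new(\%options);
};

my $state = $spamtest
    ? "SpamAssassin ready (local tests only)"
    : "Mail::SpamAssassin is not available here";
print "$state\n";
```

The trade-off is that network-backed tests such as blacklist lookups and Razor are skipped, so expect somewhat more spam to slip through.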
There is a wealth of other options available. See the Mail::SpamAssassin documentation for details.
Perl.com Compilation Copyright © 1998-2003 O'Reilly & Associates, Inc.