SpamAssassin

General Information

SpamAssassin (http://spamassassin.apache.org/) is a mail filter to identify spam email. It tests each message against a lengthy set of rules (see below) that are common characteristics of spam and generates a rating representing the likelihood that the message is spam. Once a rating is generated, that rating is included in the email headers so that a filtering program can be used to flag the email, filter it into a special folder, or discard it (see our procmail page to see how to do this).

What rules does it use to determine whether a message is spam?

SpamAssassin runs an extensive list of tests to estimate the likelihood that a message was sent by a spammer. A message's score increases if, for example, it is received via a relay in a blacklisted domain (one that has been publicly recognized for generating spam), is forged to look as if it came from MS Outlook, contains explicit content, or has a body containing a significant amount of HTML code. On the other hand, a message's score decreases if there is reason to believe it is legitimate, such as quoted email text (indicative of a reply) or a sender using a Unix-based email client like Pine.
For a complete list of SpamAssassin's tests, see http://spamassassin.apache.org/tests_3_1_x.html.
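As a rough illustration of how such a rating comes together, the sketch below assumes the total is simply the sum of the scores of the tests that matched, compared against the "required" value that appears in the X-Spam-Status header. The rule names and scores are hypothetical placeholders, not real SpamAssassin values, and the function names are made up for this example.

```shell
#!/bin/sh
# Sum the second column of "RULE SCORE" lines read from stdin.
# The rule names and scores used with this function are hypothetical.
total_score() {
    awk '{ sum += $2 } END { printf "%.1f\n", sum }'
}

# Print "spam" if the total ($1) reaches the required threshold ($2), else "ham".
# awk does the comparison because POSIX sh has no floating-point arithmetic.
verdict() {
    awk -v s="$1" -v t="$2" 'BEGIN { if (s + 0 >= t + 0) print "spam"; else print "ham" }'
}

# Two tests that raise the score and one that lowers it.
printf '%s\n' \
    'HYPOTHETICAL_HTML_HEAVY 2.5' \
    'HYPOTHETICAL_BLACKLISTED_RELAY 3.1' \
    'HYPOTHETICAL_QUOTED_REPLY -1.0' | total_score
```

With these made-up numbers the total is 4.6, so `verdict 4.6 5.0` would report ham: the negative score for quoted reply text keeps the message just under the threshold.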

Viewing Spam Score

We currently prepend "[Likely SPAM]" to messages with high SpamAssassin scores. The level at which this string is applied is deliberately set high so as to be conservative in our declaration of what is probably spam. This keeps legitimate messages from being mistakenly tagged.
To view the score for a particular message, enable full headers in your mail client. The message header should then show three extra fields reporting SpamAssassin scoring information: X-Spam-Status, X-Spam-Level, and X-Spam-Checker-Version.
Listed below is an example of the headers that will show up on your message. There are two different stages through which your mail is processed; each stage is shown below. For a full explanation of our current mail system, please see our mailing system documentation.

Example:

First Stage: after being processed by the milter on ares:
<snippet>
X-Spam-Status: HIGH ; 292
X-Spam-Level: *****************************++
</snippet>
Second Stage: after being processed with spamc:
<snippet>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on archive.cs.Virginia.EDU
X-Spam-Level:
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.1.0
</snippet>
Now that scoring is in place, you need to set up your .procmailrc to filter on it. See our procmail page for information on how to do this.
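If you want to inspect the score from a script rather than by eye, the following is a minimal sketch. It assumes the second-stage "score=N" form of X-Spam-Status shown in the example above (the first-stage "HIGH ; 292" form is not handled), and the function names spam_score and is_spam are made up for this illustration.

```shell
#!/bin/sh
# Print the numeric score from a spamc-style X-Spam-Status header on stdin.
# Prints nothing if no "score=" field is found.
spam_score() {
    sed -n 's/^X-Spam-Status:.*score=\(-\{0,1\}[0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Exit 0 if the score read from stdin meets or exceeds the threshold in $1.
# awk does the comparison because POSIX sh has no floating-point arithmetic.
is_spam() {
    score=$(spam_score)
    [ -n "$score" ] && awk -v s="$score" -v t="$1" 'BEGIN { exit !(s + 0 >= t + 0) }'
}

# Prints -4.4 for the example header from this page.
printf '%s\n' 'X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00' | spam_score
```

For a saved message, something like `is_spam 5.0 < message.txt && echo "over threshold"` would work, where message.txt is a placeholder file name.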

Using spamc for custom filtering

Spamc is the client half of the spamc/spamd pair. It is used in place of the spamassassin command to process mail because it has low overhead and loads fairly quickly. The filter is also trainable, so users can increase the share of spam that is correctly marked by training it on their own mail.
First, the user needs to have two mailboxes that they'd like to use for training, ham and spam. For example, we have a collection of spam in the SpamTraining mailbox and ham in HamTraining. Spam is the illegitimate email you DO NOT want to receive. Ham is the email you DO want to receive.
For best results, first strip any existing SpamAssassin markup from both mailboxes by running the following:
formail -s spamassassin -d < pathto/HamTraining > cleaned.ham.mailbox
formail -s spamassassin -d < pathto/SpamTraining > cleaned.spam.mailbox
Now it's time to train the spamassassin databases (this can take a while).
# Training HAM (not spam or false positives)
sa-learn --ham --dbpath=~/.spamassassin --showdots --mbox cleaned.ham.mailbox
# Training SPAM (spam or false negatives)
sa-learn --spam --dbpath=~/.spamassassin --showdots --mbox cleaned.spam.mailbox
The Bayesian classifier used to score messages only takes effect once it has learned from at least 200 known spam messages and 200 known ham messages, so make sure each of the mailboxes above contains at least that many.
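One way to check whether a training mailbox is large enough is sketched below. It assumes the standard mbox layout in which every message begins with a line starting with "From ", so the count is approximate if a mailbox contains unescaped "From " lines in message bodies. The function names are made up for this example.

```shell
#!/bin/sh
# Count messages in an mbox file by counting "From " separator lines.
count_messages() {
    grep -c '^From ' "$1"
}

# Report whether a mailbox holds at least the 200 messages the classifier needs.
check_training_size() {
    [ -r "$1" ] || { echo "$1: cannot read mailbox" >&2; return 1; }
    n=$(count_messages "$1")
    if [ "$n" -ge 200 ]; then
        echo "$1: $n messages, enough for training"
    else
        echo "$1: $n messages, need $((200 - n)) more"
    fi
}
```

Running `check_training_size cleaned.ham.mailbox` and `check_training_size cleaned.spam.mailbox` before training shows how far each mailbox is from the minimum.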

Common Problems

Legitimate email keeps getting filtered out. What should I do?

Occasionally SpamAssassin will generate a false positive on a legitimate message, which is why we recommend setting up filtering to move mail marked as spam to a separate folder rather than deleting it. Based on personal experience, SpamAssassin seems to be a bit trigger-happy when testing for forged mail pretending to be from Microsoft Outlook; that is, mail actually sent by someone using Outlook as a mail client is sometimes reported as forged, increasing those messages' scores.
SpamAssassin provides a whitelisting option that ensures mail from addresses on a user's whitelist is never filtered out, even when it would otherwise be reported as spam.
To add an address to your whitelist, create the file ~/.spamassassin/user_prefs (that is, user_prefs in the .spamassassin directory under your home directory) and include the following line for each address you wish to whitelist:
whitelist_from address@domain.com
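If you add addresses often, a small helper can append the line only when it is not already present. This is a sketch rather than a supported tool: the function name add_whitelist is made up, and the default path is the user_prefs location described above.

```shell
#!/bin/sh
# Append "whitelist_from ADDRESS" to a prefs file unless it is already there.
# $1 = address, $2 = prefs file (defaults to ~/.spamassassin/user_prefs).
add_whitelist() {
    addr=$1
    prefs=${2:-"$HOME/.spamassassin/user_prefs"}
    mkdir -p "$(dirname "$prefs")"
    touch "$prefs"
    # -F: match literally, -x: match the whole line, -q: print nothing
    if ! grep -Fxq "whitelist_from $addr" "$prefs"; then
        printf 'whitelist_from %s\n' "$addr" >> "$prefs"
    fi
}
```

For example, `add_whitelist address@domain.com` adds the entry once; running it again leaves the file unchanged.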

How can I reduce the amount of SPAM sent out to my mailing lists?

Our page about SPAM on mailing lists is coming soon!