SpamAssassin
This document contains:
- A general overview of what SpamAssassin does
- Information about viewing Spam Scores
- Training spamc for custom filtering
- Solutions to common problems with mail filtering and spam
Related Pages
- Overview of the CS Mail Server Setup
- Setting up your .procmailrc and using the vacation program
General Information
SpamAssassin (http://spamassassin.apache.org/) is
a mail filter that identifies spam email. It tests each message against a lengthy set of rules (see
below) that describe common characteristics of spam and generates a score representing the
likelihood that the message is spam. That score is then added to the email headers so that a
filtering program can flag the message, file it into a special folder, or discard
it (see our procmail page for how to do this).
What rules does it use to determine whether a message is spam?
SpamAssassin uses an extensive list of tests to estimate the likelihood that a message was sent
by a spammer. A message's score increases if, for example, it was received via a relay in a
blacklisted domain (one that has been publicly recognized for generating spam), is forged
to look as though it came from MS Outlook, contains explicit content, or has a body
containing a significant amount of HTML code. On the other hand, a message's score decreases if
there is reason to believe that it is legitimate, such as quoted email text (indicative
of a reply) or a sender using a Unix-based email client like Pine.
For a complete list of SpamAssassin's tests, see
http://spamassassin.apache.org/tests_3_1_x.html.
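The way per-test scores add up to a verdict can be sketched in a few lines of shell. This is only an illustration, not SpamAssassin's own code: total_score is a hypothetical helper, and the two per-rule values in the usage comment are made-up numbers chosen so that they sum to the -4.4 shown in the example headers further down this page.

```shell
# Hypothetical helper, for illustration only.
# stdin: one "RULE_NAME score" pair per line
# $1:    the required score (threshold) for a message to count as spam
total_score() {
    awk -v required="$1" '
        { total += $2 }                      # positive scores raise, negative scores lower
        END {
            verdict = (total >= required) ? "Yes" : "No"
            printf "%s, score=%.1f required=%.1f\n", verdict, total, required
        }
    '
}

# Usage (the per-rule values are assumptions, not real rule scores):
#   printf 'ALL_TRUSTED -1.8\nBAYES_00 -2.6\n' | total_score 5.0
#   prints: No, score=-4.4 required=5.0
```

Rules that suggest a legitimate message carry negative values, which is how a message can end up with a total below zero.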
Viewing Spam Score
We currently prepend "[Likely SPAM]" to messages with high SpamAssassin scores. The level at which
this string is applied is deliberately set high so as to be conservative in our declaration of
what is probably spam. This keeps legitimate messages from being mistakenly tagged.
To view the score for a particular message, enable full headers in your mail client of
choice. There should be three extra fields reporting SpamAssassin scoring information in the
header of the message: X-Spam-Status, X-Spam-Level, and X-Spam-Checker-Version.
Listed below is an example of the headers that will show up on your message. There are two different
stages through which your mail is processed; each stage is shown below.
For a full explanation of our current mail system, please see our
mailing system documentation.
Example:
First Stage: after being processed by the milter on ares:
<snippet>
X-Spam-Status: HIGH ; 292
X-Spam-Level: *****************************++
</snippet>
Second Stage: after being processed with spamc:
<snippet>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on
    archive.cs.Virginia.EDU
X-Spam-Level:
X-Spam-Status: No, score=-4.4 required=5.0 tests=ALL_TRUSTED,BAYES_00
    autolearn=ham version=3.1.0
</snippet>
With the scores in place, the next step is to set up your .procmailrc to filter on them.
See our procmail page for information on how to do
this.
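For scripts that need the numeric score rather than a procmail rule, the second-stage X-Spam-Status header can be read with a small shell function. This is a minimal sketch: spam_score is a hypothetical name, it assumes the "score=" form shown in the second-stage example, and it does not handle the first-stage "HIGH ; 292" format.

```shell
# Hypothetical helper: print the score from the X-Spam-Status header
# of a single message read on stdin. Prints nothing if no score is found.
spam_score() {
    awk '
        /^$/ { exit }                        # a blank line ends the header block
        /^X-Spam-Status:/ {
            if (match($0, /score=-?[0-9.]+/)) {
                # skip the 6 characters of "score=" and print the number
                print substr($0, RSTART + 6, RLENGTH - 6)
                exit
            }
        }
    '
}

# Usage: spam_score < message.txt
```

Stopping at the first blank line keeps text in the message body from being mistaken for a header.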
Using spamc for custom filtering
Spamc is the client half of the spamc/spamd pair. It is used in place of the spamassassin command
to process mail because it has low overhead and starts quickly. The Bayesian classifier behind it
is trainable, so users can increase the share of spam that is correctly marked by training it on
their own mail.
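As a rough sketch of calling spamc by hand, the function below checks one saved message. It relies on spamc's -c (check) option, which prints score/threshold and exits with status 1 when the message is judged to be spam. The function name check_with_spamc is hypothetical, and the sketch assumes a spamd daemon is reachable with spamc's default settings.

```shell
# Hypothetical helper: report whether spamc considers a saved message spam.
# $1: path to a file holding one complete message (headers and body)
check_with_spamc() {
    # spamc -c prints "score/threshold"; exit status 1 means spam
    if verdict=$(spamc -c < "$1"); then
        printf 'ham %s\n' "$verdict"
    else
        printf 'spam %s\n' "$verdict"
    fi
}

# Usage: check_with_spamc message.txt
```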
First, you need two mailboxes to use for training: one of ham and one of spam. Spam is the
illegitimate email you DO NOT want to receive; ham is the email you DO want to receive. For
example, we have a collection of spam in the SpamTraining mailbox and ham in HamTraining.
For best results, first strip any existing SpamAssassin markup from the messages
by running the following:
formail -s spamassassin -d < pathto/HamTraining > cleaned.ham.mailbox
formail -s spamassassin -d < pathto/SpamTraining > cleaned.spam.mailbox
Now it's time to train the spamassassin databases (this can take a while).
# Training HAM (not spam, or false positives)
sa-learn --ham --dbpath="$HOME/.spamassassin" --showdots --mbox cleaned.ham.mailbox
# Training SPAM (spam, or false negatives)
sa-learn --spam --dbpath="$HOME/.spamassassin" --showdots --mbox cleaned.spam.mailbox
The Bayesian classifier can only score messages once it has
learned at least 200 known spam messages and 200 known ham messages, so make sure
each of the mailboxes above contains at least that many.
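Before training, it can help to confirm that each mailbox is large enough. The sketch below counts messages by their mbox "From " separator lines. Both function names are hypothetical, and the count is an approximation, since it assumes that body lines beginning with "From " have been escaped, as mbox writers normally do.

```shell
# Hypothetical helper: count messages in an mbox file by its "From " separators.
mbox_count() {
    grep -c '^From ' "$1"
}

# Hypothetical helper: succeed only if both mailboxes hold enough messages.
# $1: ham mbox, $2: spam mbox, $3: minimum per mailbox (defaults to 200)
ready_for_bayes() {
    min=${3:-200}
    [ "$(mbox_count "$1")" -ge "$min" ] && [ "$(mbox_count "$2")" -ge "$min" ]
}

# Usage:
#   ready_for_bayes cleaned.ham.mailbox cleaned.spam.mailbox \
#       || echo "need at least 200 messages in each mailbox"
```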
Common Problems
Legitimate email keeps getting filtered out. What should I do?
Occasionally SpamAssassin will generate a false positive on a legitimate message, which is why we
recommend setting up filtering to move mail marked as spam to a separate folder as opposed to
deletion. Based on personal experience, SpamAssassin seems to be a bit trigger-happy when testing
for forged mail pretending to be from Microsoft Outlook; that is, mail actually sent by someone
using Outlook as a mail client is reported as forged, increasing those messages' scores.
SpamAssassin provides a whitelisting option that keeps mail from addresses on a user's whitelist
from being filtered out, even when it would otherwise be reported as spam.
To add an address to your whitelist, create a file named ~/.spamassassin/user_prefs (that is,
user_prefs in the .spamassassin directory under your home directory), and include the following
line for each address you wish to whitelist:
whitelist_from address@domain.com
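If you add addresses often, a small shell function can append the line for you and skip addresses that are already listed. This is a sketch only: add_whitelist is a hypothetical name, and the optional second argument exists mainly so the function can be tried out on a scratch file first.

```shell
# Hypothetical helper: add a whitelist_from line unless it is already present.
# $1: address to whitelist
# $2: prefs file (defaults to ~/.spamassassin/user_prefs)
add_whitelist() {
    prefs=${2:-$HOME/.spamassassin/user_prefs}
    mkdir -p "$(dirname "$prefs")"           # create ~/.spamassassin if needed
    touch "$prefs"
    # -x matches whole lines only, -F treats the entry as a literal string
    if ! grep -qxF "whitelist_from $1" "$prefs"; then
        printf 'whitelist_from %s\n' "$1" >> "$prefs"
    fi
}

# Usage: add_whitelist address@domain.com
```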
How can I reduce the amount of spam sent to my mailing lists?
Our page about spam on mailing lists is coming soon!
