Spam Detection Using SpamAssassin with PYTHEAS MailGate
This page now includes instructions on how to install SpamAssassin release
3.4.1.
Upgrade instructions are in the section Upgrading SpamAssassin below.
SpamAssassin (tm) is an open source product that performs
heuristic spam analysis and RBL (Realtime Blackhole List) lookups among other tests,
to clearly tag spam mail as such. PYTHEAS MailGate can then be instructed
to handle spam mail in a particular way.
SpamAssassin (tm) is open source software, licensed
under the Apache Software License (which you can find at
http://www.apache.org/foundation/licence-FAQ.html).
No guarantees or warranties apply to the software. You use it entirely at your own
risk.
Neither SpamAssassin nor the software components it
requires are installed by the PYTHEAS MailGate setup program. Please note
that you need a PYTHEAS MailGate license key which activates the
Content-Checking Rules engine; see the
About tab to learn about
the options activated by your license key.
In its default form, SpamAssassin is designed and written
for Unix platforms. This document outlines how to get SpamAssassin
working on a Windows platform. Although the procedure may seem a little
cumbersome at first glance, we are sure that you will find it worth
the trouble: SpamAssassin is remarkably effective.
Upgrading SpamAssassin
If you are doing a fresh install, you can skip this section.
Upgrading a SpamAssassin v. 3.x Installation
For the duration of the upgrade, you should
stop the Pytheas.MailGate service (or the Communication Task). To upgrade to a newer version of SpamAssassin:
- If you upgrade from SpamAssassin
2.x to 3.x, be sure to read the additional instructions for upgrading from SpamAssassin 2.x at the end of this page first.
- Uninstall ActivePerl. Then delete the whole
c:\perl subtree.
Be sure not to delete the c:\etc\mail\spamassassin folder. You
may also want to move the NMAKE utility
from C:\perl\bin to some safe place.
- Be sure to get the new
SpamAssassin support files. The
sa.cmd file required for SpamAssassin v.3.4.1
is different from the one included in the package for earlier (pre 3.3.0) versions of
SpamAssassin. Please copy DOS2UNIX.EXE and
UNIX2DOS.EXE to the folder
where PYTHEAS MailGate has been installed.
- Your configuration file
pmg-local.cf
may contain options which are no longer supported in the new version.
Carefully read the beginning of spamdebug.txt when checking your
new SpamAssassin installation later.
- Proceed the same way as you would for a fresh installation, starting with the section Installing Perl below.
Installing Perl
Check that you have the latest version of the
SpamAssassin support files. If not,
download and unzip.
- Install ActivePerl (v. 5.8.8.822). Keep the features Perl
and PPM selected. You may unselect the features Perl ISAPI,
PerlEx, PerlScript, Documentation and
Examples.
- Open a Command-Line window and type
PERL -v to check that everything
is fine.
- In subsequent sections, it will be assumed that Perl has been installed in
C:\PERL . Make appropriate changes if necessary.
- Reboot the computer. If Perl had already been installed on your
computer and the PATH environment variable had already been set (for example,
during an upgrade), you may skip the reboot. After rebooting, open a command-line window and type
PATH
to make sure that C:\PERL\BIN is now part of your PATH
environment variable.
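The PATH check in the last step can also be scripted. The following is a minimal Python sketch and is not part of PYTHEAS MailGate or the SpamAssassin support files; the function name folder_on_path is made up for illustration. It splits a Windows-style PATH value on semicolons and compares the entries case-insensitively, ignoring trailing backslashes.

```python
import os


def folder_on_path(folder, path_value=None):
    """Return True if `folder` is one of the entries of a Windows-style PATH value.

    If `path_value` is omitted, the PATH of the current process is used.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(entry):
        # Ignore surrounding blanks and quotes, trailing slashes, and letter case.
        return entry.strip().strip('"').rstrip("\\/").lower()

    wanted = norm(folder)
    # Windows separates PATH entries with semicolons.
    return any(norm(entry) == wanted
               for entry in path_value.split(";") if entry.strip())
```

Calling folder_on_path(r"C:\PERL\BIN") on the machine where Perl was installed should return True once the PATH variable has been updated.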
Installing NMAKE
- Download NMAKE.
- Extract the files, and place them in
C:\PERL\BIN . Both
NMAKE.EXE and NMAKE.ERR are needed.
Installing the Necessary Perl Modules
Perl uses modules to extend the language's capabilities. Many of them are included
with the core distribution, but many others are available. SpamAssassin
requires several modules which are not in the core distribution of ActivePerl.
Obtaining and Installing SpamAssassin
- Be sure to have PYTHEAS MailGate v. 2.32a (or a newer version).
Upgrade if necessary.
- Go to
http://spamassassin.apache.org/downloads.html,
and download the ZIP file distribution. Extract the ZIP file to the root of drive C:. For
SpamAssassin version
3.4.1 for
example, this will create C:\Mail-SpamAssassin-3.4.1
or C:\Mail-SpamAssassin-3.4.1\Mail-SpamAssassin-3.4.1 ,
depending on how you proceed. We'll refer to this folder as the SPAMSOURCE
folder in subsequent sections.
- Open a command-line window (an elevated command-line window on Windows
Server 2008 and later), go to the SPAMSOURCE folder and type:
PERL MAKEFILE.PL
You will be asked a couple of questions. Be sure to answer
No to the first one, which is not the default response:
First question: Build spamc.exe (...)?
Answer: N
Next question: What email address or URL should be used (...)
Answer: give a meaningful answer for your site.
You may safely ignore the warnings about optional missing modules:
(...)
optional module missing: Razor2
optional module missing: Net::Ident
optional module missing: IO::Socket::INET6
optional module missing: IO::Socket::SSL
(...)
- Still in the SPAMSOURCE folder, type:
NMAKE
NMAKE INSTALL
- Make a backup copy of
c:\perl\site\etc\mail\spamassassin\v310.pre
(name it v310.backup, for example; in any case, don't give it the
.pre extension). Open the file c:\perl\site\etc\mail\spamassassin\v310.pre
in a text editor (Wordpad.exe will handle the line endings better
than Notepad.exe). At the beginning of the lines
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
add the character # to transform them into a comment and avoid
loading the plug-ins.
- Finally type:
C:\Perl\Site\Bin\SpamAssassin -V
You should get the following response:
SpamAssassin version 3.4.1
running on Perl version 5.8.8
- Download the SpamAssassin rules:
C:\Perl\Site\Bin\sa-update --nogpg -v
The --nogpg option skips GPG signature verification, so the command works even if you do not have gpg installed. It should run without an error message.
We recommend running this command regularly (once a week, for example) to keep
the SpamAssassin rules up to date.
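The step above that disables the Pyzor and Razor2 plug-ins in v310.pre can be done by hand in a text editor. For repeated installations it could also be scripted; the following Python sketch shows one way to do so. The function name disable_plugins is hypothetical and not part of any SpamAssassin tool. It prefixes the matching loadplugin lines with # and leaves every other line, including the line endings, untouched.

```python
def disable_plugins(text, plugins=("Pyzor", "Razor2")):
    """Comment out 'loadplugin' lines for the given plug-in names.

    Lines that are already commented out are left alone, so the function
    can safely be applied more than once.
    """
    out = []
    for line in text.splitlines(keepends=True):
        parts = line.split()
        if (len(parts) >= 2 and parts[0] == "loadplugin"
                and parts[1].rsplit("::", 1)[-1] in plugins):
            line = "#" + line
        out.append(line)
    return "".join(out)


# Example use (make the backup copy described above first):
#   path = r"c:\perl\site\etc\mail\spamassassin\v310.pre"
#   with open(path, newline="") as f:      # newline="" keeps the line endings
#       text = f.read()
#   with open(path, "w", newline="") as f:
#       f.write(disable_plugins(text))
```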
Testing Your SpamAssassin Installation
From a command line window, in the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-nonspam.txt 2>spamdebug.txt
This command should run smoothly. In the command line window, you will get the
message after it passed through SpamAssassin. The output should indicate that this
sample message is not spam - look at the X-Spam-... lines added by
SpamAssassin in
the header part of the message.
Please note: the file spamassassin.bat may be created in the c:\perl\bin
folder instead of the c:\perl\site\bin folder.
In this case, please adjust the commands suggested in the subsequent
sections.
Have a look at spamdebug.txt which has been created by this run.
Check for DNS resolution. In the Received header parsing part of it,
you should see:
dbg: dns: servers obtained from Net::DNS : [...]:53
dbg: dns: nameservers set to ...
(...)
dbg: dns: is Net::DNS::Resolver available? yes
At the end of the file, check for the results:
dbg: check: is spam? score=0 required=5
dbg: check: tests=
dbg: check: subtests=__CT,__CTYPE_CHARSET_QUOTED, __CT_TEXT_PLAIN, __DOS_BODY_STOCK, __DOS_BODY_SUN, __DOS_HAS_ANY_URI, __DOS_LINK, __DOS_RCVD_FRI, __FB_PICK, __FB_S_STOCK, __FM_STOCK_WORDS, __HAS_ANY_EMAIL, __HAS_ANY_URI, __HAS_MSGID, __HAS_RCVD, __HAS_SUBJECT, __LAST_UNTRUSTED_RELAY_NO_AUTH, __MIME_VERSION, __MISSING_REF, __MSOE_MID_WRONG_CASE, __NAKED_TO, __NONEMPTY_BODY, __RCVD_IN_SORBS, __RCVD_IN_ZEN, __SANE_MSGID, __TOCC_EXISTS, __YOUR_ACCOUNT
Now let's check if a message is correctly identified as spam. From the SPAMSOURCE folder, type:
c:\perl\site\bin\spamassassin -D < sample-spam.txt 2>spamdebug.txt
The output in the command line window should indicate that this sample message
is spam (look at the X-Spam-... lines added by SpamAssassin
in the header part of the message, and the body of the message which has been modified
by SpamAssassin).
Have a look at spamdebug.txt. At the end of the file, check for the results:
dbg: check: is spam? score=999.998 required=5
dbg: check: tests=GTUBE,NO_RECEIVED,NO_RELAYS
dbg: check: subtests=__CT,__CTE,__CT_TEXT_PLAIN,__HAS_MSGID,__HAS_SUBJECT, __MIME_VERSION, __MISSING_REF, __MSGID_OK_HOST, __NONEMPTY_BODY, __SANE_MSGID, __TOCC_EXISTS, __UNUSABLE_MSGID
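Instead of scrolling to the end of spamdebug.txt, the result lines shown above can be extracted with a few lines of Python. This is only a sketch: the function name summarize_debug is made up, and the rule "spam when score is at least the required value" is stated here as an assumption about how SpamAssassin compares the two numbers.

```python
import re

# Patterns for the result lines shown above; the subtests line does not match.
CHECK_RE = re.compile(r"dbg: check: is spam\? score=(-?[\d.]+) required=(-?[\d.]+)")
TESTS_RE = re.compile(r"dbg: check: tests=(.*)")


def summarize_debug(text):
    """Extract score, required score and matched tests from debug output."""
    result = {"score": None, "required": None, "tests": [], "is_spam": None}
    for line in text.splitlines():
        m = CHECK_RE.search(line)
        if m:
            result["score"] = float(m.group(1))
            result["required"] = float(m.group(2))
            # Assumption: a message counts as spam when score >= required.
            result["is_spam"] = result["score"] >= result["required"]
            continue
        m = TESTS_RE.search(line)
        if m:
            result["tests"] = [t.strip() for t in m.group(1).split(",") if t.strip()]
    return result


# Example use:
#   with open("spamdebug.txt", errors="replace") as f:
#       print(summarize_debug(f.read()))
```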
The Online Documentation
You can access the documentation at
http://spamassassin.apache.org/full/3.3.x/dist/doc/. The most
important file to read is Mail Spamassassin Conf - it outlines all major
configuration parameters.
Connect SpamAssassin and PYTHEAS MailGate
If you are upgrading, you are now ready to restart PYTHEAS MailGate.
If you
do not have a pmg-local.cf file, copy this file from the
SpamAssassin support files to C:\etc\mail\spamassassin . Create this
folder if it does not exist. Use this file to configure the way SpamAssassin
should work for your site. You should not edit global configuration files in
C:\perl\site\share\spamassassin as your settings could be lost during
the next upgrade. Of course, it is a good idea to look at the global configuration
files to know what parameters can be changed.
Please note: For PYTHEAS MailGate v. 2.75c and earlier, on Microsoft Windows
Server 2012, please avoid folder names containing spaces for temporary storage
of incoming messages.
Copy the files sa.cmd, DOS2UNIX.EXE and UNIX2DOS.EXE to the C:\Program Files\PytheasMailgate
folder. The downloadable version of sa.cmd assumes that Perl has
been installed in the C:\perl folder.
Please note that DOS2UNIX.EXE and UNIX2DOS.EXE are
not strictly needed for the current version of
SpamAssassin, but they may be useful for future versions.
Here are some comments about
the contents of sa.cmd :
-D | Instructs SpamAssassin to produce diagnostic output (see below). You may change this option to obtain different diagnostic output. You can also omit this parameter altogether if you do not need it.
-e | Instructs SpamAssassin to set the exit code depending on the spam status. PYTHEAS MailGate uses this exit code to pick up the spam status.
-p ... | Instructs SpamAssassin to use the pmg-local.cf file, regardless of the user context in which it is running.
%1, %2, %3, %4 | PYTHEAS MailGate will always call sa.cmd with 4 parameters. Please see details below.
%1 | Path name of the file containing the message to be checked.
%2 | Path name of the file to contain the checked message (this is always Temp_folder\PmgSaChki.tmp, i being a number from 1 to 12).
%3 | Path name of the file to contain the diagnostic output produced by SpamAssassin (this is always Temp_folder\PmgSpamAi.log, i being a number from 1 to 12).
%4 | Determined by the POP3 account configuration in PYTHEAS MailGate. Note: the downloadable version of sa.cmd includes code to handle the value NoSpamCheck for this parameter, which does what its name suggests: if you add Spam-A:NoSpamCheck to the Comment of a POP3 account, it will be excluded from spam checking.
Exit code or Errorlevel | Since v. 2.31c, PYTHEAS MailGate no longer relies on the exit code (or Errorlevel value) of the sa.cmd command file, as previous versions did.
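To make the role of the four parameters more concrete, here is a Python sketch of a wrapper that follows the description in the table. It is not the contents of sa.cmd. The function names, the default paths, and in particular the behaviour for NoSpamCheck (copying the message through unchanged) are assumptions made for illustration only.

```python
import shutil
import subprocess


def plan_spam_check(msg_in, msg_out, log_out, config_tag,
                    spamassassin=r"C:\Perl\site\bin\spamassassin.bat",
                    prefs=r"C:\etc\mail\spamassassin\pmg-local.cf"):
    """Describe what to do for the parameters %1 to %4 of the table above."""
    if config_tag == "NoSpamCheck":
        # Assumption: an excluded account gets its message back unmodified.
        return {"action": "copy", "source": msg_in, "target": msg_out}
    return {
        "action": "check",
        # -D: diagnostic output, -e: exit code reflects spam status,
        # -p: use the given preferences file.
        "command": [spamassassin, "-D", "-e", "-p", prefs],
        "stdin": msg_in,    # %1: message to be checked
        "stdout": msg_out,  # %2: checked message
        "stderr": log_out,  # %3: diagnostic output
    }


def run_plan(plan):
    """Carry out a plan and return an exit code (0 for a plain copy)."""
    if plan["action"] == "copy":
        shutil.copyfile(plan["source"], plan["target"])
        return 0
    with open(plan["stdin"], "rb") as fin, \
            open(plan["stdout"], "wb") as fout, \
            open(plan["stderr"], "wb") as ferr:
        done = subprocess.run(plan["command"], stdin=fin, stdout=fout, stderr=ferr)
    return done.returncode
```

Separating the plan from its execution keeps the parameter handling easy to inspect without SpamAssassin being installed.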
To check your installation, you may use sapmg.cmd from the
SpamAssassin support files. This
command file calls SpamAssassin the same way PYTHEAS MailGate
does. You will find the message which has been checked by SpamAssassin,
and the diagnostic output spamdebug.txt , in the folder referenced
by the TEMP environment variable (use the SET command
to show environment variables).
Test it
If you activate spam-checking for the first time, you may want to activate it
for a single POP3 account only, with the following options:
- Check incoming mail with SpamAssassin... Only from POP3 accounts with
the word Spam-A in the comment. Put the word Spam-A into the
Comment field of the POP3 account entry.
- Forward messages identified as Spam to... The intended Recipient as
usual
After messages have been spam-checked, look for the following lines in the Remote Control Program
or in the Session Log message:
[11:16] [Spamassassin] Spam status: No, score=-4.9 required=5.0 tests=BAYES_00
autolearn=ham version=3.4.1
or
[11:06] *** [Spamassassin] Spam status: Yes, score=8.8 required=5.0
tests=BAYES_99, BIZ_TLD, HTML_60_70, HTML_MESSAGE, HTML_TITLE_UNTITLED, HTTP_EXCESSIVE_ESCAPES,
MIME_BASE64_TEXT, MIME_HTML_NO_CHARSET, MIME_HTML_ONLY autolearn=no version=3.4.1
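If you want to evaluate many such log entries, the status text can be parsed mechanically. The Python sketch below assumes the format of the two sample entries above and nothing more; the function name parse_spam_status is hypothetical. Wrapped lines are joined before parsing.

```python
import re


def parse_spam_status(text):
    """Parse a 'Spam status: ...' log entry; return None if it is not one."""
    flat = " ".join(text.split())  # join wrapped lines into one
    m = re.search(r"Spam status: (Yes|No), (.*)", flat)
    if m is None:
        return None
    # Split "key=value" pairs; a value runs up to the next " key=" or the end.
    fields = dict(re.findall(r"(\w+)=(.*?)(?= \w+=|$)", m.group(2)))
    if "score" not in fields or "required" not in fields:
        return None
    tests = [t.strip() for t in fields.get("tests", "").split(",") if t.strip()]
    return {
        "is_spam": m.group(1) == "Yes",
        "score": float(fields["score"]),
        "required": float(fields["required"]),
        "tests": tests,
        "autolearn": fields.get("autolearn"),
        "version": fields.get("version"),
    }
```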
In case you have problems:
- Please have a look at
PmgSpamAn.log or at PmgSaChkn.tmp
(you will need to make a copy of these files while the download session is still
in progress, as they are deleted when the session ends). You will find these files
in the folder you specified on the Service Options
page, Incoming mail tab (in v. 2.x:
on the Environment tab of the Configuration
Program).
- Did you really restart the computer since you installed Perl for the first
time?
- Did you check the paths in
sa.cmd ?
- Did you create the
C:\etc\mail\SpamAssassin folder? Did you
put your copy of pmg-local.cf there?
Cleaning up
The SPAMSOURCE folder is no longer needed once the installation
is completed.
Setting Spam Delivery Options in PYTHEAS MailGate
You have the following options for the delivery of messages which have been identified
as spam:
- deliver as usual (please note that the spam will have been tagged as such
by SpamAssassin),
- always deliver to a particular Recipient
- do not deliver to anybody. If you have configured to write a log entry for
every incoming message, messages identified as spam are logged even if they are
actually not forwarded to any internal Recipient at all. Such messages
receive a [Spam] tag at the beginning of the message subject.
- Messages with a spam score above a certain level can be handled in a different
way, as compared to spam messages with a spam score below this level.
Specific Configuration Settings for POP3 Accounts
You can activate spam analysis for all POP3 accounts, or only for selected ones.
The Comments field in the POP3 Account properties is used for this
purpose.
To activate spam detection only for certain POP3 accounts, configure the corresponding
option in the PYTHEAS MailGate configuration, and type the word Spam-A
anywhere as a separate word into the Comment field of the selected
POP3 accounts.
To use specific SpamAssassin configuration settings for POP3 accounts,
proceed as follows:
- Put the following expression into the Comment field of each POP3
Account entry:
Spam-A:ConfigTag .
ConfigTag is some identifier (only composed of letters and numbers). It
will be passed as 4th parameter to sa.cmd .
- You can now write code in
sa.cmd to switch to different configuration
files, based on this parameter.
- If for a particular POP3 account, no ConfigTag value is found
in the Comments field, the word Nothing is passed as
4th parameter (so you can be sure that your
sa.cmd file always gets
4 parameters).
- The
sa.cmd file included in the
SpamAssassin support files
contains code to handle the ConfigTag value of NoSpamCheck ,
to exclude a particular POP3 account from spam checking.
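The rules above can be illustrated with a short Python sketch. It does not show how PYTHEAS MailGate parses the Comment field internally; the function names, the case-sensitive matching, and the file naming scheme pmg-ConfigTag.cf are assumptions chosen for the example.

```python
import re

# "Spam-A" as a separate word, optionally followed by ":ConfigTag"
# (letters and digits only).
TAG_RE = re.compile(r"(?<!\S)Spam-A(?::([A-Za-z0-9]+))?(?!\S)")


def config_tag(comment):
    """Return (spam_check_requested, tag) for a POP3 account Comment field."""
    m = TAG_RE.search(comment)
    if m is None:
        return (False, "Nothing")
    # Without a ConfigTag, the word Nothing is passed as 4th parameter.
    return (True, m.group(1) or "Nothing")


def prefs_file_for(tag, base=r"C:\etc\mail\spamassassin"):
    """Map a ConfigTag to a preferences file; None means 'skip the check'."""
    if tag == "NoSpamCheck":
        return None
    if tag == "Nothing":
        return base + "\\pmg-local.cf"
    # Hypothetical naming scheme for account-specific configuration files.
    return base + "\\pmg-" + tag + ".cf"
```

A sa.cmd file that switches configuration files would perform the equivalent of prefs_file_for on its 4th parameter and pass the result to the -p option.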
Spam/Ham Learning for SpamAssassin
For spam/ham learning with sa-learn, messages
are needed in text format according to RFC822, with the complete message header
lines. Unfortunately, there does not seem to be an easy way to save messages in
such a format using Microsoft Outlook.
How to save incoming messages to files in RFC822 format
PYTHEAS MailGate v. 2.30c (or later) supports a new way to write messages to disk
files in RFC822 format. This new function is managed by a tag in the
Comment field of POP3 account entries. The name of the tag is SaveToDisk ,
and it has two parameters, which are separated by a vertical bar (ASCII 124):
- a name for a folder (which will be created if it does not exist). Messages
will be saved to this folder. It will be located in
ProgramData\PytheasMailgate\Incoming or Program_Files\PytheasMailgate\Incoming
(depending on where your PMailGat.INI configuration file is
located);
- an age limit (in hours). Any files in this folder older than the age limit
will
automatically be deleted. An age limit of 0 (zero) will disable automatic
cleaning.
As an example, adding the expression SaveToDisk:SpamHam|24 to
the Comment field of a POP3 account entry will save all
messages from this POP3 mailbox to the Incoming\SpamHam subfolder
of the folder where the PYTHEAS MailGate configuration files are stored,
and any file older than 24 hours in this folder will be cleaned out at the
beginning of the upcoming download
session. Message delivery will continue as usual. Several POP3 mailboxes can
have their messages dropped into the same folder.
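The meaning of the two SaveToDisk parameters can be sketched in Python as follows. This illustrates the rules described above and is not the code used by PYTHEAS MailGate; the function names are hypothetical, and the sketch assumes folder names without spaces and uses the file modification time as the file age.

```python
import re

# "SaveToDisk:<folder>|<hours>" as a separate word in the Comment field.
SAVE_RE = re.compile(r"(?<!\S)SaveToDisk:([^|\s]+)\|(\d+)(?!\S)")


def parse_save_to_disk(comment):
    """Return (folder_name, age_limit_hours), or None if the tag is absent."""
    m = SAVE_RE.search(comment)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def expired(files, age_limit_hours, now):
    """Return the names of files older than the age limit.

    `files` is an iterable of (name, modification_time_in_seconds) pairs and
    `now` is the current time in seconds. A limit of 0 disables cleaning.
    """
    if age_limit_hours == 0:
        return []
    cutoff = now - age_limit_hours * 3600
    return [name for name, mtime in files if mtime < cutoff]
```

For a real folder, the (name, time) pairs could be collected with os.scandir and entry.stat().st_mtime, with time.time() supplying the value for now.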
Another way to obtain messages in RFC822 format is to use the View/Delete messages function (accessible from the POP3 account
property page). It has a Save message as function (press F10 to access
it). You should also configure PYTHEAS
MailGate not to delete messages after downloading them, but to clean them up
after a day or two. This way, you can get messages in RFC822 format directly from the POP3
account. This method also lets you pick exactly those messages which the Bayes engine
classified incorrectly, and use them for learning.
To streamline the process, you could do the following:
- Set up a folder structure as described in the
SpamAssassin support files package.
- Make shortcuts on the desktop for the programs
LearnHam.cmd and
LearnSpam.cmd , and the folders SpamTest\Ham and
SpamTest\Spam .
Now the learning procedure could look like this:
- If you configured your POP3 account to have the messages saved to files by
using the
SaveToDisk option (see above), open the
...\Incoming\... folder. Drag-and-drop the messages to the
SpamTest\Spam or SpamTest\Ham shortcut.
- Alternatively, you can save the message to feed into the learning process on the desktop (View/Delete
messages, F10, Save message as). Then drag-and-drop the file to the shortcut pointing to the
SpamTest\Spam or SpamTest\Ham
folder.
- Double-click on the shortcut for
LearnSpam.cmd or LearnHam.cmd
(this will feed
all files contained in this folder into sa-learn).
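The last step boils down to one sa-learn call per folder. The Python sketch below builds such a call; it is not the content of the command files from the SpamAssassin support files. The function name and the example folder location are assumptions, and the default paths follow the ones used elsewhere on this page.

```python
def sa_learn_command(kind, folder,
                     sa_learn=r"c:\perl\site\bin\sa-learn",
                     prefs=r"c:\etc\mail\spamassassin\pmg-local.cf"):
    """Build the argument list for learning all messages in `folder`.

    `kind` must be "spam" or "ham". sa-learn accepts a folder name and then
    processes the files it contains.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # -p selects the preferences file; --spam / --ham selects the message class.
    return [sa_learn, "-p", prefs, "--" + kind, folder]


# Example use (the folder location is hypothetical):
#   import subprocess
#   subprocess.run(sa_learn_command("spam", r"C:\SpamTest\Spam"))
# On Windows, the executable name may need the .bat extension.
```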
Additional instructions for upgrading from SpamAssassin 2.x
- Before
installing a 3.x version of SpamAssassin over a 2.x version, you should
put your Bayes database into a "clean" state:
from a command line prompt, execute:
sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --rebuild
- Clean the
c:\etc\mail\spamassassin folder: leave only pmg-local.cf
and the bayesdb subfolder and its contents; delete all the other
files.
- After installing the 3.x version of SpamAssassin, execute the following from a command line prompt:
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf --sync
followed by
c:\perl\site\bin\sa-learn -p c:\etc\mail\spamassassin\pmg-local.cf -D --import
to migrate the data into the new DB_File format. Be patient: these commands may take
a couple of minutes to complete, depending on the size of your Bayes database.
- Check that the new version of SpamAssassin works on your machine
(we recommend using the
spam-a.cmd command file included
in the
SpamAssassin support files for this
purpose, because it includes a reference to your pmg-local.cf
preferences file, which in turn contains the pointer to your Bayes database
in c:\etc\mail\spamassassin\bayesdb ). Look in the debug output for
configuration options in pmg-local.cf which may be no longer supported
or which have a new syntax. You may want to compare your configuration file to
the sample pmg-local.cf file contained in the
SpamAssassin support files.
Credits
This document has been inspired by USING SpamAssassin WITH
WIN32, (c) 2002,2004 by Michael Bell (thanks!).
SpamAssassin is a trademark of the Apache Software Foundation.