SpamAssassin on MySQL
From GreyWiki
Contents |
Purpose
If we convert to MySQL, we get a few nifty things... not the least of which is the ability to interact with SA from a web interface. (For instance, we can use plugins in SquirrelMail to run sa-learn against a given message and to set preferences.) This is a somewhat non-trivial operation, but it's not actually complicated.
Server Prep
One problem is that DBD::mysql may not be happy, and installing it via CPAN is a major pain in the backside. Installing it manually involves some tests, so...
perl Makefile.PL --testuser=root --testpassword=<mysqlrootpwd> make make test make install (IF THE TESTS WORKED)
This should prevent SpamAssassin from, later on, throwing an error about DBI.pm, which should also have been updated. Right?
User Export And Import
Users will probably want their current Bayesian training data migrated over. I don't think I can describe this any better than the following excerpt from the sa-learn man file:
"Note that if you have individual user databases you will have to perform a similar procedure for each one of them."
sa-learn --sync
"This will sync any outstanding journal entries."
sa-learn --backup > backup.txt
"This will save all your Bayes data to a plain text file."
sa-learn --clear
"This is optional, but good to do to clear out the old database."
"At this point, if you have multiple databases, you should perform the procedure above for each of them. (i.e. each user's database needs to be backed up before continuing.)"
"Once you have backed up all databases you can update your configuration for the new database backend. This will involve at least the bayes_store_module config option and may involve some additional config options depending on what is required by the module. (For example, you may need to configure an SQL database.)"
sa-learn --restore backup.txt
"Again, you need to do this for every database."
"If you are migrating to SQL you can make use of the -u <username> option in sa-learn to populate each user's database. Otherwise, you must run sa-learn as the user who database you are restoring."
Please note that this section of the instructions should be followed in two parts! First make the backup, then make the SpamAssassin configuration changes shown below, and only then should you run the restore processes.
MySQL Database Prep
First, we should have the database prepared. Create the DB and grant rights to the user/password set that we plan to use. For this document, the database is named spamassassin. You're free to pick something else, but keep that in mind if you choose to copy/paste content from this document.
Now it's time for some table creation (copy/paste these blocks to text files and mysql -pPASSWORD DATABASE < TEXTFILE accordingly):
CREATE TABLE userpref ( username varchar(100) NOT NULL default '', preference varchar(30) NOT NULL default '', value varchar(100) NOT NULL default '', prefid int(11) NOT NULL auto_increment, PRIMARY KEY (prefid), KEY username (username) ) TYPE=MyISAM;
This gives us the ability to store per-user (including the @GLOBAL user) preferences. So, how about whitelist data?
CREATE TABLE awl ( username varchar(100) NOT NULL default '', email varchar(200) NOT NULL default '', ip varchar(10) NOT NULL default '', count int(11) default '0', totscore float default '0', PRIMARY KEY (username,email,ip) ) TYPE=MyISAM;
Huzzah, there's the Auto-WhiteList table. Now for the meat of the matter, the Bayes data. Hold on, this one's a doozy:
CREATE TABLE bayes_expire (
id int(11) NOT NULL default '0',
runtime int(11) NOT NULL default '0',
KEY bayes_expire_idx1 (id)
) TYPE=MyISAM;
CREATE TABLE bayes_global_vars (
variable varchar(30) NOT NULL default '',
value varchar(200) NOT NULL default '',
PRIMARY KEY (variable)
) TYPE=MyISAM;
INSERT INTO bayes_global_vars VALUES ('VERSION','3');
CREATE TABLE bayes_seen (
id int(11) NOT NULL default '0',
msgid varchar(200) binary NOT NULL default '',
flag char(1) NOT NULL default '',
PRIMARY KEY (id,msgid)
) TYPE=MyISAM;
CREATE TABLE bayes_token (
id int(11) NOT NULL default '0',
token char(5) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
atime int(11) NOT NULL default '0',
PRIMARY KEY (id, token),
INDEX bayes_token_idx1 (token),
INDEX bayes_token_idx2 (id, atime)
) TYPE=MyISAM;
CREATE TABLE bayes_vars (
id int(11) NOT NULL AUTO_INCREMENT,
username varchar(200) NOT NULL default '',
spam_count int(11) NOT NULL default '0',
ham_count int(11) NOT NULL default '0',
token_count int(11) NOT NULL default '0',
last_expire int(11) NOT NULL default '0',
last_atime_delta int(11) NOT NULL default '0',
last_expire_reduce int(11) NOT NULL default '0',
oldest_token_age int(11) NOT NULL default '2147483647',
newest_token_age int(11) NOT NULL default '0',
PRIMARY KEY (id),
UNIQUE bayes_vars_idx1 (username)
) TYPE=MyISAM;
Hefty, wot? Now, in order for tests to be run, we need some global prefs. (I found this out the hard way.) So:
# mysql -pPASSWORD spamassassin
INSERT INTO userpref (username,preference,value) VALUES ('$GLOBAL','required_hits','5.0');
INSERT INTO userpref (username,preference,value) VALUES ('$GLOBAL','report_safe','1');
INSERT INTO userpref (username,preference,value) VALUES ('$GLOBAL','use_bayes','1');
INSERT INTO userpref (username,preference,value) VALUES ('$GLOBAL','use_dcc','1');
exit;
Feel free to embellish with additional preferences. You get the idea. Note: I'm using report_safe in order to make my mail filtering scheme work. Your mileage may vary, yadda yadda.
SpamAssassin Configuration
Switching SA to SQL isn't difficult, just tedious. Here's the relevant chunk of local.cf for a database name of "spamassassin", with the passwords obfuscated appropriately:
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:spamassassin:localhost:3306
bayes_sql_username DBUSER
bayes_sql_password DBPASSWORD
auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn DBI:mysql:spamassassin:localhost:3306
user_awl_sql_username DBUSER
user_awl_sql_password DBPASSWORD
user_scores_dsn DBI:mysql:spamassassin:localhost:3306
user_scores_sql_username DBUSER
user_scores_sql_password DBPASSWORD
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_
WHERE username = _USERNAME_
OR username = '$GLOBAL'
OR username = CONCAT('%',_DOMAIN_)
ORDER BY username ASC
Yes, I made that last line ugly on purpose. Make sure that's all on one line, okay? Also, replace those DBUSER and DBPASSWORD markers with the appropriate values.
We need to convince spamd to use the SQL database. Find wherever its launcher configuration lives (for instance, /etc/sysconfig/spamassassin on a Fedora box) and insert -q into the list of options. For instance:
SPAMDOPTIONS="-d -c -q -m1 -H"
The -m1, for the record, sets a one-child process limit to cut down on memory overhead. Unless you have lots of server RAM and/or high email traffic, this will probably work for you just fine.
SquirrelMail
I'm using the Spam Buttons and SASQL plugins for SquirrelMail. A few simple tips on how I'm using them follow, mostly for my own future reference.
Spam Buttons
The patch file provided in the tarball is against a specific release of 1.4.x, but since there's only one line I decided it's best just to edit the file in question manually. In the SquirrelMail's functions directory, edit mailbox_display.php as follows.
Search for:
getMbxList($imapConnection);
echo getButton('SUBMIT', 'moveButton',_("Move")) . "\n";
echo getButton('SUBMIT', 'attache',_("Forward")) . "\n";
Immediately after those lines, insert:
do_hook('mailbox_display_buttons');
Find the portion of Spam Buttons' config.php which details the shell command configuration. Here's what I'm using:
$is_spam_shell_command = '/usr/bin/sa-learn --spam --username=###USERNAME###'; $is_not_spam_shell_command = '/usr/bin/sa-learn --ham --username=###USERNAME###';
Please note the change from the example in config.php; the corresponding lines use the ###EMAIL_ADDRESS### flags, but the entire rest of this SpamAssassin SQL configuration is based on using just the username portion, hence my change.
For what it's worth, you may find it useful to set the debug flags during your initial tests.
(As a side note: This feature is what spurred my investigation into SpamAssassin's SQL capabilities in the first place.)
SASQL
In short:
- Since I'm already using a "Spam" subdirectory in my mailbox, I have the $special_spam_folder setting turned on and $spam_folder is set to 'Spam' (of course).
- However, I've turned off $sasql_use_learn because I'm using Spam Buttons to handle my filter training.
- Setting $learn_spam_folder and $learn_fp_folder to '' may have been unnecessary, but I'm happier being on the safe side.
Further Reading
Here's some of the reference material (outside of some random Google searches to clarify points specific to my server's configuration) for your perusal and edification:
http://wiki.apache.org/spamassassin/UsingSQL
http://wiki.apache.org/spamassassin/BetterDocumentation/SqlReadmeBayes
Migrate a Fedora Core 4 ... to MySQL
Please note that these documents include testing procedures that I gloss over entirely in my instructions!
