Spamfilters

cracauer@

Developer
For those not relying on a big mail provider:

What's your spam filtering like and how well does it work?


Myself, I started out with SpamAssassin long ago. As spam started coming through I added custom filters in procmail, and also a complete perl program. I have many years' worth of that tuning on top of SA and it works pretty well. I recently had a hiccup where SA didn't work for a while and there was a lot of spam coming through. SpamAssassin is definitely not as useless by now as some people claim. Not enough on its own, but it carries its weight. In addition I redirect all HTML-only mail to a rarely read mailbox. That's really useful, most people who don't send ascii, too, are dorks of lesser importance.

I still know people on greylisting, but it's not for me.

What's yours like?
 
To all the folks dissing spamassassin, do you have bayesian autolearn enabled and do you feed it and train it? I feed mine every month with spam signatures from here.. https://untroubled.org/spam/

Code:
sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      58379          0  non-token data: nspam
0.000          0       7139          0  non-token data: nham
 

I suppose this looks OK?

Code:
% sa-learn --dump magic
plugin: failed to parse plugin (from @INC): Can't locate Mail/SpamAssassin/Plugin/FuzzyOcr.pm in @INC (you may need to install the Mail::SpamAssassin::Plugin::FuzzyOcr module) (@INC contains: /usr/local/lib/perl5/site_perl /usr/local/lib/perl5/site_perl/mach/5.36 /usr/local/lib/perl5/5.36/mach /usr/local/lib/perl5/5.36) at (eval 1210) line 1.

0.000          0          3          0  non-token data: bayes db version
0.000          0      69639          0  non-token data: nspam
0.000          0     934675          0  non-token data: nham
0.000          0     217792          0  non-token data: ntokens
0.000          0 1722978215          0  non-token data: oldest atime
0.000          0 1743453505          0  non-token data: newest atime
0.000          0 1743444877          0  non-token data: last journal sync atime
0.000          0 1743431465          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0      13514          0  non-token data: last expire reduction count
 
Apart from the failed plugin, yeah. Well, unless a lot of your ham is actually spam, that seems like a lot of mail, bearing in mind mine has only been running since 2020 and there are only 14 addresses being protected. Maybe I just don't get a lot of mail, circa 120 a day according to pflogsumm.
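For comparing numbers like these, here is a rough Python sketch that reads `sa-learn --dump magic` output, picks out the "non-token data" counters and turns the atime values into dates. The function names are made up for illustration, and it assumes the value always sits in the third column, as it does in the output quoted above.

```python
from datetime import datetime, timezone

# Sample copied from the output quoted in this thread (plugin warning left out).
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      69639          0  non-token data: nspam
0.000          0     934675          0  non-token data: nham
0.000          0     217792          0  non-token data: ntokens
0.000          0 1722978215          0  non-token data: oldest atime
0.000          0 1743453505          0  non-token data: newest atime
"""

MARKER = "non-token data:"


def parse_dump_magic(text):
    """Return {label: int} for every 'non-token data' line in the dump."""
    stats = {}
    for line in text.splitlines():
        if MARKER not in line:
            continue  # ignores noise such as the FuzzyOcr plugin warning
        fields, label = line.split(MARKER, 1)
        parts = fields.split()
        if len(parts) < 3:
            continue
        # Assumption: the counter is the third whitespace-separated column.
        stats[label.strip()] = int(parts[2])
    return stats


def summarize(stats):
    """Compute the spam share of learned messages and readable atime dates."""
    spam, ham = stats.get("nspam", 0), stats.get("nham", 0)
    total = spam + ham
    summary = {"spam_share": spam / total if total else 0.0}
    for key in ("oldest atime", "newest atime"):
        if key in stats:
            stamp = datetime.fromtimestamp(stats[key], tz=timezone.utc)
            summary[key] = stamp.date().isoformat()
    return summary


if __name__ == "__main__":
    print(summarize(parse_dump_magic(SAMPLE)))
```

On the numbers quoted above, learned spam comes out at roughly 7% of all learned messages, which is what makes the ham count look large.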
 
I recently switched from qmail/spamdyke to opensmtpd. Couldn't be happier. Almost no spam gets through (maybe 1-2 / month). Just using the native utils that come with opensmtpd. Highly recommended.
 
have been using spamassassin since the 2000s but decided to switch to a combination of rspamd + spamd (which is entirely optional) a year ago, and I'm very happy with the switch.

rspamd is much more efficient with RAM usage and comes with most things built in (dkim signing module, ARC, DMARC/SPF, greylisting, great web interface for stats to name just a few) - you don't need any external perl modules. it feels like it's got fewer 'moving parts'.

my setup works like this:

email from IPs not covered by a pf blacklist reaches postfix and rspamd. rspamd has a large number of modules one can switch on or off. in my case it does greylisting only on emails that are very close to the uncertainty score.

procmail further redirects mailing lists to separate maildirs, deletes mails with more than 5.0 spam score and places any mail with a score > 2.0 into a dedicated 'probably spam' maildir.
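the score thresholds above boil down to something like this Python sketch. it is not the actual procmail recipe, just the same decision written out. the header name `X-Spam-Score` is an assumption (use whatever header your rspamd setup adds), and the mailing list sorting is left out.

```python
from email import message_from_string

# Thresholds as described in the post; the header name is an assumption.
SCORE_HEADER = "X-Spam-Score"
DELETE_ABOVE = 5.0
PROBABLY_SPAM_ABOVE = 2.0


def read_score(raw_message):
    """Extract the numeric spam score, treating a missing or bad header as 0."""
    msg = message_from_string(raw_message)
    value = msg.get(SCORE_HEADER)
    if not value:
        return 0.0
    try:
        return float(value.split()[0])
    except (ValueError, IndexError):
        return 0.0


def route(raw_message):
    """Return 'delete', 'probably-spam' or 'inbox' for one message."""
    score = read_score(raw_message)
    if score > DELETE_ABOVE:
        return "delete"
    if score > PROBABLY_SPAM_ABOVE:
        return "probably-spam"
    return "inbox"


if __name__ == "__main__":
    print(route("X-Spam-Score: 3.4\nSubject: hello\n\nbody\n"))
```

both comparisons are strict, so a mail scoring exactly 5.0 still lands in 'probably spam' rather than being deleted.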

in pf.blacklist I have a few subnets of mostly russian ISPs that keep sending legit looking crap in my native language (romanian), these would always get past most antispam techniques both in SA and rspamd. connections from these hosts get a special tarpit treatment by getting redirected to a port where spamd is listening.

what I like about this setup compared to SA:
- really lightweight, easier to setup and maintain
- in my case I made it less dependent on RBL - these servers have a tendency of disappearing over the years
- mail does not get clogged just because sa-update hasn't been run in ages
- 5/5 star web interface with stats, mail log/history with detailed score printout for everything rspamd has touched. can easily be integrated with apache/nginx.
- using this log I also detected a misconfiguration on my server. I kept seeing a few points added to all incoming email due to RCVD_NO_TLS_LAST since my postfix was accidentally not announcing STARTTLS to all incoming connections.

my stats for the last 2 months:
500 mails incoming - 48% clean, no action; 37% rejected; 13% greylisted; the rest get a score header. no legit mail was deleted, 2 or 3 spams got thru.
 
cracauer@ OK, got fuzzyocr going with no errors, you'll need FuzzyOcr from HEAD to alleviate the gifinter error, then tesseract to alleviate another error that throws later when you try to restart spamassassin.
 
Spoke too soon... Mail amavis[84121]: (84121-08) SA warn: FuzzyOcr: Skipping ocrad, invalid command '$ocrad'
 
I've been using a combination of mail/spamd and mail/rspamd for several years now, but I also used spamassassin for many years.

The former one needs to be fed with an extensive whitelist of "known stupid" mailproviders that don't adhere to the RFCs and re-send from different IP addresses, otherwise those will never pass greylisting. Other than that, it can be pretty effective and fends off a large amount of typical 'one off' spamservers.
TBH - in minimal configuration just running on its own without any other input, greylisting more often gets in the way of legitimate mail than most people are willing to tolerate. But if you feed the black- and whitelists with some pf-rules and scripts, it becomes a valuable tool in the chain. E.g. any server we sent mail to more than once is whitelisted, prefixes from some notorious spamhosting providers like OVH are regularly updated (whois.radb.net is great for this!) and added to the blacklist etc...
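A rough idea of how the whois.radb.net lookup can be scripted, as a Python sketch rather than the scripts actually in use here. It sends a plain `-i origin <ASN>` whois query and collects the `route:` / `route6:` objects. The function names are made up, and AS16276 (OVH) is only an example value.

```python
import ipaddress
import socket


def query_radb(asn, host="whois.radb.net", port=43, timeout=10):
    """Fetch all route objects originated by an AS, e.g. 'AS16276'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"-i origin {asn}\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the whois server closes the connection when done
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_prefixes(whois_text):
    """Return the sorted, de-duplicated prefixes from route:/route6: lines."""
    prefixes = set()
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() not in ("route", "route6"):
            continue
        try:
            prefixes.add(ipaddress.ip_network(value.strip(), strict=False))
        except ValueError:
            continue  # skip anything that is not a valid prefix
    # Sort IPv4 before IPv6; networks of the same version compare directly.
    return sorted(prefixes, key=lambda net: (net.version, net))


def to_table_file(prefixes):
    """One prefix per line, the format a pf table file expects."""
    return "".join(f"{net}\n" for net in prefixes)


# Example (needs network access, so it is not run here):
#   text = query_radb("AS16276")
#   open("/tmp/blacklist.txt", "w").write(to_table_file(extract_prefixes(text)))
```

The resulting file can then be loaded into a pf table, for instance with `pfctl -t <table> -T replace -f <file>`; the table name and file path are up to your own setup.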


rspamd on my servers usually runs with rbl, clamav, bayes, neural and some other enabled modules depending on the use-case and number of domains/users on the server and also performs quite well, even in larger setups.
Dovecot/sieve also passes mail for ham/spam learning to rspamd when users move mail to or out of their Junk folders for training - this is IMHO the most important and valuable source of training data. There are some example configurations on github or in various blogs for this - searching for "dovecot rspamd learn spam" will yield enough results to get an idea of the mechanism so one can adapt it to their needs (as I did).
rspamd also handles (among other things) the dkim-signing of outgoing and checking of incoming mail, and sadly also needs an ever-increasing whitelist of maildomains managed by incompetent inbreds that are worse at sending mail than 99% of all spammers out there...
(e.g. a subsidiary of a large international car manufacturer regularly sending mails for domain abc.com, mailserver in domain abc-def.com, headers have <from> def.com and <reply-to> def-abc.com, their mails often only consist of a large image and a link to abc.net and they never heard of SPF or DKIM - their mails repeatedly reach the highest-ever scorings I've ever seen...)
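For the Dovecot/sieve training mentioned above, the piece that actually talks to rspamd can be as small as the following Python sketch: a helper that a sieve pipe action could call with the message on stdin. This is not my actual configuration; the script and function names are made up, and it assumes `rspamc` is in the PATH and can reach the rspamd controller with its default settings.

```python
import subprocess
import sys


def rspamc_command(kind):
    """Build the rspamc invocation for learning a message as spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"expected 'spam' or 'ham', got {kind!r}")
    return ["rspamc", f"learn_{kind}"]


def learn(kind, message_bytes):
    """Feed one raw message to rspamc on stdin and return its exit code."""
    result = subprocess.run(
        rspamc_command(kind),
        input=message_bytes,
        capture_output=True,
        check=False,
    )
    return result.returncode


# Intended use from a sieve pipe script (hypothetical file name):
#   learn.py spam   <- message moved into Junk
#   learn.py ham    <- message moved out of Junk
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(learn(sys.argv[1], sys.stdin.buffer.read()))
```

The sieve side (which folder moves trigger which call) is where the setups found online differ most, so treat this only as the last step of the chain.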

rspamd is a beast to configure though if you want to really make use of its features. It's basically a pile of various filters, scripts and programs thrown together and this shows in various places - especially the hideously scattered configurations and their wildly varying syntax. Other than that, combining multiple methods to obtain a 'large picture' IMHO is the right approach and it works, so it pays to bite through the initial config and improve it over time. The basic setup with the filters enabled by default is already a good starting point, so there's not that much head-butting involved in just getting it to run. Once you have a good working rspamd setup and training database, you should hold on to it for dear life and back it up with your other "most valuable data". (although, for the training data that's only until they again make breaking changes to the database format and their migration script doesn't work...)

As freezmi already pointed out: the web interface for rspamd is quite nice - a lot of statistics and a browseable log with detailed information on the scoring of all mail that came through.

MWL did a pretty good job of explaining rspamd, its configuration and training via dovecot/sieve (and pretty much everything else regarding a secure, modern mail server) in his book "Run Your Own Mailserver". I can highly recommend that book! I've been running mailservers for 20+ years now and still found valuable information and new viewpoints in it.
 