Improving Spam Identification & Filtering

Posted on April 2, 2021
Freelance Blog
Email, Service Changes, Transparency

If you’re anything like me, the topic of spam, and of spammers trying to take advantage of people who may not know better, will make your blood boil. I’m talking about the real spam: the messages that push you to buy something from an insecure website built to steal your credit card details; the messages that ask you to update your contact details on a fake (but real-looking) website posing as your bank, where your banking details are collected and your identity can effectively be stolen; the messages that have no good intentions at all and exist only to take advantage of whoever receives them. This post covers the improvements made over the past month or two to better protect you from spammers.

What can you do to improve spam filtering for your unique mailbox?

Thankfully, there’s almost nothing you need to do to take advantage of the spam filtering improvements, since they are all server-side and are detailed later in this post. I do want to highlight two things, though, that apply to each mailbox individually (rather than server-wide):

If a message is actual spam, move it to the Spam (or Junk) folder that is created automatically in each mailbox. SpamAssassin uses this folder to train a set of learned rules (known as BAYES) that apply to your mailbox alone. So if the mail server lets a spam message slip through to your inbox, the best way to keep similar messages out of the inbox in the future is to move it to the Spam folder.
Conversely, if a message was classified as spam (i.e. found in the Spam folder) when it should have been delivered to your inbox (legitimate mail, which we call “ham”), move it out of the Spam folder to the Inbox, Trash or Archive folder (essentially any folder that isn’t marked for spam). This continues to train SpamAssassin and its BAYES scoring system for you, so that it learns to mark similar messages as ham instead of spam in the future.

By following the two items above, you help the mail server learn what is actual spam and what is ham for your mailbox alone (it does not affect other users’ mailboxes). If you trash spam messages that arrive in the inbox instead of moving them to the Spam folder, that unfortunately works against you, because messages in the Trash folder (like those in the Inbox and Archive folders) are treated as ham. So be sure to move messages to their appropriate folders.
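To make the folder rule concrete, here is a minimal Python sketch of the decision described above. It is an illustration only, not the mail server's actual code: the function name `training_label` and the folder names in `SPAM_FOLDERS` are assumptions made for the example.

```python
# Hypothetical folder names treated as spam; a real mailbox may use others.
SPAM_FOLDERS = {"spam", "junk"}


def training_label(folder: str) -> str:
    """Return the label a message would be learned as, based on its folder."""
    # Only the Spam/Junk folder counts as spam. Inbox, Trash, Archive and
    # every other folder count as ham.
    return "spam" if folder.strip().lower() in SPAM_FOLDERS else "ham"


for folder in ("Inbox", "Trash", "Archive", "Spam", "Junk"):
    print(f"{folder}: learned as {training_label(folder)}")
```

The sketch shows why deleting a spam message straight from the inbox is counterproductive: Trash gets the same "ham" label as the Inbox.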

Server-side Spam Filtering Improvements

All of these improvements happen server-side; nothing needs to happen on your side at all.

The improvements were made over the course of roughly the past 45 days. I spent a lot of that time in the mail server logs, checking whether each change was working or needed further tweaking. It was a painstaking process, but it’s always worth it to better protect my clients and, more importantly, to give spammers less chance of baiting anyone into their malicious behaviour.

The improvements include (but are definitely not limited to):

Outright rejecting the most obvious spam the moment it's received so it doesn't even arrive in the mailbox to begin with.
Geek speak: I added more DNSBLs (DNS Blocklists) at the SMTP layer, so that connecting IPs are checked against more lists of addresses known to send spam. I'll write a dedicated post later on the use of DNSBLs.
Many (many!) SpamAssassin scoring tweaks from their defaults, to be a bit more aggressive overall and to match today's spam-sending techniques.
Geek speak: SpamAssassin is the most popular (and open-source too!) spam filtering software on the market today. It is highly customizable, which allows for almost infinite fine-tuning. I spent weeks tweaking the default rules to be more aggressive (but not too aggressive, in an effort to avoid false positives) so that they catch more spam. When reviewing the logs, it seemed like a lot of spam messages were going to the inboxes of my clients; those messages should now be more likely to end up in the Spam/Junk folder as expected.
Many custom/new SpamAssassin rules were added to check more sources for bad URLs or email addresses known for sending spam, so that such messages are better identified as spam and filtered more effectively.
Geek speak: I created several new rules which go and check various respected sources online to better identify if a message is likely spam. As an example of just a few of the rules I created in SpamAssassin:

				
# add GBUDB TRUNCATE DNSBL
header RCVD_IN_GBUDB eval:check_rbl('gbudb', 'truncate.gbudb.net.')
describe RCVD_IN_GBUDB Listed in truncate.gbudb.net
tflags RCVD_IN_GBUDB net

# add JMF-Black DNSBL
header RCVD_IN_JMF_BL eval:check_rbl('jmfbl', 'black.junkemailfilter.com.')
describe RCVD_IN_JMF_BL Listed in black.junkemailfilter.com
tflags RCVD_IN_JMF_BL net

# add Spamrats DNSBL
header RCVD_IN_SPAMRATS eval:check_rbl('spamrats', 'all.spamrats.com.')
describe RCVD_IN_SPAMRATS Sender listed in all.spamrats.com
tflags RCVD_IN_SPAMRATS net

# add SpamEatingMonkey DNSBL
header RCVD_IN_SEM_NET_BLACK eval:check_rbl('sem', 'netbl.spameatingmonkey.net.')
describe RCVD_IN_SEM_NET_BLACK Received from an IP listed by SpamEatingMonkeys
tflags RCVD_IN_SEM_NET_BLACK net

# add second SpamEatingMonkey DNSBL
header RCVD_IN_SEM_BLACK eval:check_rbl('sem', 'bl.spameatingmonkey.net.')
describe RCVD_IN_SEM_BLACK Received from an IP listed by SpamEatingMonkeys
tflags RCVD_IN_SEM_BLACK net

# add SpamEatingMonkey URIBL
urirhssub URIBL_SEM uribl.spameatingmonkey.net. A 2
body URIBL_SEM eval:check_uridnsbl('URIBL_SEM')
describe URIBL_SEM Contains a URI listed by SpamEatingMonkeys
tflags URIBL_SEM net

# add second SpamEatingMonkey URIBL
urirhssub URIBL_SEM_FRESH30 fresh30.spameatingmonkey.net. A 2
body URIBL_SEM_FRESH30 eval:check_uridnsbl('URIBL_SEM_FRESH30')
describe URIBL_SEM_FRESH30 Contains a domain registered less than 30 days ago
tflags URIBL_SEM_FRESH30 net
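For readers curious what a DNSBL check like the `check_rbl` rules above boils down to: by the usual DNSBL convention, the octets of the IPv4 address are reversed and prepended to the blocklist zone, and the resulting name is looked up in DNS. An answer means the address is listed; a "no such name" response means it is not. The Python sketch below only builds that query name and performs no network lookup. The helper name `dnsbl_query_name` is made up for the example, and `192.0.2.99` is a documentation-only address.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that is looked up to check an IPv4 address against a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    # Drop any leading/trailing dot on the zone so the pieces join cleanly.
    return f"{reversed_octets}.{zone.strip('.')}"


print(dnsbl_query_name("192.0.2.99", "truncate.gbudb.net."))
# prints: 99.2.0.192.truncate.gbudb.net
```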

Here are the current scores for the more than 160 SpamAssassin rules that I’ve been tweaking over the past month and a half:

				
# scoring DNSBLs (blocklists & allowlists)
score RCVD_IN_BL_SPAMCOP_NET 3.0
score RCVD_IN_DNSWL_HI -5.0
score RCVD_IN_DNSWL_LOW -0.5
score RCVD_IN_DNSWL_MED -2.5
score RCVD_IN_DNSWL_NONE 0.5
score RCVD_IN_GBUDB 4.0
score RCVD_IN_IADB_DK -0.5
score RCVD_IN_IADB_DOPTIN_GT50 -0.5
score RCVD_IN_IADB_DOPTIN_LT50 -0.5
score RCVD_IN_IADB_EDDB -0.5
score RCVD_IN_IADB_EPIA -0.5
score RCVD_IN_IADB_GOODMAIL -0.5
score RCVD_IN_IADB_LISTED -0.5
score RCVD_IN_IADB_LOOSE -0.5
score RCVD_IN_IADB_MI_CPEAR 0
score RCVD_IN_IADB_MI_CPR_30 0
score RCVD_IN_IADB_MI_CPR_MAT 0.0
score RCVD_IN_IADB_NOCONTROL -0.5
score RCVD_IN_IADB_OOO -0.5
score RCVD_IN_IADB_OPTIN -0.5
score RCVD_IN_IADB_OPTIN_GT50 -0.5
score RCVD_IN_IADB_OPTIN_LT50 -0.5
score RCVD_IN_IADB_OPTOUTONLY -0.5
score RCVD_IN_IADB_RDNS -0.5
score RCVD_IN_IADB_SENDERID -0.5
score RCVD_IN_IADB_SPF -0.5
score RCVD_IN_IADB_UNVERIFIED_1 -0.5
score RCVD_IN_IADB_UNVERIFIED_2 -0.5
score RCVD_IN_IADB_UT_CPEAR 0
score RCVD_IN_IADB_UT_CPR_30 0
score RCVD_IN_IADB_UT_CPR_MAT 0
score RCVD_IN_JMF_BL 3.5
score RCVD_IN_MSPIKE_BL 0.0
score RCVD_IN_MSPIKE_H2 0.0
score RCVD_IN_MSPIKE_H3 -0.5
score RCVD_IN_MSPIKE_H4 -2.0
score RCVD_IN_MSPIKE_H5 -3.0
score RCVD_IN_MSPIKE_L2 1.5
score RCVD_IN_MSPIKE_L3 3.5
score RCVD_IN_MSPIKE_L4 4.5
score RCVD_IN_MSPIKE_L5 5.0
score RCVD_IN_MSPIKE_WL 0.0
score RCVD_IN_MSPIKE_ZBI 4.0
score RCVD_IN_PBL 3.5
score RCVD_IN_SBL 3.5
score RCVD_IN_SBL_CSS 3.5
score RCVD_IN_SEM_BLACK 3.5
score RCVD_IN_SEM_NET_BLACK 2.5
score RCVD_IN_SORBS_BLOCK 2.5
score RCVD_IN_SORBS_DUL 2.5
score RCVD_IN_SORBS_HTTP 2.5
score RCVD_IN_SORBS_MISC 2.5
score RCVD_IN_SORBS_SMTP 2.5
score RCVD_IN_SORBS_SOCKS 2.5
score RCVD_IN_SORBS_SPAM 2.5
score RCVD_IN_SORBS_WEB 2.5
score RCVD_IN_SORBS_ZOMBIE 2.5
score RCVD_IN_SPAMRATS 3.5
score RCVD_IN_XBL 3.5
score RCVD_IN_ZEN_BLOCKED 0.0
score RCVD_IN_ZEN_BLOCKED_OPENDNS 0.0

# scoring URIBLs
score URIBL_ABUSE_SURBL 4.0
score URIBL_BLACK 4.0
score URIBL_CR_SURBL 4.0
score URIBL_CSS 2.0
score URIBL_CSS_A 2.0
score URIBL_DBL_ABUSE_BOTCC 3.5
score URIBL_DBL_ABUSE_MALW 3.5
score URIBL_DBL_ABUSE_PHISH 3.5
score URIBL_DBL_ABUSE_REDIR 3.5
score URIBL_DBL_ABUSE_SPAM 3.5
score URIBL_DBL_BLOCKED 0.0
score URIBL_DBL_BLOCKED_OPENDNS 0.0
score URIBL_DBL_BOTNETCC 3.5
score URIBL_DBL_ERROR 3.5
score URIBL_DBL_MALWARE 3.5
score URIBL_DBL_PHISH 3.5
score URIBL_DBL_SPAM 3.5
score URIBL_GREY 2.0
score URIBL_MW_SURBL 4.0
score URIBL_PH_SURBL 4.0
score URIBL_RED 2.5
score URIBL_RHS_DOB 2.0
score URIBL_SBL 2.0
score URIBL_SBL_A 2.0
score URIBL_SEM 2.5
score URIBL_SEM_FRESH30 2.0
score URIBL_WS_SURBL 3.0
score URIBL_ZEN_BLOCKED 0.0
score URIBL_ZEN_BLOCKED_OPENDNS 0.0

# scoring DKIM & SPF
score DKIM_INVALID 1.5
score DKIM_SIGNED 0.0
score DKIM_VALID 0.0
score DKIM_VALID_AU 0.0
score DKIM_VALID_EF 0.0
score DKIM_VERIFIED 0.0
score DKIMWL_BL 3.0
score DKIMWL_WL_HIGH -3.5
score DKIMWL_WL_MED -1.5
score DKIMWL_WL_MEDHI -2.5
score FORGED_SPF_HELO 3.0
score SPF_FAIL 1.5
score SPF_HELO_FAIL 1.5
score SPF_HELO_NEUTRAL 1.0
score SPF_HELO_NONE 0.5
score SPF_HELO_PASS 0.0
score SPF_HELO_SOFTFAIL 1.5
score SPF_NEUTRAL 0.5
score SPF_NONE 0.5
score SPF_PASS 0.0
score SPF_SOFTFAIL 1.5

# scoring BAYES
score BAYES_00 -2.5
score BAYES_05 -1.0
score BAYES_20 0.5
score BAYES_40 1.5
score BAYES_50 2.0
score BAYES_60 3.0
score BAYES_80 4.0
score BAYES_95 4.5
score BAYES_99 5.0
score BAYES_999 1.5

# scoring HTML
score HTML_FONT_LOW_CONTRAST 0.5
score HTML_IMAGE_ONLY_04 1.5
score HTML_IMAGE_ONLY_08 2.0
score HTML_IMAGE_ONLY_12 2.0
score HTML_IMAGE_ONLY_16 2.0
score HTML_IMAGE_ONLY_20 2.0
score HTML_IMAGE_ONLY_24 2.5
score HTML_IMAGE_ONLY_28 2.5
score HTML_IMAGE_ONLY_32 3.0
score HTML_IMAGE_RATIO_02 0.0
score HTML_IMAGE_RATIO_04 0.0
score HTML_IMAGE_RATIO_06 0.0
score HTML_IMAGE_RATIO_08 0.0
score HTML_MESSAGE 0.0

# scoring HEADER & MISSING
score HEADER_FROM_DIFFERENT_DOMAINS 1.0
score HEADER_SPAM 2.5
score MISSING_DATE 3.0
score MISSING_FROM 1.5
score MISSING_HB_SEP 0.0
score MISSING_HEADERS 1.5
score MISSING_MID 1.0
score MISSING_MIMEOLE 2.0
score MISSING_SUBJECT 2.0

# scoring FREEMAIL
score FORGED_GMAIL_RCVD 2.5
score FORGED_YAHOO_RCVD 2.5
score FREEMAIL_ENVFROM_END_DIGIT 0.5
score FREEMAIL_FORGED_REPLYTO 2.5
score FREEMAIL_FROM 0
score FREEMAIL_REPLY 1.0
score FREEMAIL_REPLYTO 1.0
score FREEMAIL_REPLYTO_END_DIGIT 0.5
score MALFORMED_FREEMAIL 4.0

# additional scoring tweaks
score BILLION_DOLLARS 2.0
score EMPTY_MESSAGE 1.5
score HELO_DYNAMIC_SPLIT_IP 2.0
score HK_RANDOM_ENVFROM 1.0
score HK_RANDOM_FROM 1.0
score LOTS_OF_MONEY 1.0
score MPART_ALT_DIFF 2.5
score MPART_ALT_DIFF_COUNT 2.5
score NO_DNS_FOR_FROM 0.5
score RDNS_NONE 1.0
score REPLYTO_WITHOUT_TO_CC 2.5
score UNPARSEABLE_RELAY 0.5
score URI_DQ_UNSUB 2.0

(If you’re here because you run your own mail server, feel free to use this as a starting point for your own configuration.)
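As a rough illustration of how these scores combine: SpamAssassin adds up the score of every rule a message hits and treats the message as spam once the total reaches the required score. The Python sketch below uses a few scores copied from the list above. The threshold of 5.0 is SpamAssassin's default `required_score` and is an assumption here, since the value used on this server is not stated in the post; the helper names are likewise made up for the example.

```python
# Assumption: SpamAssassin's default required_score; the server's real value may differ.
REQUIRED_SCORE = 5.0

# A handful of scores copied from the list above.
SCORES = {
    "RCVD_IN_GBUDB": 4.0,
    "SPF_SOFTFAIL": 1.5,
    "URIBL_BLACK": 4.0,
    "BAYES_00": -2.5,
}


def total_score(rules_hit):
    """Sum the scores of every rule that matched a message."""
    return sum(SCORES[name] for name in rules_hit)


def is_spam(rules_hit, required=REQUIRED_SCORE):
    """A message counts as spam once its total reaches the threshold."""
    return total_score(rules_hit) >= required


blocklisted = ["RCVD_IN_GBUDB", "SPF_SOFTFAIL"]
print(total_score(blocklisted), is_spam(blocklisted))  # prints: 5.5 True

bad_link_but_trusted = ["URIBL_BLACK", "BAYES_00"]
print(total_score(bad_link_but_trusted), is_spam(bad_link_but_trusted))  # prints: 1.5 False
```

The second case shows why the negative scores matter: a strong ham signal from BAYES can outweigh a single blocklist hit, which is what keeps a more aggressive rule set from producing too many false positives.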

As always, if you have any questions or concerns, please feel free to reach out directly.

The improvements should have a better chance of stopping and properly filtering out spam right from the moment it’s received by my mail server so that they get the big red X.

