Postfix - Regular Expression

Whenever a piece of spam makes it through my spam-blocking techniques, I add a regex to the file Postfix uses for body_checks so that similar messages will not get through in the future. I've had some difficulty, mainly because the message as received differs from how it's displayed in a mail client. Therefore I use 'tcpdump -s0 -i lo0 -w blah.tcp' to capture the traffic as I forward the spam to another account on the machine (lo0 to circumvent the TLS encryption present on the external interface), and then examine the capture to see multiple embedded spaces (displayed as one by Thunderbird), newline characters, etc. I've run into one message, though, where I can't figure out why the regular expression isn't hitting.

The original message:
Code:
Dear Sir/madam,

My name is Sergeant Ryan Green, I am an American soldier, I am  serving in the
military of the 1st Armoured Division in Iraq,  as you know we are being attacked
by insurgents everyday and  car bombs. We managed to move funds belonging
to Saddam Hussein?s family.

The total amount is US$25 Million dollars in cash, mostly 100 dollar bills. We
want to move this money to you, so that you may invest it for us and keep our
share for banking. You can go to this web link to read about events that took
place there:  http://news.bbc.co.uk/2/hi/middle_east/2988455.stm We will take
70%, my partner and I. You take the other 30%.  No strings attached, just help us
move it out of Iraq, Iraq is a war zone.

We plan on using diplomatic courier and shipping the money out in one large
silver box, using diplomatic immunity. If you are interested I will send you the
full details, my job is to find a good partner that we can trust and that will assist
us.

Can I trust you? When you receive this mail, kindly send me an e-mail
signifying your interest.

This business is risk free. The box can be shipped out in 48hrs if you will be
ready to assist us.

Yours faithfully,
Sergeant Ryan Green.

This is completely despicable IMO, as Sgt. Ryan Green was killed in Iraq. :( x( :( I decided to key on the first line of the message, which contains both improper use of commas and an extra space between 'am' and 'serving'.

I do a packet capture:
Code:
0830  0d 0a 0d 0a 44 65 61 72 20 53 69 72 2f 6d 61 64   ....Dear Sir/mad
0840  61 6d 2c 0d 0a 0d 0a 4d 79 20 6e 61 6d 65 20 69   am,....My name i
0850  73 20 53 65 72 67 65 61 6e 74 20 52 79 61 6e 20   s Sergeant Ryan 
0860  47 72 65 65 6e 2c 20 49 20 61 6d 20 61 6e 20 41   Green, I am an A
0870  6d 65 72 69 63 61 6e 20 73 6f 6c 64 69 65 72 2c   merican soldier,
0880  20 49 20 61 6d 20 20 73 65 72 76 69 6e 67 0d 0a    I am  serving..
0890  69 6e 20 74 68 65 0d 0a 6d 69 6c 69 74 61 72 79   in the..military
08a0  20 6f 66 20 74 68 65 20 31 73 74 20 41 72 6d 6f    of the 1st Armo
08b0  75 72 65 64 20 44 69 76 69 73 69 6f 6e 20 69 6e   ured Division in
08c0  20 49 72 61 71 2c 20 20 61 73 20 79 6f 75 20 6b    Iraq,  as you k
08d0  6e 6f 77 20 77 65 20 61 72 65 20 62 65 69 6e 67   now we are being
08e0  0d 0a 61 74 74 61 63 6b 65 64 0d 0a 62 79 20 69   ..attacked..by i
08f0  6e 73 75 72 67 65 6e 74 73 20 65 76 65 72 79 64   nsurgents everyd
0900  61 79 20 61 6e 64 20 20 63 61 72 20 62 6f 6d 62   ay and  car bomb
0910  73 2e 20 57 65 20 6d 61 6e 61 67 65 64 20 74 6f   s. We managed to
0920  20 6d 6f 76 65 20 66 75 6e 64 73 20 62 65 6c 6f    move funds belo
0930  6e 67 69 6e 67 0d 0a 74 6f 20 53 61 64 64 61 6d   nging..to Saddam
0940  20 48 75 73 73 65 69 6e 3f 73 20 66 61 6d 69 6c    Hussein?s famil
0950  79 2e 0d 0a 0d 0a 54 68 65 20 74 6f 74 61 6c 20   y.....The total 
0960  61 6d 6f 75 6e 74 20 69 73 20 55 53 24 32 35 20   amount is US$25

And the regular expression I created (with added CRs to prevent it from horizontally scrolling the page):
Code:
/^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving$/ REJECT
Trying to scam somebody by impersonating a deceased soldier is without class.  BC14


This should hit as far as I can tell, but I can forward the same message to another account on my mail server and it goes right through. Does anybody see where I've gone wrong?
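
For readers following along, here is a minimal Python sketch (not part of the original setup) that turns a few rows of the hex dump above back into the lines that arrive on the wire. The bytes are copied from offsets 0840 through 0890 of the capture; the variable names are only illustrative.

```python
# Rows 0840-0890 of the packet capture shown above, hex column only.
HEX_ROWS = """
61 6d 2c 0d 0a 0d 0a 4d 79 20 6e 61 6d 65 20 69
73 20 53 65 72 67 65 61 6e 74 20 52 79 61 6e 20
47 72 65 65 6e 2c 20 49 20 61 6d 20 61 6e 20 41
6d 65 72 69 63 61 6e 20 73 6f 6c 64 69 65 72 2c
20 49 20 61 6d 20 20 73 65 72 76 69 6e 67 0d 0a
69 6e 20 74 68 65 0d 0a 6d 69 6c 69 74 61 72 79
"""

# Remove all whitespace, then convert the hex digits to raw bytes.
raw = bytes.fromhex("".join(HEX_ROWS.split()))

# Mail lines are separated by CR/LF (0d 0a), so split on that pair.
lines = raw.decode("ascii").split("\r\n")

for line in lines:
    # repr() makes doubled spaces and empty lines visible.
    print(repr(line))
```

The third element is the line the regex targets: it ends at 'serving' and contains the doubled space that the mail client collapses.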
 
http://www.postfix.org/BUILTIN_FILTER_README.html
Code:
Despite warnings, some people try to use the built-in filter feature for general junk email and/or 
virus blocking, using hundreds or even thousands of regular expressions. This can result in catastrophic 
performance failure. The symptoms are as follows: ...

Try amavisd-new, spamassassin and bayes filtering instead.
 
hydra said:
http://www.postfix.org/BUILTIN_FILTER_README.html
Code:
Despite warnings, some people try to use the built-in filter feature for general junk email and/or 
virus blocking, using hundreds or even thousands of regular expressions. This can result in catastrophic 
performance failure. The symptoms are as follows: ...

Try amavisd-new, spamassassin and bayes filtering instead.

I've read the document you reference hydra, but I have exactly 14 regular expressions in said file, including the one I'm trying to get working. If it gets to the point where I have 'hundreds or even thousands of regular expressions' or experience 'catastrophic performance failure', I'll look at alternatives at that time. I actually have amavisd-new installed and working to provide virus scanning of messages. My experience with spam filtering is not a good one though, and I'd much rather employ a set of regular expressions to catch the spam that makes it through the other blocking techniques I have in place than risk the pitfalls inherent in using a spam filter. (False positives, overloading the machine, etc.)

At this point, I'd just like to figure out why a regular expression, which to my knowledge should work, is not doing so.
 
Ruler2112 said:
At this point, I'd just like to figure out why a regular expression, which to my knowledge should work, is not doing so.

AAAAAARRRrrrrgggggghhhhhhh!!!!! It was working all along, but I guess Postfix needs to have its config refreshed with 'postfix reload' to pick up the changes immediately. No matter how many times I tell myself to try simple stuff first... :(
 
If it's only 14, that's OK. You mention the risks of anti-spam solutions (I suppose you mean amavisd-new, SpamAssassin and the rest), yet you want to block things directly with Postfix. Consider which approach is more likely to block legitimate mail:

- Blocking on a regular expression means that some legitimate message may match it in the future. You are in the US, so sooner or later your regular mail may contain the same text as the spam body.
- Amavisd-new uses several tests, each weighted, and it can quarantine messages, so users/admins can release the ones caught by mistake.

Btw, where are you getting the most spam from (dynamic/static addresses) ?
 
hydra said:
Blocking on a regular expression means that some legitimate message may match it in the future. You are in the US, so sooner or later your regular mail may contain the same text as the spam body.
...
Btw, where are you getting the most spam from (dynamic/static addresses) ?

Your first point is why I key in on misspelled words or phrases with non-standard English when creating the regexes. Phrases like 'Attention Internet User' or 'Best Compliments of the Day to You' - nobody really talks like that...

I have no idea how to determine what type of addresses the spam originates from. To be honest, not a whole heck of a lot makes it through the various blocking techniques I have in place; I view the regexes as simply a last-ditch fallback measure to catch what spam does manage to trickle through. Seems like phishing messages are good at sneaking through the blocks, which is perfect because I just create a regex that looks for stuff like the URL of the fake paypal/bank login page, a winning lottery ticket number, or phone numbers to call to claim your prize. (Reporting the spam to either the impersonated institution or the ISP of the person sending it does absolutely nothing.)

I'll probably end up implementing something like you suggest sometime in the future.
 
If the phrases are like that, it's OK; however, we have started to receive spam that closely resembles normal communication except for some web links. For the dynamic addresses, check the mail logs and look for connections from hosts like: x.x.x.x.dynamic.jazztel.es, x-x-x-x.dynamic.hinet.net, ppp-x-x-x-x.revip2.asianet.co.th and so on... If you receive a lot of mail from such dynamic addresses (we have thousands of connections from them), you can block them with regular expressions (with Postfix). Along with greylisting, this can considerably reduce the spam getting through. Good luck man.
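
As a rough illustration of matching such hostnames, here is a small Python sketch. The two patterns are assumptions derived only from the three example hostnames above, the sample hosts are made up, and Python's re module stands in for whatever regex table Postfix would actually use.

```python
import re

# Hypothetical patterns, modelled only on the example hostnames in this post.
DYNAMIC_PATTERNS = [
    re.compile(r"\.dynamic\.", re.IGNORECASE),              # x.x.x.x.dynamic.jazztel.es
    re.compile(r"^ppp-\d+-\d+-\d+-\d+\.", re.IGNORECASE),   # ppp-x-x-x-x.revip2.asianet.co.th
]


def looks_dynamic(hostname):
    """Return True if the reverse-DNS name matches any dynamic-looking pattern."""
    return any(p.search(hostname) for p in DYNAMIC_PATTERNS)


# Made-up sample hosts using documentation IP ranges.
samples = [
    "192.0.2.10.dynamic.jazztel.es",
    "203-0-113-7.dynamic.hinet.net",
    "ppp-198-51-100-4.revip2.asianet.co.th",
    "mail.example.org",
]

for host in samples:
    print(host, looks_dynamic(host))
```

Checking candidate patterns against a list of known-good senders first is a cheap way to spot rules that are too broad before they go live.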
 
Ruler2112 said:
And the regular expression I created (with added CRs to prevent it from horizontally scrolling the page):
Code:
/^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving$/ REJECT
Trying to scam somebody by impersonating a deceased soldier is without class.  BC14


This should hit as far as I can tell, but I can forward the same message to another account on my mail server and it goes right through. Does anybody see where I've gone wrong?

Postfix uses PCRE by default. With this flavor of regex the expression should read

Code:
^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving

$ asserts position at end of a line.
 
Erratus said:
Postfix uses PCRE by default. With this flavor of regex the expression should read

Code:
^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving

$ asserts position at end of a line.

I intended it to define the entire line between CR/LF characters (in order to be as specific as possible), which is why I included $. IOW, there is a CR/LF before the 'My', hence the beginning of the line, and there's a CR/LF after 'serving', denoting the end of the line. I'm very new at making regular expressions for postfix - is there a reason I should not form the expression this way?
 
Reading the text in your first post

Code:
My name is Sergeant Ryan Green, I am an American soldier, I am  serving in the
where "the" is the end of the line.
And your regex
Code:
/^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving$/
needs "serving" as the end of the line.

I do not work with Postfix, so I do not know whether Postfix needs "/" at the beginning and end of an expression. As part of the regex itself, both "/" would make it fail in PCRE. Also, in PCRE there is no need to put a "\" before ",".

For a spam filter I would use this without any tokens

Code:
This business is risk free

:)
 
OK, I understand where you're coming from. That's something that caught me more than once in the past - e-mails will arrive with CR/LFs wherever and most mail clients will display them as white space. If you look at the hex dump of the packet capture in my first post, you'll see the 0d-0a combination after 'serving', even though both Thunderbird and Squirrelmail display the message with 'the' being the last word on the line. (Tricky spammers...)
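
To make the difference concrete, here is a short Python sketch comparing the pattern with and without the trailing $ against the line as it arrives on the wire and the line as the mail client displays it. Python's re module is used as a stand-in here (an assumption; Postfix itself uses its own regexp or PCRE tables), and the variable names are illustrative.

```python
import re

# The pattern from the first post, without the surrounding "/" delimiters.
ANCHORED = re.compile(
    r"^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving$"
)
# The same pattern with the trailing "$" removed, as suggested in the reply.
UNANCHORED = re.compile(
    r"^My name is Sergeant Ryan Green\, I am an American soldier\, I am  serving"
)

# Line as received: CR/LF directly after "serving", two spaces after "am".
wire_line = "My name is Sergeant Ryan Green, I am an American soldier, I am  serving"
# Line as shown in the first post, where "the" is the last word.
displayed_line = wire_line + " in the"
# Line as a client that collapses whitespace would show it.
single_space_line = wire_line.replace("  ", " ")

for name, line in [("wire", wire_line),
                   ("displayed", displayed_line),
                   ("single space", single_space_line)]:
    print(name, bool(ANCHORED.search(line)), bool(UNANCHORED.search(line)))
```

The anchored form only hits the wire line, the unanchored form hits both the wire and displayed variants, and neither hits once the doubled space is collapsed.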
 
Just a small note.

I would pick up some other details from this spam so you have a higher hit rate even with modified spam mails.

For example this is often used in spam

Code:
/illion dollars/     REJECT BR-20100521.001 bm-dollars
/shipping the money/ REJECT BR-20100521.002 money shipp
/Can I trust you/    REJECT BR-20100521.003 don't trust silly spammers

With the first rule you can get a high false-positive rate if you receive financial newsletters.
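
A quick way to see that trade-off is to run the three patterns over a few lines in Python before adding them. This is only a sketch: re.IGNORECASE is set because Postfix regexp and PCRE tables match case-insensitively by default, the spam lines are taken from the message as displayed in the first post (the line breaks on the wire may differ), and the newsletter line is invented.

```python
import re

# The three rules above as (reject code, compiled pattern) pairs.
RULES = [
    ("BR-20100521.001", re.compile(r"illion dollars", re.IGNORECASE)),
    ("BR-20100521.002", re.compile(r"shipping the money", re.IGNORECASE)),
    ("BR-20100521.003", re.compile(r"Can I trust you", re.IGNORECASE)),
]


def first_hit(line):
    """Return the code of the first rule matching this line, or None."""
    for code, pattern in RULES:
        if pattern.search(line):
            return code
    return None


spam_lines = [
    "The total amount is US$25 Million dollars in cash, mostly 100 dollar bills. We",
    "We plan on using diplomatic courier and shipping the money out in one large",
    "Can I trust you? When you receive this mail, kindly send me an e-mail",
]
# Invented example of legitimate mail that the first rule would also reject.
newsletter_line = "The fund raised 3 billion dollars in its latest round."
ordinary_line = "See you at the meeting on Thursday."

for line in spam_lines + [newsletter_line, ordinary_line]:
    print(first_hit(line), "|", line)
```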

For statistics it is easier to put a reject number before the rest of the reject text.
For example:
BR-... body check with reject
HR-... header reject
...
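
As a sketch of how such codes could feed statistics, the Python below counts codes of the form BR-YYYYMMDD.NNN or HR-YYYYMMDD.NNN in a list of text lines. The sample lines are simplified and made up (they are not real Postfix log output), and the HR code is hypothetical; the only assumption about real logs is that the reject text appears somewhere on the line.

```python
import re
from collections import Counter

# Matches codes such as BR-20100521.001 (body reject) or HR-... (header reject).
CODE = re.compile(r"\b(BR|HR)-\d{8}\.\d{3}\b")


def count_codes(lines):
    """Count how often each reject code appears (first code per line)."""
    counts = Counter()
    for line in lines:
        match = CODE.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# Made-up, simplified sample lines for illustration only.
sample_lines = [
    "reject: body line matched: BR-20100521.001 bm-dollars",
    "reject: body line matched: BR-20100521.003 don't trust silly spammers",
    "reject: body line matched: BR-20100521.001 bm-dollars",
    "reject: header line matched: HR-20100521.001 example header rule",
    "delivered: no rule matched",
]

per_rule = count_codes(sample_lines)

# Totals per rule type (BR vs HR), using the prefix before the dash.
per_type = Counter()
for code, n in per_rule.items():
    per_type[code.split("-")[0]] += n

print(per_rule)
print(per_type)
```

Because the code sits at a fixed position in the reject text, one regex is enough to group hits per rule or per rule type.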
 