RegEx Not Matching - Why???

I have a variety of pattern checks in Postfix as header_checks and body_checks in regex format. As spam slips through, I add key phrases to one or the other of these files so that further similar spam won't get through. I try to pick out strings or phrases that won't be used in ordinary communications but are part of spam messages. Since I haven't been doing this long, I forward the message to myself to test the filter.


I received the following e-mail regarding a scam over the weekend:

Code:
Dear Valued Associate,
                                 
                                   REQUEST FOR URGENT BUSINESS RELATIONSHIP 
Best compliments of the day. I have a client who wants to invest in your country
with the sum of US$35.5M and he needs protection for his family.If you can handle
such a huge sum of investment please get back to me as soon as possible to discuss,
for example the investment plan and agreement procedure.
 
Kindly reply if you have any business profile to enable us advise our client
accordingly,and do not forget to include your telephone contacts for easy
communication. Feel free to contact us via email: oberwest@live.co.uk.

Thank you.


OBERHOLSTER WHYTE


I used the following two regex patterns to detect this type of spam:

Code:
/^ *REQUEST FOR URGENT BUSINESS RELATIONSHIP/  REJECT This is an old scam.  You might want to try something new.  BC07
/I have a client who wants to invest in your country with the sum of US/ REJECT Leave us alone.  BC08

The first one works perfectly: it starts at the beginning of the line, takes as many spaces as there are, then matches the urgent business crap. I've not been able to get the second one to hit no matter what I do. I've also tried:

Code:
/^.*I have a client who wants to invest in your country with the sum of US/ REJECT Leave us alone.  BC08

and

Code:
/*I have a client who wants to invest in your country with the sum of US/ REJECT Leave us alone.  BC08

Neither blocks a message containing the phrase, whether I send it as HTML or as plain text. I checked with an online Flash-based regex verification tool and it indicated that the first version above should work. I've also tried it in vi, where it hit right away. From this, I have to assume that Postfix's regex matching differs from these other two methods.

Anybody see where I'm going wrong or know of a document that lists the quirks of the postfix body_check regex matching?
 
Instead of anchoring everything forcibly to the beginning of a line, overruling that by then using a wildcard ... just use the string you want to match:

Code:
/REQUEST FOR URGENT BUSINESS RELATIONSHIP/
/I have a client who wants to invest in your country with the sum of US/
 
DutchDaemon said:
Instead of anchoring everything forcibly to the beginning of a line, overruling that by then using a wildcard ... just use the string you want to match:

Code:
/REQUEST FOR URGENT BUSINESS RELATIONSHIP/
/I have a client who wants to invest in your country with the sum of US/

That's exactly why I'm so perplexed by this - I have the second line in my body_checks file exactly as it is in your example, with 'REJECT Leave us alone' after the second slash to direct Postfix to reject the message. Yet I can send mail to myself with this string in it without any problems. I only tried anchoring it as an alternative.


Alt said:
Code:
/I have a client who wants to invest in your country[\s\t\r\n]*with the sum of US/

The line breaks are just there because I forwarded the message to a webmail account - it was all sent on the same line initially. This is what I've been doing with my tests.
 
You were correct on the source of the problem, Alt - there was a CR-LF embedded in the phrase, so the regex didn't hit. I captured the unencrypted conversation with tcpdump on lo0; I use TLS, so capturing on the outside interface is out. (Not that I didn't try, and I was puzzled momentarily until understanding smacked me in the face with a shovel. ;) ) I was able to see the CR-LF in the hex dump, then traced the problem back to Thunderbird splitting the long lines with CR-LFs. I would guess that the expression I originally wrote would have worked to block the real spam messages, as there probably weren't any CR-LFs embedded in that phrase.
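
For anyone who finds this later, here is a rough Python sketch of why the long phrase missed. It assumes body_checks compares each pattern against one body line at a time, which is my reading of the header_checks(5) man page. The any_line_matches helper is made up for illustration and is not anything Postfix provides; Python's re module only stands in for the Postfix regexp table.

```python
import re

# The long phrase from my body_checks entry (BC08).
PHRASE = re.compile(
    r"I have a client who wants to invest in your country with the sum of US",
    re.IGNORECASE,
)

def any_line_matches(pattern, body):
    """Return True if the pattern matches within any single line of the body.

    Hypothetical helper: it mimics a line-at-a-time check, which is an
    assumption about how body_checks behaves, not Postfix code.
    """
    return any(pattern.search(line) for line in body.splitlines())

# The original spam: the whole paragraph on one line.
unwrapped = (
    "Best compliments of the day. I have a client who wants to invest in "
    "your country with the sum of US$35.5M and he needs protection for his family."
)

# My test message: the mail client inserted a CR-LF inside the phrase.
wrapped = (
    "Best compliments of the day. I have a client who wants to invest in your country\r\n"
    "with the sum of US$35.5M and he needs protection for his family."
)

print(any_line_matches(PHRASE, unwrapped))  # True
print(any_line_matches(PHRASE, wrapped))    # False
```

If that assumption holds, no single pattern can span the CR-LF, because the two halves of the phrase are never seen in the same line.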

I reworked the expression to look for a shorter phrase and it now works fine with my test setup.
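
A shorter phrase is less likely to straddle a line break, but it can still happen. The sketch below is one way to get a feel for that: it re-wraps a paragraph at a range of widths and reports the widths at which no single line contains the phrase. The failing_widths helper, the short phrase, and the 60 to 80 column range are all assumptions picked for illustration, and textwrap only approximates what a real mail client does when it wraps.

```python
import re
import textwrap

def failing_widths(pattern, paragraph, widths):
    """Return the wrap widths at which no single line matches the pattern.

    Hypothetical helper for experimenting; it is not part of Postfix.
    """
    failures = []
    for width in widths:
        lines = textwrap.fill(paragraph, width=width).splitlines()
        if not any(pattern.search(line) for line in lines):
            failures.append(width)
    return failures

paragraph = (
    "Best compliments of the day. I have a client who wants to invest in "
    "your country with the sum of US$35.5M and he needs protection for his "
    "family."
)

long_phrase = re.compile(
    r"I have a client who wants to invest in your country with the sum of US",
    re.IGNORECASE,
)
# Illustrative shorter phrase, not necessarily the one I ended up using.
short_phrase = re.compile(r"wants to invest in your country", re.IGNORECASE)

widths = range(60, 81)  # assumed range of plausible wrap columns
print("long phrase misses at: ", failing_widths(long_phrase, paragraph, widths))
print("short phrase misses at:", failing_widths(short_phrase, paragraph, widths))
```

An empty list for a phrase only means it survived the widths tried here, not that it can never be split.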
 
I have another question related to what I discovered while working on this. When the original spam message arrived, Thunderbird displayed it wrapped between 'if you' and 'can handle'. I assumed that this meant the message was sent without line breaks (CR-LFs) and hence, Thunderbird wrapped it. Looking at the message source confirmed this - each paragraph is one line. When I sent a message to test my regex, Thunderbird would send a line break after a set number of characters - each line was explicitly wrapped.

Is one more standard than the other? Or, more to the point, should I be concerned about allowing for such line breaks in my regex filters? Or should I just put it down to a side effect of my not being a spammer while trying to emulate one, keep pasting key phrases from spam into a new regex whenever spam makes it through my blocks, and have faith that it'll work without explicit testing?
 