I have a mail server that's been up for a couple years running FreeBSD 7.1. I've always used pf as the firewall and have generally found it to be very reliable and easy to use. However, I had a problem yesterday that is twisting my brain into a knot.
About noon EDT, a user came to me and told me that he wasn't getting sales leads from one source. He also discovered that he was unable to send mail to anybody at my domains from his Gmail account. I checked and there was nothing in the server logs about any rejected messages, but I did verify the problem by using both his Gmail account and my own fastem account to send mail to myself at work - nothing arrived and nothing appeared in the logs. I did discover that I'd get the dreaded
Code:
sendto: Operation not permitted
when trying to ping any of the affected machines. (It was weird - I could ping certain IPs and others would produce the above error.) Even openntpd started spewing out similar errors to the console. I assumed that these were related and started troubleshooting.
I tried restarting postfix, amavisd-new, and postgrey. Nothing. (This is when I found I couldn't ping.) I restarted the firewall with pfctl -d; pfctl -e -f /etc/pf.rules - nada. Then I restarted ppp to refresh my PPPoE link (DSL), which also restarts the firewall - still the same symptoms. Likewise after rebooting the server. I found a reference online that suggested it might be caused by a backbone outage, so I called my ISP. Doug hooked me up with a different user ID/password that put me on a completely different subnet - obviously I couldn't receive mail because my static IP is tied to my regular user ID, but the ping issues remained. After some more troubleshooting, Doug suggested that a firewall could be causing the problem. I didn't think this was the case, but enabled my old rule-set anyway just to prove him wrong. Imagine my surprise when things started working! I left the old rules (without traffic shaping or block tables) in place. The PPPoE link re-enabled my current rule-set automatically when it re-authenticated earlier today - there have been no problems since. I just tested and I can send mail from my home account with no problem.
I've been trying to figure out what happened, why, and how to fix it ever since, but it seems to defy logical explanation. I checked the current and old block tables (I keep files from the past week) and haven't found any of the affected IPs in there. I thought that one or more of the in-memory block tables might have gotten corrupted, but restarting the firewall, PPP link, or server should have cleared the problem if that were the case. If it were a problem with the current rule-set itself, why did it not behave that way for the ~week I was gone on vacation immediately prior to this, and why is it not doing the same thing now?
It also affected some machines and not others. Most of the Google addresses I tried resulted in errors, but only about half of the Yahoo addresses had trouble - the other half went through fine. I could ping another of my company's public servers over the internet, but I could not ping my ISP's main web server. People were getting mail, but only from certain sources.
Code:
Some IPs that resulted in 'sendto: Operation not permitted':
65.203.23.136
66.111.4.55
74.125.95.104
64.233.169.104
66.249.81.104
67.195.160.76
72.30.2.43
72.14.204.103
72.14.204.147
72.14.204.104
72.14.204.99
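A plain text search of the block-table files would miss an address that is only covered by a CIDR range, so the check against the IPs above could be scripted roughly as follows. This is a minimal sketch, not a definitive tool: it assumes each table file holds one plain address or CIDR block per line with optional # comments (no hostnames or ! negations), and the names load_table and find_blocking_entries, along with the sample table entries, are made up for illustration and are not taken from the real block tables.

```python
import ipaddress


def load_table(lines):
    """Parse pf-table-style lines into a list of ip_network objects.

    Blank lines and '#' comments are skipped; a bare address becomes a
    single-host network (/32 for IPv4).
    """
    networks = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        # strict=False accepts entries whose host bits are set, e.g. 10.1.2.3/8
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def find_blocking_entries(address, networks):
    """Return every table entry (as a string) that contains the address."""
    ip = ipaddress.ip_address(address)
    return [str(net) for net in networks if ip in net]


if __name__ == "__main__":
    # Hypothetical table contents, NOT the real block tables.
    sample_table = [
        "# example entries",
        "192.0.2.0/24",
        "72.14.192.0/18   # made-up range for illustration",
        "198.51.100.7",
    ]
    nets = load_table(sample_table)
    # Two of the affected IPs from the list above.
    for addr in ("72.14.204.103", "66.111.4.55"):
        hits = find_blocking_entries(addr, nets)
        print(addr, "->", hits if hits else "not in table")
```

To run it against real files, the lines of each saved table file (for example via open(path).readlines()) would be passed to load_table in place of sample_table. Any hit on a wide range would suggest the block tables are worth a second look even though the exact IPs never appear in them.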
I'm copying both my current firewall script and the old one below. What am I missing???