PF: Update from 12.2-RELEASE-p11 to 12.3-RELEASE-p7 broke my PF internet router/firewall

Hi all,

I updated my internet-facing router/firewall from 12.2-RELEASE-p11 to the latest 12.3 release, 12.3-RELEASE-p7. I'm using a custom kernel with ALTQ support because it helps with my transfer speeds. More on this below, since I even disabled ALTQ while trying to solve this issue.

The issue: after the update, Gmail and some other web applications no longer work, even though nothing changed in my pf.conf. Maybe it's a websocket issue or some other kind of connectivity problem.

My setup:
The box dials in with PPPoE, PF does NAT and filtering, and the network cards use the em driver. When I started looking into this, I fiddled with my PF config, first removing ALTQ and then removing scrub. After some time it started working... but after a day or two it failed again.
To investigate, I checked what the firewall blocks with "tcpdump -i pflog0 action block", and strangely I can't see anything that could cause this. However, with plain "tcpdump -i pflog0" I see multiple entries like this (FQDNs redacted; my-isps-endpoint.something.domain.com is the reverse DNS name of my tun0 IP address):
16:25:18.488782 IP some.local.google.content.host.https > my-isps-endpoint.something.domain.com.54402: Flags [P.], seq 3278272298:3278272371, ack 1417887116, win 282, options [nop,nop,sack 1 {2825:3621}], length 73
The only rules in my config with logging enabled are block rules, so why do I get these entries, and which PF rule do they come from? I get them with or without scrub; there is no difference. Judging by the ack numbers, it looks to me like PF state tracking might be buggy: this does not look like a newly initiated connection but like traffic coming back from Google's content servers.

Today, when I logged on, I couldn't reach Gmail, so I tried reloading pf with "/etc/rc.d/pf reload &" from my SSH session several times, and after a few tries it started working.
Another really strange thing: this also happened a couple of weeks ago. Back then I suspected a bug or an issue with 12.3, so I rolled back to my ZFS snapshot, and everything started working again right away... Maybe something related to ephemeral ports, or something else, changed in 12.3 and is causing this? Has anyone else experienced something similar?

I can post my pf.conf later, after removing some sensitive data, if anyone wants to check it. But because the same config works on 12.2, and on 12.3 I get the same issue with ALTQ disabled and with or without scrub, I suspect something is off in PF or FreeBSD. I could also try updating to the GENERIC kernel with freebsd-update to rule out my custom kernel; since ALTQ is disabled now, I don't really need a custom kernel until I find out what's causing this.
 
Next I tried fiddling with the MTU settings, but had no luck.
Then I switched my system from ppp to mpd5, and the connection seems to be working now: I no longer see the Google content server showing up in the log as blocked. At least not for now; I'll keep testing it, fingers crossed...