IPF + IPNAT intermittent problems, getting worse as uptime increases

Hello everyone

I recently migrated from 9-RELEASE to 10.0-RELEASE and started having problems with IPNAT. I carried over my IPF/IPNAT/MPD5 configuration files; the only change was replacing tun0 with ng0 in ipnat.rules.

I have a PPPoE WAN connection handled by MPD5 (ng0 interface). NAT mapping is as follows:
Code:
map ng0 192.168.100.0/24 -> 0/32 proxy port ftp ftp/tcp
map ng0 192.168.100.0/24 -> 0/32 portmap tcp/udp 30000:50000
map ng0 192.168.100.0/24 -> 0/32
Where 192.168.100.0 is my LAN.

To illustrate the problem I run wget on the same machine (gw):
Code:
[muxx@gw ~]$ wget -O /dev/null --bind-address=192.168.100.128 http://www.ej.ru/index.html
converted 'http://www.ej.ru/index.html' (US-ASCII) -> 'http://www.ej.ru/index.html' (UTF-8)
--2015-03-04 20:58:28--  http://www.ej.ru/index.html
Resolving www.ej.ru (www.ej.ru)... 87.239.187.242
Connecting to www.ej.ru (www.ej.ru)|87.239.187.242|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 548 [text/html]
Saving to: '/dev/null'

/dev/null  100%[=============================================================================>]  548  --.-KB/s  in 0s  

2015-03-04 20:58:29 (36.1 MB/s) - '/dev/null' saved [548/548]

[muxx@gw ~]$ wget -O /dev/null --bind-address=192.168.100.128 http://www.ej.ru/index.html
converted 'http://www.ej.ru/index.html' (US-ASCII) -> 'http://www.ej.ru/index.html' (UTF-8)
--2015-03-04 20:58:30--  http://www.ej.ru/index.html
Resolving www.ej.ru (www.ej.ru)... 87.239.187.242
Connecting to www.ej.ru (www.ej.ru)|87.239.187.242|:80... failed: Network is unreachable.
192.168.100.128 is the gateway's LAN-facing address.
If I don't bind 192.168.100.128 explicitly, every invocation of wget works fine. If I bind 192.168.100.128, the connection fails intermittently, roughly half of the time.

Looking at ipmon -a output I can see that for the failing attempt the "NAT:NEW-MAP" doesn't appear. Only "STATE:NEW" followed by "STATE:EXPIRE" after a while.
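
For anyone wanting to quantify this from a saved log, one rough approach is to count the two event types. The sketch below is only an illustration: the function name count_events and the file name ipmon.log are made up for this example, and it assumes each event line contains the literal tokens STATE:NEW and NAT:NEW-MAP exactly as quoted above.

```sh
#!/bin/sh
# Hypothetical helper: compare the number of new state entries with the
# number of new NAT mappings in a saved ipmon log.
# Assumption: the log was captured with something like
#   ipmon -a > ipmon.log
# and every event line carries the literal token for its event type.
count_events() {
    log=$1
    # grep -c prints 0 (and returns non-zero) when nothing matches;
    # only the printed count is used here.
    states=$(grep -c 'STATE:NEW' "$log")
    maps=$(grep -c 'NAT:NEW-MAP' "$log")
    echo "state_new=$states nat_new_map=$maps unmapped=$((states - maps))"
}

# Example (file name is an assumption):
# count_events ipmon.log
```

The unmapped figure is only meaningful if nothing but the NATed test traffic is being logged; in that case it should roughly match the number of attempts that got a state entry but no NAT mapping.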

Obviously, all machines on the LAN suffer from the same problem.

When I try to use tinyproxy (bound to the ng0 interface for outgoing connections), and use it from the LAN, everything works. So it does look like I have misconfigured IPNAT somehow or there's some kind of incompatibility or bug (!?).

I tried specifying mssclamp option in mapping rules, no difference.

I tried removing all rules from IPF just to make sure and it makes no difference.

I would be extremely grateful for any advice.

/max
 
Does this happen on 10.1-RELEASE?

I would gladly upgrade to 10.1-RELEASE, but my motherboard has a buggy ACPI implementation, which causes the stock kernel to panic immediately. I first need to find a way to patch the 10.1-R kernel to be able to boot. 8(

thanks for your help.
 
I tried userland PPP instead of MPD5 with exactly the same result. :(

I don't think I can blame the LAN NIC, as my wget test runs on the same machine and just binds the LAN interface.
 
If you run tcpdump(1) on the external interface, do you see packets leaving it with internal LAN source addresses? This looks vaguely similar to an issue I've seen before that affected IPSEC traffic in 10.0-RELEASE. I'm not sure whether what PPP/MPD5 uses is close enough to be affected by the same issue.
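
A minimal sketch of that check, assuming ng0 is the external interface and 192.168.100.0/24 is the LAN as in the first post. The helper name leak_filter is made up for this example; it only builds a capture filter that matches packets whose source address is still an internal one.

```sh
#!/bin/sh
# Hypothetical helper: build a tcpdump filter expression that matches
# packets whose source address belongs to any of the given networks.
leak_filter() {
    filter=""
    for net in "$@"; do
        if [ -z "$filter" ]; then
            filter="src net $net"
        else
            filter="$filter or src net $net"
        fi
    done
    echo "$filter"
}

# Example (run as root; interface and network taken from the first post):
# tcpdump -n -i ng0 "$(leak_filter 192.168.100.0/24)"
```

If NAT is applied to every outgoing packet, this capture should stay empty; any packet it does show left the external interface untranslated.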
 
I rebooted the machine and noticed a significant improvement. Only up to 10% of NAT connections fail to establish, compared to close to 50% last night and before reboot. I made no changes to configuration.

I also noticed that once the connection is established, it doesn't drop. So the problem seems to be with establishing the connection.

I wrote a script that repeatedly invokes wget and checks return status. As before, if I don't bind the NATed address, connections work every time.
I logged the output of ipmon -a for 200 invocations and I can see that for those connections that work, there is a full set of states (STATE:NEW, NAT:NEW-MAP, STATE:CLOSE, NAT:EXPIRE-MAP). For those connections that failed to establish, there is no NAT:NEW-MAP or STATE:CLOSE.
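
The original script was not posted. A minimal sketch of such a test loop, assuming the same wget invocation as in the first post, might look like this. The function name run_probe and its output format are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: run a command N times and report how many runs
# returned a non-zero exit status.
# Usage: run_probe COUNT COMMAND [ARGS...]
run_probe() {
    attempts=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Discard the command's output; only its exit status matters.
        "$@" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "failed=$failed total=$attempts"
}

# Example (address and URL taken from the first post):
# run_probe 200 wget -q -O /dev/null --bind-address=192.168.100.128 http://www.ej.ru/index.html
```

Running it once with and once without --bind-address gives the two failure counts to compare.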

I wonder what the general opinion is on IPF/IPNAT. Is this a recommended solution or should people be migrating to IPFW or PPP NAT? I would expect that if the solution is documented in the Handbook, it should be stable? I have been using IPF+IPNAT for the last 10 years at least.
 
After 24h of uptime the failure rate rose from 10% to 20%. At 30h uptime it's up to 40%.

I forgot to mention that while migrating from 9.0-RELEASE I also switched to new hardware. 9.0-RELEASE ran on a single-core Pentium 4; 10.0-RELEASE is now running on the new box:

Code:
[muxx@gw ~]$ sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz
hw.ncpu: 4
hw.machine_arch: amd64
 
You're not the first person to have issues with IPFilter lately. See Thread 50432. I'll point out the same thing. IPFilter in 10.x is at version 5 and IPFilter in 9.x was version 4. I don't know how complex your configuration is but consider either opening a PR and assisting with troubleshooting or switching to an alternative like pf(4).
 
PF should be able to do everything IPFilter does and is better supported. IPFW will certainly work as well but it takes much more work to convert over because of very different rule formalism and syntax.
 
I upgraded to 10.1-RELEASE. The problem remains.

My configuration on the gateway is simple:
Modem <-> re0 <-> PPP (either MPD5 or userland PPP) ng0/tun0 <-> IPF <-> NAT <-> LAN (2 x bce) and jails

The FreeBSD gateway sits between the fibre modem (ethernet) and two LANs.

Even when all rules are removed from IPF, the problem with NAT remains. IPF is passing everything by default AFAIK.

These are all map rules from my ipnat.rules:
Code:
# nat for LANs

map ng0 192.168.100.0/24 -> 0/32 proxy port ftp ftp/tcp
map ng0 192.168.100.0/24 -> 0/32 portmap tcp/udp 30000:50000
map ng0 192.168.100.0/24 -> 0/32

map ng0 192.168.101.0/24 -> 0/32 proxy port ftp ftp/tcp
map ng0 192.168.101.0/24 -> 0/32 portmap tcp/udp 50001:51000
map ng0 192.168.101.0/24 -> 0/32

map ng0 192.168.200.0/24 -> 0/32 proxy port ftp ftp/tcp
map ng0 192.168.200.0/24 -> 0/32 portmap tcp/udp 51001:52000
map ng0 192.168.200.0/24 -> 0/32

# nat for jabber jails

map ng0 192.168.64.0/24 -> 0/32 proxy port ftp ftp/tcp
map ng0 192.168.64.0/24 -> 0/32 portmap tcp/udp 52001:55000
map ng0 192.168.64.0/24 -> 0/32
I also tried auto ports.

I have rdr rules in my ipnat.rules to forward ports to jails and LAN machines. Removing those and leaving only the above map rules doesn't improve the situation.

I also tried leaving just one NAT mapping for one LAN -- same failure.

httping from one of the LAN machines:
Code:
muxx@vaio:~$ httping -c 20 -f www.ej.ru
PING www.ej.ru:80 (www.ej.ru):
connected to 87.239.187.242:80 (474 bytes), seq=0 time=98.96 ms
could not connect (No route to host)
could not connect (No route to host)
connected to 87.239.187.242:80 (474 bytes), seq=3 time=103.58 ms
connected to 87.239.187.242:80 (474 bytes), seq=4 time=101.65 ms
could not connect (No route to host)
could not connect (No route to host)
connected to 87.239.187.242:80 (474 bytes), seq=7 time=111.27 ms
connected to 87.239.187.242:80 (474 bytes), seq=8 time=100.53 ms
connected to 87.239.187.242:80 (474 bytes), seq=9 time=99.28 ms
connected to 87.239.187.242:80 (474 bytes), seq=10 time=93.61 ms
connected to 87.239.187.242:80 (474 bytes), seq=11 time=99.35 ms
could not connect (No route to host)
connected to 87.239.187.242:80 (474 bytes), seq=13 time=95.27 ms
connected to 87.239.187.242:80 (474 bytes), seq=14 time=92.81 ms
connected to 87.239.187.242:80 (474 bytes), seq=15 time=97.47 ms
connected to 87.239.187.242:80 (474 bytes), seq=16 time=93.58 ms
connected to 87.239.187.242:80 (474 bytes), seq=17 time=96.61 ms
connected to 87.239.187.242:80 (474 bytes), seq=18 time=96.29 ms
could not connect (No route to host)
--- www.ej.ru ping statistics ---
20 connects, 14 ok, 30.00% failed, time 1399ms
round-trip min/avg/max = 92.8/98.6/111.3 ms
Binding the LAN address on the gateway results in the same failure. Running httping directly (which binds ng0/tun0) works perfectly.

I am running a GENERIC kernel, no fancy options in rc.conf

I am really at a loss here. I am not doing anything advanced or complicated. As I said the above configuration was working perfectly on a single-core machine with 9.0-RELEASE.
 
I believe I am having the same problem with IPFilter. A fresh reboot seems to fix it for a little while.
I am on 10.1-RELEASE already.

The dmesg log shows:
IP Filter: v5.1.2 initialized. Default = pass all, Logging = enabled

Can someone tell me how to switch to pf(4) or to fix IPFilter?
I have looked but can not find a 'How To' on this.

Thanks
 
As far as a "how to", it's just a matter of translating the intent of your rules into pf.conf(5) syntax. The Handbook also has a bunch of good information to start with.
https://www.FreeBSD.org/doc/en_US.ISO8859-1/books/handbook/firewalls-pf.html

If anything, reporting as much detail as you can at https://bugs.FreeBSD.org/bugzilla/ would be helpful. IPFilter doesn't seem to have the same mind share as PF or IPFW, so reports from those who do use it make a real difference in making it better.
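
As a rough illustration of what such a conversion looks like, here is a sketch that prints plain pf NAT rules for the LAN networks posted earlier in this thread. The helper name pf_nat_rules is made up for this example, the networks and the ng0 interface come from the original poster's ipnat.rules and may not match your setup, and neither the FTP proxy nor the portmap ranges are reproduced.

```sh
#!/bin/sh
# Hypothetical helper: print one pf NAT rule per internal network.
# Usage: pf_nat_rules EXT_IF NET [NET...]
# (EXT_IF) in parentheses tells pf to follow the interface's current
# address, which suits a dynamically addressed PPPoE link.
pf_nat_rules() {
    ext_if=$1
    shift
    for net in "$@"; do
        echo "nat on $ext_if inet from $net to any -> ($ext_if)"
    done
}

# Example (networks taken from the ipnat.rules posted above):
# pf_nat_rules ng0 192.168.100.0/24 192.168.101.0/24 192.168.200.0/24 192.168.64.0/24
```

The printed lines are meant as a starting point for the translation section of /etc/pf.conf. Before loading anything, pfctl -nf /etc/pf.conf parses the file without applying it.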
 
I'd be interested in seeing your logs, kernel config file, sysctl -a output, ipf -T list output, ifconfig -a output, rules and NAT rules. Can you send them to me please? I'd also like to hear about your network topology, MTU, etc., and if you have any other firewalls in your network, e.g. I use two firewalls, one being FreeBSD IPF, back to back. (You can reply off-list if you want.)
 
I finally downgraded to 9.3 and it works fine.
I do not have the logs you are asking for any more, sorry.
If you had gotten to me a lot sooner, I would have had that stuff.
 
I'd like to report that after a freebsd-update on 2/4/16 from 9.3 to 10.2 we experienced the same issue. A downgrade seems to have resolved things. Here's a downgrading guide if anyone needs it: https://lifeforms.nl/20141224/downgrading-with-freebsd-update
 
Can you please provide logs? Do you see bad packets in the logs? Bad packets are indicative of bad checksums. Is fast forwarding enabled? ipfilter prior to r292979 (December 30, 2015 in stable/10) had fastforward checksum issues. 10.2-RELEASE did not have the fix. 10.3-RELEASE will.

Other things to check: Do you by chance have tso4 enabled? If you do, can you try ifconfig fxp0 -tso4. Replace fxp0 with your interface. What type of interface are you using, fxp, bge, etc?
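
A small sketch of the TSO check, assuming the usual ifconfig output where enabled capabilities are listed on the options= line. The helper name has_tso4 and the interface name re0 are only placeholders for this example.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the ifconfig output read from stdin
# lists TSO4 among the enabled interface options.
has_tso4() {
    grep -q 'options=.*TSO4'
}

# Example (re0 is a placeholder; replace it with your interface):
# if ifconfig re0 | has_tso4; then
#     echo "TSO4 is enabled on re0; try: ifconfig re0 -tso4"
# fi
```

For the fast forwarding question, sysctl net.inet.ip.fastforwarding should show on a 10.x system whether it is enabled.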
 
So, does ipnat work in 10.3 now? I've been keeping my firewall on 9.3 all this time, hoping for a fix. Is 10.3 problem-free with ipnat?
 
I've been using ipf 5.1.2 in 10.X and 11 since the beginning, with ipnat.

To those who have been having problems, I need logs, rulesets, and general config info, e.g. ifconfig, etc.

Also, what kind of client systems have been having problems? When I used ipfilter (2 or was it 3, I can't recall now) my FreeBSD and Linux systems had no problems whereas Windows at the time did. On which side of your firewall is the connection initiated? Can you try the same operation from a FreeBSD or Linux client? Does it work better? This will help diagnose the problem.

Simply saying it doesn't work doesn't help solve the problem. As maintainer of ipfilter in FreeBSD, if I cannot reproduce the problem, then there is no problem. I need information. Please give me some information.
 
Thank you for the response. And thank you for maintaining ipfilter!

I'm sorry, though, I have no information to provide, since I'm on 9.3. But in your post that I quoted you said, "ipfilter prior to r292979 (December 30, 2015 in stable/10) had fastforward checksum issues. 10.2-RELEASE did not have the fix. 10.3-RELEASE will." So it seems there were some issues that were identified and that a fix was planned for 10.3. I was wondering if that fix had actually made it into 10.3, and if there had been reports of any issues subsequently?
 
10.3 does have the fastforward patch. No issues on my prod firewall (10.3) nor on my test firewall (11-CURRENT using tryforward). The client systems used behind the firewall are FreeBSD (9, 10, and 11) and Android, with the occasional Windows client. This is not a new FreeBSD ipfilter firewall. It's been in existence for at least 15 years, maybe longer.

The other thing I can suggest is that folks try -CURRENT, then enable the ipfilter DTrace probes to report where bad (checksum) packets are flagged:

dtrace -n 'sdt:::ipf_fi_bad_* { stack(); }'

This would at least point us in the right direction. I'd be grateful if anyone would be willing to capture this.
 
Hopefully someone will be able to provide the details you need. Otherwise, in the coming weeks I plan to set up a test server and replicate my 9.3 firewall configuration on it, then attempt an upgrade to 10.3. I'll then try it out in service for a while and see what happens. If it works without issues, I'll report that. If it doesn't, I'll try to provide the information you need to diagnose any issues.
 
I set up my 10.3 test server. I know you said to go to 11-CURRENT, but I was hoping that 10.3 would work. I tried to enable the ipfilter DTrace probes, but the response I got was:

dtrace: invalid probe specifier sdt:::ipf_fi_bad_* { stack(); }: probe description sdt:::ipf_fi_bad_* does not match any probes

Does this not work on 10.3?
 
This problem persists in 11.0-RELEASE. I was hoping for a fix because in my case, I also use security/py-fail2ban. Lots of configuration changes would be needed to switch to one of the other two firewall options. Maybe a note or warning should be put somewhere to say that ipfilter is broken. I've been using ipfilter for quite a while and would like to keep it that way, so I guess I am stuck with 9.3-RELEASE for now. I do have 5 gateways running and am willing to help the developer(s) solve this problem. I don't code anymore but I think it shouldn't be that difficult to stomp this bug and get ipfilter working again.
 