Significant network latency when using ipfw and in-kernel NAT

Hi there,

We're running FreeBSD 9.0-RELEASE on a box whose primary purpose is to act as a firewall and a gateway. Up until today, we've been using ipfw(4) in conjunction with natd(8) and the divert action in ipfw(4) to forward packets between the FreeBSD box (i.e. the public Internet) and our private servers.

Unfortunately, natd(8) appears to be quite the CPU hog, and we therefore decided to switch to the in-kernel NAT support in ipfw(4). The issue we're running into is that network latency skyrockets when the ipfw(4) ruleset contains nat rules. Basically all TCP traffic originating from the box times out and pinging google.com on the box gives an average of ~10 SECONDS -- and that's even if I explicitly allow all ICMP traffic before the packets even get to the nat rules in ipfw.

The really odd part, however, is that I can ping the FreeBSD box just fine externally. For instance, pinging the server from my home connection gives an average of 45 ms. I'm also able to communicate just fine with the internal servers through the FreeBSD box.

Does anybody have any idea what's going on? I assume I must've misconfigured something big here...
 
I have a working setup with ipfw(8) and in-kernel NAT on FreeBSD 8.3. I do not see the described latency, and pinging google.com from the NAT machine gives almost the same round-trip time as from a client behind the NAT. As expected, the ping from the client takes a little bit longer.

You might want to post your NAT and IPFW rules.

I hope that it is not a 9.0 issue, since I am going to upgrade my server soon.
 
Definitely. Since this is a server in production, I've obfuscated some of the IPs, etc.

First off, here's the ifconfig output. Our setup consists of a private NIC (ix0), a public NIC (ix1), and an IP tunnel (gif0), which is what we use in ipfw to forward incoming packets to our internal boxes:
Code:
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=401bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        ether XX:XX:XX:XX:XX:XX
        inet <private VLAN IP> netmask 0xffffffc0 broadcast xx
        inet6 xxxx::xxx:xxxx:xxxx:xxxx%ix0 prefixlen 64 scopeid 0x7
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=400bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO>
        ether XX:XX:XX:XX:XX:XX
        inet <public IP> netmask 0xfffffff8 broadcast xx
        inet6 xxxx::xxx:xxxx:xxxx:xxxx%ix1 prefixlen 64 scopeid 0x8
        inet <alias public IP> netmask 0xffffffff broadcast xx
        inet <alias public IP> netmask 0xffffffff broadcast xx
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
        status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xa
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
gif0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> metric 0 mtu 1500
        tunnel inet <private VLAN IP> --> <private VLAN IP>
        inet 172.16.1.1 --> 172.16.1.2 netmask 0xffff0000
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        options=1<ACCEPT_REV_ETHIP_VER>
The basic ruleset looks like this. One-pass is off so that packets are reinjected after going through NAT'ing and pipes:
Code:
00001  16653   4417407 allow ip from any to any via ix0
00003  14588   2860344 allow ip from any to any via gif1
00006      0         0 allow ip from any to any via lo0
00010      0         0 deny ip from 192.168.0.0/16 to any in via ix1
00011      0         0 deny ip from 172.16.0.0/12 to any in via ix1
00012      0         0 deny ip from 10.0.0.0/8 to any in via ix1
00013      0         0 deny ip from 127.0.0.0/8 to any in via ix1
00014      0         0 deny ip from 0.0.0.0/8 to any in via ix1
00015      0         0 deny ip from 169.254.0.0/16 to any in via ix1
00016      0         0 deny ip from 192.0.2.0/24 to any in via ix1
00017      0         0 deny ip from 204.152.64.0/23 to any in via ix1
00018      0         0 deny ip from 224.0.0.0/3 to any in via ix1
00019     15      1020 allow icmp from any to any via ix1   # For testing purposes, allow all ICMP in and out of the public adapter
00020   7537    647951 nat 1 ip from any to any in via ix1   # NAT all incoming traffic
00030      0         0 check-state # For some reason, this never gets matched even though rule #100 is matched
00100    161    124340 skipto 805 tcp from any to any out via ix1 setup keep-state   # For testing purposes, allow all TCP originating from the box out of the public adapter
00110      0         0 skipto 805 icmp from any to any out via ix1 keep-state
00200  36557   1996626 skipto 500 tcp from any to 172.16.1.2 dst-port 443 in via ix1   # Forward NAT'ed traffic for port 443 over the ip tunnel
00201  46593  63973143 skipto 805 tcp from 172.16.1.2 443 to any out via ix1
00400      8      6192 deny ip from any to any via ix1
00500      0         0 pipe 1 ip from any to any in via ix1   # Packet shaping
00501      0         0 allow ip from any to any in via ix1
00805   8963   3412995 nat 1 ip from any to any out via ix1
00806   8963   3412995 allow ip from any to any
10000      0         0 deny ip from any to any via ix1   # Last ditch catch
65535 864357 867120912 allow ip from any to any
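
For readers reproducing this setup, the one-pass setting mentioned above is a sysctl. A minimal sketch, assuming it is set persistently through /etc/sysctl.conf (that location is an assumption here; the post does not say how it was set):

```
# /etc/sysctl.conf fragment (assumed location, not shown in the original post)
# 0 = after a nat or pipe action the packet continues at the next rule
#     instead of leaving the firewall immediately
net.inet.ip.fw.one_pass=0
```

The same value can be applied at runtime with `sysctl net.inet.ip.fw.one_pass=0`.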

'ipfw nat show config' yields:
Code:
ipfw nat 1 config if ix1 log reset redirect_port tcp 172.16.1.2:443 <public IP>:443

And finally, here are the horrifying ping times (furthermore, all outgoing TCP traffic originating from this box, such as wget or pkg_add, times out. I've managed to get an outgoing telnet working, but it's horribly slow and takes a while to establish):
Code:
PING google.com (74.125.227.14): 56 data bytes
64 bytes from 74.125.227.14: icmp_seq=0 ttl=56 time=2746.953 ms
64 bytes from 74.125.227.14: icmp_seq=1 ttl=56 time=2097.460 ms
64 bytes from 74.125.227.14: icmp_seq=2 ttl=56 time=2186.068 ms
64 bytes from 74.125.227.14: icmp_seq=3 ttl=56 time=4292.776 ms
64 bytes from 74.125.227.14: icmp_seq=4 ttl=56 time=5056.965 ms
64 bytes from 74.125.227.14: icmp_seq=5 ttl=56 time=5323.720 ms
64 bytes from 74.125.227.14: icmp_seq=6 ttl=56 time=5007.974 ms
64 bytes from 74.125.227.14: icmp_seq=7 ttl=56 time=4756.587 ms
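
To compare runs (in-kernel NAT versus natd) on the same footing, the round-trip times can be averaged from pasted ping output. This is only a sketch: the function name `avg_rtt` is made up for illustration, and it assumes the `time=<value> ms` format shown above.

```sh
# avg_rtt: read ping output on stdin and print the mean round-trip time in ms.
# Hypothetical helper; assumes each reply line contains "time=<number> ms".
avg_rtt() {
    awk -F'time=' '
        /time=/ {
            split($2, f, " ")   # f[1] is the number in front of "ms"
            sum += f[1]
            n++
        }
        END { if (n) printf "%.3f\n", sum / n }
    '
}
```

Usage would be something like `ping -c 8 google.com | avg_rtt`. Fed the eight replies listed above, it should print 3933.563.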

It's worth mentioning that when I switch back to using natd and divert in the ruleset (which really only changes the nat portions and everything else stays the same), the ping time drops to ~300ms, which is a big difference for simply "using" natd even when the ICMP packets aren't supposed to be going through NAT'ing whatsoever. The ~300ms ping time is still way too high, though, since our other boxes have a ping time to Google of ~0.300ms...

Any ideas?

/ Soren
 
It looks to me as if TSO is enabled on your adapter ix0.

The manual of ipfw(8) states in the section BUGS:
...
Due to the architecture of libalias(3), ipfw nat is not compatible with
the TCP segmentation offloading (TSO). Thus, to reliably nat your
network traffic, please disable TSO on your NICs using ifconfig(8).
...
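
A quick way to check for this is to look for TSO capability flags in the ifconfig(8) output. The following is a rough sketch; the helper name `tso_flags` is made up for illustration, and it assumes the `options=...<...>` line format shown in the listing above.

```sh
# tso_flags: read ifconfig output on stdin and print any TSO capability
# flags (TSO, TSO4, TSO6) that appear in an options=...<...> line.
# Hypothetical helper; VLAN_HWTSO is deliberately not reported.
tso_flags() {
    awk -F'[<>]' '
        /options=/ {
            n = split($2, caps, ",")
            for (i = 1; i <= n; i++)
                if (caps[i] ~ /^TSO[46]?$/) print caps[i]
        }
    '
}
```

Run as `ifconfig ix0 | tso_flags`. If it prints anything, `ifconfig ix0 -tso` should turn TSO off on that interface.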

Some observations about the firewall rules:

Your ipfw rule 00003 refers to gif1; however, gif1 is not present in your ifconfig(8) listing. You explicitly mentioned gif0 in your message, so you might want to check your actual rule set.

Also, in my case the check-state rule always shows packet/byte counts of zero. So, I assume this is normal.

In my opinion, your rule 200 should get a lower number and go before rule 100, and I would append setup keep-state to it, so most probably rule 201 could be omitted.
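
A rough sketch of that suggestion, reusing the addresses from the posted ruleset. The rule number 90 is hypothetical; any free number after the check-state at 30 and before rule 100 would do.

```
# Sketch only: rule 200 moved ahead of rule 100 and made stateful.
# The number 90 is an assumption, not taken from the original ruleset.
ipfw add 90 skipto 500 tcp from any to 172.16.1.2 dst-port 443 in via ix1 setup keep-state
```

With keep-state, replies from 172.16.1.2 would be expected to match the dynamic rule at check-state, at the cost of one dynamic rule per connection.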

I don't think that rule 100 can be for testing only. In my setup a similar rule lets all the clients connect to the internet, and without it, my clients would be effectively offline. In this respect, your ruleset seems to lack a corresponding UDP rule. On the other hand, I have nothing like your ICMP rule 110, and pinging from behind the NAT works anyway.
 
rolfheinrich said:
It looks to me as if TSO is enabled on your adapter ix0.

The manual of ipfw(8) states in the section BUGS:

Yeah, I already got rid of that, which fixed some intermittent TCP connection issues I had when routing traffic through the FreeBSD box to our internal boxes. It had no effect whatsoever on the ping times or TCP traffic originating from the FreeBSD box itself.

rolfheinrich said:
Your ipfw rule 00003 refers to gif1; however, gif1 is not present in your ifconfig(8) listing. You explicitly mentioned gif0 in your message, so you might want to check your actual rule set.

That's because I obfuscated the ruleset and removed rules that weren't relevant to illustrate the setup. The tunnels are very much working, so ignore any typos like that. :)

rolfheinrich said:
In my opinion, your rule 200 should get a lower number and go before rule 100, and I would append setup keep-state to it, so most probably rule 201 could be omitted.

I've explicitly avoided using dynamic rules for the allowed incoming traffic because I don't want DDoS attacks (which we've been hit by recently) to take up gobs of system memory.

rolfheinrich said:
I don't think that rule 100 can be for testing only. In my setup a similar rule lets all the clients connect to the internet, and without it, my clients would be effectively offline.

Sure, it's "necessary", but I could lock it down even more and only allow certain protocols, like pkg_add, ssh, etc.
 
Perhaps the most useful thing I can do here is post a working ruleset that is structurally similar to yours, although it necessarily differs in the details. Replacing the 1723 redirection with a 443 one and removing the L2TP/IPsec stuff would make it almost match your setup.

bridge0 is my internal and ue0 my external interface.

Code:
#!/bin/sh
ipfw -q flush

add="ipfw -q add"

ipfw -q nat 1 config if ue0 reset\
                            redirect_port tcp 192.168.0.1:1723 1723\
                            redirect_port udp 192.168.0.1:1701 1701\
                            redirect_port udp 192.168.0.1:500   500\
                            redirect_port udp 192.168.0.1:4500 4500

# Allow everything within the LAN
$add 10 allow ip from any to any via bridge0
$add 20 allow ip from any to any via lo0
$add 30 allow ip from any to any via re0
$add 40 allow ip from any to any via vr0
$add 50 allow ip from any to any via ng*

# Catch spoofing from outside
$add 90 deny ip from any to any not antispoof in

$add 100 nat 1 ip from any to any via ue0 in
$add 101 check-state

# Rules for allowing dial-in calls to the PPTP and L2TP/IPsec VPN servers
# that are listening on a LAN interface behind the NAT
$add 200 skipto 10000 tcp from any to any 1723 via ue0 in setup keep-state
$add 202 skipto 10000 udp from any to any 1701 via ue0 in keep-state
$add 203 skipto 10000 udp from any to any  500 via ue0 in keep-state
$add 204 skipto 10000 udp from any to any 4500 via ue0 in keep-state

# Rules for outgoing traffic - allow everything that is not explicitly denied, ...
$add 1000 deny ip from not me to any 25,53 via ue0 out
# ... and now allow all other outgoing connections
$add 2000 skipto 10000 tcp from any to any via ue0 out setup keep-state
$add 2010 skipto 10000 udp from any to any via ue0 out keep-state

# Rules for incoming traffic - deny everything that is not explicitly allowed
$add 5000 allow tcp from any to any 5,80,443,8080 via ue0 in setup limit src-addr 10

# Catch tcp/udp packets, but don't touch gre, esp, icmp traffic
$add 9998 deny tcp from any to any via ue0
$add 9999 deny udp from any to any via ue0

$add 10000 nat 1 ip from any to any via ue0 out
$add 65534 allow ip from any to any

One final observation: your rule 400 corresponds more AND less to my rules 9998/9999. Your rule takes LESS space, but is MORE restrictive, because it denies all IP traffic on ix1 rather than only TCP and UDP. I am not sure whether your rule 110, which only takes care of the out direction, would make up for this.
 