[Solved] In-kernel NAT dropping large UDP return packets

When a T-Mobile "femto-cell" tries to establish its IPv4 IPsec tunnel to the T-Mobile provisioning servers, the 4640-byte return packet is silently dropped by the in-kernel NAT, even though it "matches" the outbound packet sent less than 100 ms earlier.

All other operations of the firewall seem to be functioning as expected. This includes iPhones using "WiFi Calling", which uses similar IPsec connections to T-Mobile servers (though fragmentation has not been seen on those connections). The femto-cell's connection works correctly through a Linux/netfilter NAT. Proper reassembly of the packet fragments within the firewall, at the start of the rule set, has been confirmed with ngtee and Wireshark.

Is there a known issue with large packets and in-kernel NAT?

Edit: Yes, there is a 4k packet-size limit for ipfw+nat and ng_nat, at least in 11.1-RELEASE-p9 and -p10.

Until the fix is MFC'd and appears in RELEASE builds, see https://svnweb.freebsd.org/base?view=revision&revision=335133, committed to CURRENT on 2018-06-14, for a patch applicable to STABLE or RELEASE. Big "thanks" to Andrey V. Elsukov for the insight and fix.

The only sysctl I found that seemed related was the UDP timeout. For good measure I raised it to 30 seconds, but that did not change the behavior.

Are there known causes and/or resolutions for this behavior?

---

Diagnosis has been a challenge, as there does not seem to be a way to log packets dropped by the in-kernel NAT, nor have I found a way to examine the NAT table (libalias(3) does not seem to describe a call for doing so).

Logical flow and packet progress through the firewall have been instrumented with ngtee and ng_iface nodes, for both the "in" pass and the "out" pass through the firewall. Pre- and post-NAT packets are captured by the rules immediately before and immediately after the nat rules.

The logical flow and NAT appear to be operating as expected, with the exception of the drop of the "IKE-AUTH MID=01 Responder Response" packet (as wireshark describes it).

The initial IKE_SA_INIT exchange on UDP 500 proceeds as expected:
  • 532 bytes IKE_SA_INIT MID=00 Initiator Request
    • Received at inside interface ${device_IP}:500 => some_server.t-mobile.com:500
    • NAT outbound to ${outside_IP}:500 => some_server.t-mobile.com:500
    • Sent to router via outside interface
  • 533 bytes IKE_SA_INIT MID=00 Responder Response
    • Received at outside interface some_server.t-mobile.com:500 => ${outside_IP}:500
    • NAT inbound to some_server.t-mobile.com:500 => ${device_IP}:500
    • Sent to device via inside interface
The outbound IKE_AUTH request packet on UDP 4500 is reassembled, NAT-ed, and delivered as expected:
  • 2112 bytes IKE_AUTH MID=01 Initiator Request
    • Received as two fragments, 1504 and 632 bytes at inside interface ${device_IP}:4500 => some_server.t-mobile.com:4500
    • Reassembled by ipfw; Wireshark shows it properly reassembled on the ngtee debug interface
    • NAT outbound to ${outside_IP}:4500 => some_server.t-mobile.com:4500
    • Sent to router via outside interface
The inbound IKE_AUTH response packet on UDP 4500 is reassembled, but "disappears" in the NAT:
  • 4640 bytes IKE_AUTH MID=01 Responder Response
    • Received as four fragments, 1504, 1504, 1504, and 200 bytes at outside interface some_server.t-mobile.com:4500 => ${outside_IP}:4500
    • Reassembled by ipfw; Wireshark shows it properly reassembled on the ngtee debug interface
    • NAT inbound: packet never seen after the nat rule
NAT is configured as follows, plus some static redirect_port mappings, none of which involve ports 500 or 4500:
ipfw nat 1 config ip ${outside_IP} log same_ports unreg_only

11.1-RELEASE-p9 FreeBSD 11.1-RELEASE-p9 #0: Tue Apr 3 16:59:16 UTC 2018 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

Additional keywords: Jumbo frame, frag, fragmentation
 
In my ipfw firewall setup, I have this rule directly after the UDP rule for the IPsec traffic:
ipfw add 5020 allow udp from any to me in recv $WAN frag

Without it, Windows IKEv2 clients cannot connect to the VPN. The point is that fragments after the first one don't carry the UDP header, so they have no port numbers, and the firewall cannot allow them by port even if they belong to otherwise allowed traffic, e.g. on UDP 500 and 4500.
 
The packets have already been reassembled successfully prior to NAT. They are being presented to the NAT as a complete, unfragmented packet. As such, all packets have the IP header and all the data, including the UDP ports involved. This has been confirmed through ngtee examination of the packets from various points within the rules, including immediately before and immediately after the rule with the nat action.

(As a side note, opening a firewall to "any" fragment is not something I'd be comfortable with, even if limited to a specific protocol.)
 
I didn't examine the packet flow in depth. I can only say that without that rule, some IPsec connections do not work. Fragments are by definition of kind "any": they don't come with port numbers and may even arrive out of order.

There are other obstacles with NAT'ing, though, depending on which kind of firewall you set up. In the stateful case, make sure that you set the sysctl net.inet.ip.fw.one_pass=0, and you will need two NAT rules, one for incoming and one for outgoing traffic. In the stateless case, make sure that you have a separate rule for each traffic direction.
 
From Andrey V. Elsukov:

The kernel version of libalias uses m_megapullup() function to make
single contiguous buffer. m_megapullup() uses m_get2() function to
allocate mbuf of appropriate size. If size of packet greater than 4k it
will fail. So, if you use MTU greater than 4k or if after fragments
reassembly you get a packet with length greater than 4k, ipfw_nat()
function will drop this packet.


From /usr/src/sys/netinet/libalias/alias.c
Code:
#ifdef _KERNEL
/*
* m_megapullup() - this function is a big hack.
* Thankfully, it's only used in ng_nat and ipfw+nat.

I'll try natd(8) when I'm not relying on network connectivity to access the machine.

Edit: Thanks again to Andrey; the patch is tested and "works for me" on 11.1-RELEASE-p10 with the GENERIC kernconf. (Please check the "bug" listing for any updates on the patch or its implementation in future versions.)

For a patch for STABLE or RELEASE, see https://svnweb.freebsd.org/base?view=revision&revision=335133, committed to CURRENT on 2018-06-14.

Edit (2018-12-11): I just checked 12.0-RELEASE, and the changes mentioned above are present in the source.
 