PF VOIP phone IPsec and pf+NAT

Hello forum.

This is a bit of a long shot and I don't have much useful diagnostic information to provide, but I'm otherwise running out of ideas.

At home, I have replaced my ISP-supplied VDSL router with a bridging modem and a FreeBSD host running pf(4). I have a very simple ruleset which NATs my home LAN's private range behind the public IP address on the WAN interface (my two interfaces, igb0 and igb1, have been renamed to lan and wan respectively):

Code:
set skip on { lo0, lan }
nat on wan from ! (wan) to any -> (wan) round-robin
block drop log all
pass out quick on wan all flags S/SA keep state
This works absolutely fine for all "normal" traffic from my various LAN hosts.

To enable me to work from home regularly, my work IT department have provided me with a VOIP phone, an Avaya 9611G. This phone has built-in IPsec and many colleagues use them from home without difficulties. It's supposed to establish a tunnel to the company IPsec gateways, get an IP address on the corporate network, and telephony services then just work.

Not so for me though. The phone's IPsec tunnel starts up and I can see the packets passing through the FreeBSD gateway. I see the initial ISAKMP packets passing back and forth on UDP port 500, then it negotiates NAT-Traversal with the peer and everything else is then encapsulated inside UDP on port 4500.

The problem is, when the phone tries to talk to the telephony servers on the corporate network through the tunnel, it fails and starts throwing "No signalling" errors. Seconds later, the IPsec tunnel drops and I have to manually start it up again.

Initially pf was translating the source port of the phone's IPsec packets so they were no longer from port 500 or 4500. I added a couple of extra translation rules to pf to avoid this:

Code:
nat on wan inet proto udp from ! (wan) port 500 to any -> (wan) port 500
nat on wan inet proto udp from ! (wan) port 4500 to any -> (wan) port 4500

This didn't help.

The IT guys at work aren't being much help in figuring this out. They can see my home phone registering with the call server, but then their logs show some kind of communication being blocked. They told me to set my firewall to permit TCP port 1720 (Q.931 call setup) and UDP port 1719 (Gatekeeper Discovery RAS), but as this is encapsulated inside the IPsec tunnel when it passes through pf, I don't see how it can help. pflog doesn't show any such packets being blocked.

One of the IT guys swapped out my phone with one that he had been successfully using from home the previous day, but the same problem occurred. They have concluded that something is wrong with my home networking setup and have gone silent on my support ticket.

I'm wondering if there's something unusual in what pf is doing with the IPsec packets that might be causing the telephony communication to fail. From all appearances, pf is NATing the IPsec UDP packets just fine and keeping state for the replies. pf maintains UDP state for 2 minutes, but the IPsec peers are sending each other keepalives every 20 seconds, so the state never expires. I can't examine the traffic inside the tunnel, so it's very difficult to diagnose.

I've also tried toggling traffic normalisation in pf in case fragment reassembly was screwing something up. No change.

I'm hoping someone else may have some experience with using IPsec through pf+NAT and may have some insights. To clarify, the FreeBSD host itself isn't doing IPsec.

Can anyone help?

J
 
UDP fragment reassembly is really an issue with IPsec traffic passing a firewall. I don't know very much about PF, however, in my ipfw(8) ruleset I have:
Code:
...
/sbin/ipfw -q add 5010 allow udp from any to me 500,4500 in recv $WAN keep-state
/sbin/ipfw -q add 5020 allow udp from any to me in recv $WAN frag
...
The extra rule 5020 is necessary because UDP fragments other than the first carry no port numbers, so they are not matched by rule 5010; without rule 5020 they would be blocked by a later catch-all rule. Without this extra rule, IPsec tunnels are killed within a few seconds of being established.

Your setup is certainly different, but there must be a facility in PF that lets fragments pass. Reducing the MTU would avoid fragmentation altogether, but that would have to be configured on the IP phone.
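
To illustrate why a port-based rule cannot match later fragments, here is a minimal Python sketch. It is not part of ipfw or pf; the helper names `ipv4_header` and `udp_ports`, the peer address 203.0.113.1 and the packet sizes are made up for illustration. It reads the fragment offset from the IPv4 header and only looks for UDP ports when the offset is zero, because only the first fragment carries the UDP header.

```python
import struct

def ipv4_header(total_len, ident, more_frags, frag_offset_bytes, proto=17):
    """Build a minimal 20-byte IPv4 header (checksum left at 0, illustration only)."""
    flags_frag = (0x2000 if more_frags else 0) | (frag_offset_bytes // 8)
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, total_len, ident, flags_frag,
                       64, proto, 0,
                       bytes([192, 168, 128, 143]),   # LAN host (sample)
                       bytes([203, 0, 113, 1]))       # peer (placeholder address)

def udp_ports(packet):
    """Return (sport, dport) if this packet carries a UDP header, else None."""
    ihl = (packet[0] & 0x0F) * 4                      # IP header length in bytes
    flags_frag = struct.unpack("!H", packet[6:8])[0]
    offset = (flags_frag & 0x1FFF) * 8                # fragment offset in bytes
    if packet[9] != 17 or offset != 0:
        return None                                   # not UDP, or a non-first fragment
    return struct.unpack("!HH", packet[ihl:ihl + 4])

# First fragment: IP header + UDP header (ports 4500) + 1200 bytes of payload.
first = ipv4_header(1228, 61956, True, 0) + struct.pack("!HHHH", 4500, 4500, 1276, 0) + bytes(1200)
# Second fragment: IP header + the remaining 68 payload bytes, no UDP header at all.
second = ipv4_header(88, 61956, False, 1208) + bytes(68)

print(udp_ports(first))   # (4500, 4500)
print(udp_ports(second))  # None
```

A filter that matches on port 500 or 4500 therefore only ever sees the first fragment, unless the firewall reassembles the datagram before evaluating its rules.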
 
Did you check that you don't have overlapping local networks at home and at the company?

Yes, no overlaps. 192.168.128.0/24 at home. 10.0.0.0/8 at work.

Now this is interesting. I'm definitely seeing UDP fragments, but it looked like pf was passing them through, if I recall correctly. I'll double-check that when I get home later.
 
I had another look through a tcpdump(1) capture I took of the phone traffic and there are quite a few instances of fragmentation, like this:

Code:
20:03:13.257883 IP (tos 0x68, ttl 128, id 61956, offset 0, flags [+], proto UDP (17), length 1228)
    192.168.128.143.ipsec-nat-t > xx.xx.xx.xx.ipsec-nat-t: UDP-encap: ESP(spi=0x00000000,seq=0xf2), length 1200
20:03:13.257908 IP (tos 0x68, ttl 128, id 61956, offset 1208, flags [none], proto UDP (17), length 88)
    192.168.128.143 > xx.xx.xx.xx: ip-proto-17

The combined size of these fragments is still well under the MTU of the connection, so it's puzzling why the phone is doing this. I verified that the path to the company VPN gateway supports a 1500 byte MTU.
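
As a sanity check on those sizes, here is a small Python sketch that works out the length of the reassembled datagram from the offset and length values tcpdump printed above. The function name `reassembled_length` is made up, and a 20-byte IP header without options is assumed.

```python
IP_HEADER = 20  # bytes, assuming no IP options

def reassembled_length(fragments):
    """fragments: list of (offset, total_length) pairs as printed by tcpdump -v."""
    # Each fragment contributes (total_length - IP_HEADER) payload bytes at its offset.
    end = max(offset + (length - IP_HEADER) for offset, length in fragments)
    return IP_HEADER + end

frags = [(0, 1228), (1208, 88)]   # values from the capture above
print(reassembled_length(frags))  # 1296
```

So the original datagram would have been 1296 bytes, comfortably below 1500, and the phone appears to be splitting packets at around 1228 bytes rather than at the link MTU. The capture alone does not say why.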
 
MTU is reduced significantly by the IPsec protocol overhead. Some time ago I measured this for L2TP/IPsec: the maximum MTU without fragmentation was 1280 without NAT-T and 1230 with NAT-T (UDP encapsulation on port 4500). This includes the overhead of L2TP though, which you don't have.

I am not sure at which stage fragmented packets are reassembled by your PF firewall. If reassembly happens after NAT, the fragments won't arrive, because your NAT rules are based on port numbers and, as already said, UDP fragments carry no port numbers. Perhaps you want to try NAT rules based on the peer IP address instead.
 