Solved Latency issue caused by PF

Hi,

I'm currently having a problem with PF causing severe latency on certain SSH connections. To help explain the issue, I made a mediocre diagram of the setup:

jump_host.png


I have a server running 13.0-RELEASE-p4 with two physical NICs. NIC0 is directly connected to a management network and is used to SSH into the physical machine. NIC1 is a VLAN trunk carrying two VLANs: a 'public' VLAN and a 'management' VLAN (the same network the server is connected to via NIC0).
The server is running a VNET jail that I use as a jump host to SSH from a system in the public VLAN to systems in the management VLAN. The jail is connected to the relevant networks using two epairs (jump_pub, jump_mgmt). Each epair is a member of its own bridge, and the other member of each bridge is the corresponding VLAN coming in via NIC1. No IPs are assigned to the VLAN interfaces, the bridges, or the epair ends connected to the bridges.
Traffic on the system is filtered using PF from the physical host itself with all filtering being done on the interfaces and not the bridges. No filtering is done inside the jail.

This setup works exactly as expected and I can SSH via the jump host to systems in the management network with no issues. The problem I'm running into occurs when I SSH via the jump host into the server that is running the jail, i.e. when the connection flow looks like this:
Code:
ssh in via public vlan on NIC1 -> jump_pub -> jump_mgmt -> out via management vlan on NIC1 and into server via NIC0

When I do that, the SSH connection to the server is established almost instantly, but once it's up and running the connection is slow as molasses and barely usable.
Disabling PF resolves any latency issues but when I enable PF, even if it's just with a ruleset that has nothing but a "pass all" rule in it the connection runs extremely slow.
When SSHing into other systems on the management network via the jump host, I see no latency issues. I also tried SSHing directly into the server (from another system on the management network) and that doesn't seem to cause any issues either. Interestingly, bandwidth doesn't seem to be affected: I ran iperf3 (jump host -> physical server) and the links were more or less saturated with no dropped packets.

I've run similar setups on other FreeBSD systems and don't seem to have come across this particular issue before. Does anybody have any idea what might be causing this and how I can fix it?
 
Disabling PF resolves any latency issues but when I enable PF, even if it's just with a ruleset that has nothing but a "pass all" rule in it the connection runs extremely slow.
Try turning off LRO and/or TSO on the interface.

Code:
     -tso    If the driver supports tcp(4) segmentation offloading, disable
             TSO on the interface.  It will always disable TSO for ip(4) and
             ip6(4).

     -lro    If the driver supports tcp(4) large receive offloading, disable
             LRO on the interface.
ifconfig(8)
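As a rough sketch of how those flags would be applied: the helper below only prints the ifconfig commands so they can be reviewed before being run as root. The function name offload_off_cmds and the interface name igb1 are made up for illustration; substitute whatever your trunk NIC is actually called.

```shell
#!/bin/sh
# Print (do not execute) the ifconfig commands that disable LRO and TSO
# on the given interface. Review the output, then run it as root.
offload_off_cmds() {
    iface="$1"
    printf 'ifconfig %s -lro\n' "$iface"
    printf 'ifconfig %s -tso\n' "$iface"
}

# igb1 is a hypothetical interface name standing in for the affected NIC.
offload_off_cmds igb1
```

Running plain ifconfig on the interface before and after should show whether LRO and TSO4/TSO6 still appear in its options= line. A change made this way is lost on reboot; assuming the interface is configured through /etc/rc.conf, the same flags can probably be appended to its existing ifconfig_<interface> line to make them persistent.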
 
Thanks, disabling LRO on NIC1 seems to have resolved the issue. I read up on why TSO and LRO aren't really desirable on NICs that serve a firewall/router, but I still don't understand why, in this case, it would only have affected one connection going through that NIC and not any others. Any ideas on what might be going on here, or pointers to where I could find some more answers? Or is this just one of those situations where it's down to weird NIC-specific interactions that are really hard to investigate, or not worth investigating further?
 