PF IPv6 TSO + reply-to causes kernel crashes or slow transfer speed

Gohma · Oct 13, 2023

Hi!

We're trying to use the reply-to statement in pf rules with net.inet.tcp.tso enabled. It's working great for IPv4, but when trying to do it for IPv6 we get page fault kernel crashes or slow speeds, depending on if nginx is using sendfile or not. We first noticed it on physical servers and then reproduced in VMs, completely fresh installs with the latest install ISO. Setting up the TCP session and sending small amounts of data works, but larger (we've used 1MB for our tests) doesn't. Let me start by showing our environment.

For this test we have two virtual servers on the same subnet.
1881:
Acting as server in our tests, running pf.
IP: 10.250.59.117/24, fd00:1ab::1881/64
1882:
Acting as client in our tests.
IP: 10.250.59.118/24, fd00:1ab::1882/64

Code:

[root@dev-freebsdtest-arn1-1881 /usr/local/www/nginx]# uname -a
FreeBSD dev-freebsdtest-arn1-1881 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
[root@dev-freebsdtest-arn1-1881 /usr/local/www/nginx]# cat /etc/pf.conf
pass in all
pass out all keep state
pass in quick on vtnet0 reply-to (vtnet0 fd00:1ab::1882) proto tcp to fd00:1ab::1881 port { 80 8888 }
pass in quick on vtnet0 reply-to (vtnet0 10.250.59.118) proto tcp to 10.250.59.117 port { 80 8888 }

In this scenario the reply-to doesn't really do anything, traffic would be routed over that interface and destination MAC anyway, but it's enough to demonstrate the problem. IPv6 works perfectly if we remove reply-to (vtnet0 fd00:1ab::1882) from the pf config.

For the test we're using nginx with a default config, the only thing we've changed is that we added listen [::]:80 default_server; and changed server_name to _. We're only including the IPv6 tests here, since IPv4 works as intended.

Code:

[root@dev-freebsdtest-arn1-1882 ~]# curl [fd00:1ab::1881]:80/1MB.A.txt -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  976k  100  976k    0     0   4836      0  0:03:26  0:03:26 --:--:--  4916

Note the speed above, 4916 bytes/second, this speed is very consistent. IPv4 reaches hundreds of MB/s. Looking at tcpdump on the server side it shows that the reply packets take over 250ms each.

Sending data using netcat on both sides show the same slow transfer speeds, so it has nothing to do with nginx or curl.

Disabling net.inet.tcp.tso on the server solves both the slow speeds and the crashes.

Enabling sendfile in nginx causes an instant kernel panic on the server instead, se attached core.txt file. This is probably a larger problem in general, since it can open up for DoS attacks.

We're grateful for any pointers or ideas how to troubleshoot this, or where to bug report it.

Kristof Provost · Oct 14, 2023

I suspect you're running into https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268400 or a related issue.

sysctl kern.ipc.mb_use_ext_pgs=0 will probably make it go away.

Gohma · Oct 14, 2023

Thank you for the link, it definitely looks similar, with the difference that we use sendfile instead of ktls, otherwise the stack traces match very well. sysctl kern.ipc.mb_use_ext_pgs=0 does indeed stop the kernel panics, but the transfer speed is still low.

Is the fix for that bug included in 14.0 RC1? We tried installing it on a new VM and did the same tests. It doesn't crash, but the transfer speed is around 270 kB/s. Still way low compared to IPv4, but at least like 50 times faster than 13.2

Setting kern.ipc.mb_use_ext_pgs=0, net.inet.tcp.tso=0 or disabling sendfile doesn't work on our more complex full system set ups though. Anything else we can try?

Kristof Provost · Oct 16, 2023

I have no idea what your setup is, so I cannot offer any useful feedback to "It's slow".

Gohma · Oct 16, 2023

You can ignore the comment about our full system set ups, I absolutely understand you can't help with that.

Though we were able to reproduce the slowness on completely clean installs. On 13.2 we get around 4900 B/s, on 14.0RC1 we get 270 kB/s. This is slow enough that it can't really be down to slow hardware, especially since IPv4 between the same servers is fast, at hundreds of MB/s.

PF IPv6 TSO + reply-to causes kernel crashes or slow transfer speed

Gohma

Attachments

Kristof Provost

Gohma

Kristof Provost

Gohma