10gbe cards forwarding packets slower than 1gbe? Pulling my hair out here

Hello all,

I'm trying to set up a little router lab for playing around but I've hit a hurdle pretty early on. I have an M11SDV-8CT-LN4F board from Supermicro. It's an AMD Epyc 3201, so not the fastest CPU, but it is quite modern and I've found it to be really well supported in FreeBSD 12.2-RELEASE so far.

But, I'm hitting a pretty serious issue that's so far leaving me dead in the water. My goal was simple: connect my 1gbe uplink to igb0 and use a 10gbe card as the trunk connection to my L3 switch. VLAN routing duties would all be performed on the switch itself, with the FreeBSD host just running PF for Firewall and NAT duties. Unfortunately I'm seeing abysmal performance. Clients only see about 200mb/sec down through the firewall. Oddly enough, upload manages to saturate the 1gb/sec uplink just fine.

To simplify my testing, I now have the following network topology:

Code:
[ HostA (FreeBSD 12.2)                       ]
[     Chelsio T420 | cxgbe0 172.16.30.2/24   ]
             |         |
             |  10GBE  |
             |         |
[     Chelsio T440 | cxgbe0 172.16.30.1/24   ]
[ Firewall (FreeBSD 12.2)                    ]
[     Intel Pro1000 | igb0 172.16.10.1/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                        ]
So HostA and the firewall use the 10gbe pipe for all traffic, the firewall and HostB use the 1gbe uplink. I've completely removed L3 switch and my ISP from the equation. Further, I've completely disabled PF, so there is zero NATing happening just to rule that out.

Some iperf3 measurements:

HostA to Firewall:
Code:
hosta> iperf3 -c 172.16.30.1
Connecting to host 172.16.30.1, port 5201
[  6] local 172.16.30.2 port 57159 connected to 172.16.30.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   767 MBytes  6.43 Gbits/sec  119    719 KBytes       
[  6]   1.00-2.00   sec   843 MBytes  7.08 Gbits/sec   65   1.05 MBytes       
[  6]   2.00-3.00   sec   850 MBytes  7.13 Gbits/sec   67   1.32 MBytes       
[  6]   3.00-4.00   sec   849 MBytes  7.13 Gbits/sec  149    837 KBytes       
[  6]   4.00-5.00   sec   852 MBytes  7.15 Gbits/sec   67   1.15 MBytes       
[  6]   5.00-6.00   sec   849 MBytes  7.12 Gbits/sec  134    491 KBytes       
[  6]   6.00-7.00   sec   851 MBytes  7.14 Gbits/sec   63    955 KBytes       
[  6]   7.00-8.00   sec   856 MBytes  7.18 Gbits/sec   66   1.24 MBytes       
[  6]   8.00-9.00   sec   832 MBytes  6.98 Gbits/sec  139    652 KBytes       
[  6]   9.00-10.00  sec   852 MBytes  7.15 Gbits/sec   64   1.03 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  8.20 GBytes  7.05 Gbits/sec  933             sender
[  6]   0.00-10.00  sec  8.20 GBytes  7.04 Gbits/sec                  receiver
Firewall to HostA:
Code:
hosta> iperf3 -c 172.16.30.1 -R
Connecting to host 172.16.30.1, port 5201
Reverse mode, remote host 172.16.30.1 is sending
[  6] local 172.16.30.2 port 43906 connected to 172.16.30.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec   986 MBytes  8.27 Gbits/sec                  
[  6]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   2.00-3.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   3.00-4.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   5.00-6.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   6.00-7.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   8.00-9.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  10.8 GBytes  9.30 Gbits/sec    0             sender
[  6]   0.00-10.00  sec  10.8 GBytes  9.30 Gbits/sec                  receiver
For both of these tests, CPU usage was the limiting factor but as you can see it's very near the 10gb/sec anyway. This is fine, I'm not interested in saturating 10gb/sec at the firewall anyway.

Here's where it gets interesting.

HostA to HostB:
Code:
hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[  6] local 172.16.30.2 port 51615 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec  78.1 MBytes   655 Mbits/sec    0    208 KBytes       
[  6]   1.00-2.00   sec  79.7 MBytes   668 Mbits/sec    0    208 KBytes       
[  6]   2.00-3.00   sec  85.9 MBytes   720 Mbits/sec    0    208 KBytes       
[  6]   3.00-4.00   sec  76.0 MBytes   638 Mbits/sec    0    208 KBytes       
[  6]   4.00-5.00   sec  82.9 MBytes   696 Mbits/sec    0    208 KBytes       
[  6]   5.00-6.00   sec  77.4 MBytes   649 Mbits/sec    0    208 KBytes       
[  6]   6.00-7.00   sec  83.0 MBytes   696 Mbits/sec    0    208 KBytes       
[  6]   7.00-8.00   sec  84.3 MBytes   707 Mbits/sec    0    208 KBytes       
[  6]   8.00-9.00   sec  78.1 MBytes   655 Mbits/sec    0    208 KBytes       
[  6]   9.00-10.00  sec  83.4 MBytes   699 Mbits/sec    0    208 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec   809 MBytes   678 Mbits/sec    0             sender
[  6]   0.00-10.00  sec   809 MBytes   678 Mbits/sec                  receiver
Not great! CPU was 99.6% idle the whole time. But it gets worse.

HostB to HostA:
Code:
hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[  6] local 172.16.30.2 port 43155 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec  39.8 MBytes   333 Mbits/sec                  
[  6]   1.00-2.00   sec  31.9 MBytes   268 Mbits/sec                  
[  6]   2.00-3.00   sec  34.5 MBytes   289 Mbits/sec                  
[  6]   3.00-4.00   sec  33.6 MBytes   282 Mbits/sec                  
[  6]   4.00-5.00   sec  32.8 MBytes   275 Mbits/sec                  
[  6]   5.00-6.00   sec  36.4 MBytes   305 Mbits/sec                  
[  6]   6.00-7.00   sec  33.4 MBytes   280 Mbits/sec                  
[  6]   7.00-8.00   sec  35.3 MBytes   296 Mbits/sec                  
[  6]   8.00-9.00   sec  34.6 MBytes   290 Mbits/sec                  
[  6]   9.00-10.00  sec  34.5 MBytes   290 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-10.00  sec   347 MBytes   291 Mbits/sec                  sender
[  6]   0.00-10.00  sec   347 MBytes   291 Mbits/sec                  receiver
Honestly that's just sad. I will note however that if I up the number of connections in iperf3 I can saturate the 1gb/sec pipe but I should be able to get a single connection to 1gb/sec without breaking a sweat.
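
One rough way to reason about the single-connection numbers above: a TCP flow can have at most one window of data in flight per round trip, so its rate is bounded by window / RTT. The sketch below (Python) assumes the 208 KByte Cwnd that iperf3 reports in the HostA to HostB run is the effective limit, which is an assumption and not something measured here, and works backwards to the round-trip time the observed rate would imply. The function name is made up for illustration.

```python
def implied_rtt_ms(window_bytes: float, throughput_bits_per_s: float) -> float:
    """Round-trip time (ms) at which a flow limited to `window_bytes`
    in flight would run at exactly `throughput_bits_per_s`."""
    return window_bytes * 8 / throughput_bits_per_s * 1000


# Cwnd from the HostA -> HostB run above. Assumption: iperf3's "KBytes"
# are 1024-byte units and its "Mbits/sec" are 10^6 bits per second.
CWND_BYTES = 208 * 1024

# Observed single-stream rate through the 10gbe leg, and for comparison
# the ~949 Mbits/sec that a 1gbe link delivers at line rate.
for label, rate in [("observed, 10gbe leg", 678e6), ("1gbe line rate", 949e6)]:
    print(f"{label}: implied RTT = {implied_rtt_ms(CWND_BYTES, rate):.2f} ms")
```

If ping between HostA and HostB shows a round trip close to the implied figure for the 10gbe leg (roughly 2.5 ms), the flow is plausibly window-limited and the thing to chase is added latency on that path rather than raw forwarding capacity. That would also fit with several parallel connections being able to fill the pipe. If ping shows a much smaller RTT, this model does not explain the result.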

Now, if I change the configuration so that the connection between HostA and the Firewall is just a 1gbe pipe (on igb1), it looks like this:
Code:
[ HostA (FreeBSD 12.2)                       ]
[     Intel Pro1000 | igb0 172.16.20.2/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | igb1 172.16.20.1/24    ]
[ Firewall (FreeBSD 12.2)                    ]
[     Intel Pro1000 | igb0 172.16.10.1/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                        ]
From HostA to HostB:
Code:
hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[  6] local 172.16.20.2 port 48036 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   1.00-2.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   2.00-3.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   3.00-4.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   4.00-5.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   5.00-6.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   6.00-7.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   7.00-8.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   8.00-9.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   9.00-10.00  sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  1.11 GBytes   949 Mbits/sec    0             sender
[  6]   0.00-10.00  sec  1.11 GBytes   949 Mbits/sec                  receiver
From HostB to HostA:
Code:
hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[  6] local 172.16.20.2 port 54198 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec   112 MBytes   940 Mbits/sec                  
[  6]   1.00-2.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   2.00-3.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   3.00-4.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   4.00-5.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   5.00-6.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   7.00-8.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   8.00-9.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   9.00-10.00  sec   112 MBytes   942 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  sender
[  6]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver
To me this proves that the firewall is capable of forwarding 1gb/sec fine, however, something with the 10gbe side seems to greatly degrade overall perf.
I have tried two different cards in the firewall, my Chelsio T440 as well as a 40GBE Intel XL710 (using a 4x10GBE breakout cable). I have tried tweaking various net.inet.tcp.* tunables as well as trying the cc_htcp congestion control algorithm, with basically zero difference from the above. I can post these if it would be helpful.

I'm hoping someone here has an idea of something else to try, as I've hit my wall. Given that this occurs across different hardware, I'm worried I'm somehow hitting the limit of the CPU/motherboard itself, but I just don't see how that's possible given that symmetric 1gb/sec is easily achieved over the igb interfaces. However I'd still appreciate it if someone with more experience could just tell me if what I want is impossible for some reason that I'm not understanding.

In the meantime I'll likely try swapping around the 10GBE cards and cables between HostA and the Firewall to see if it's just some failing hardware or something.
 
Also, what are your net.inet.tcp.recvbuf* and sendbuf* values? Is netstat -an showing a growing buffer as the test progresses on the client?
What's your net.inet.tcp.initcwnd_segments value?
How's your memory when it's running?
 
hey guys thanks for dropping by, some more info:
Code:
firewall> ifconfig
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:47:b9:c8
    inet 172.16.10.1 netmask 0xffffff00 broadcast 172.16.10.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:47:b9:c9
    inet 172.16.20.1 netmask 0xffffff00 broadcast 172.16.20.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:47:b9:ca
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 3c:ec:ef:47:b9:cb
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:df:40
    inet 172.16.30.1 netmask 0xffffff00 broadcast 172.16.30.255
    media: Ethernet 10Gbase-Twinax <full-duplex,rxpause,txpause>
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:df:48
    media: Ethernet none
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:df:50
    media: Ethernet none
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:df:58
    media: Ethernet none
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x9
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Code:
hosta> ifconfig
cxgbe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:5e:c0
    inet 172.16.30.2 netmask 0xffffff00 broadcast 172.16.30.255
    media: Ethernet 10Gbase-Twinax <full-duplex,rxpause,txpause>
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    ether 00:07:43:11:5e:c8
    media: Ethernet none
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
    ether 00:25:90:d8:6d:be
    media: Ethernet autoselect
    status: no carrier
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:25:90:d8:6d:bf
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    inet6 ::1 prefixlen 128
    inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
    inet 127.0.0.1 netmask 0xff000000
    groups: lo
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

For HostB (Windows) it's just a single static IP of 172.16.10.3/24 with a default gateway of 172.16.10.1

As for the sysctl values:
Firewall:
Code:
net.inet.tcp.sendbuf_auto_lowat: 0
net.inet.tcp.sendbuf_max: 2097152
net.inet.tcp.sendbuf_inc: 8192
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.recvbuf_max: 2097152
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.recvbuf_auto: 1
net.inet.tcp.initcwnd_segments: 10
HostA:
Code:
net.inet.tcp.sendbuf_auto_lowat: 0
net.inet.tcp.sendbuf_max: 2097152
net.inet.tcp.sendbuf_inc: 8192
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.recvbuf_max: 2097152
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.recvbuf_auto: 1
net.inet.tcp.initcwnd_segments: 10

On HostA I ran iperf3 -c 172.16.10.3 and monitored the output of netstat -an a few times:
Code:
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10723      172.16.10.3.5201       ESTABLISHED
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       ESTABLISHED
tcp4       0    160 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001e5b1700 stream      0      0 fffff800330cb000                0                0                0 /tmp/tmux-1001/default
fffff800111c4a00 stream      0      0                0 fffff800111c4b00                0                0
fffff800111c4b00 stream      0      0                0 fffff800111c4a00                0                0
fffff8001e7fdc00 stream      0      0                0 fffff8001e7fde00                0                0
fffff8001e7fde00 stream      0      0                0 fffff8001e7fdc00                0                0
fffff80011d8a200 stream      0      0                0                0                0                0
fffff8001e0ca100 stream      0      0 fffff8001e442000                0                0                0 /var/run/devd.pipe
fffff8001e0ced00 dgram       0      0                0 fffff8001e7f9100                0 fffff8001e496600
fffff8001e496600 dgram       0      0                0 fffff8001e7f9100                0                0
fffff8001e7f9100 dgram       0      0 fffff8001e44e5a0                0 fffff8001e0ced00                0 /var/run/logpriv
fffff8001e7f9200 dgram       0      0 fffff8001e44e780                0                0                0 /var/run/log
fffff8001e0ca000 seqpac      0      0 fffff8001e44bd20                0                0                0 /var/run/devd.seqpacket.pipe
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10723      172.16.10.3.5201       ESTABLISHED
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       ESTABLISHED
tcp4       0    176 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001e7fd600 stream      0      0                0 fffff8001e5b1200                0                0
fffff8001e5b1200 stream      0      0                0 fffff8001e7fd600                0                0
fffff800111c4800 stream      0      0                0 fffff8001e5b1600                0                0
fffff8001e5b1600 stream      0      0                0 fffff800111c4800                0                0
fffff8001e5b1700 stream      0      0 fffff800330cb000                0                0                0 /tmp/tmux-1001/default
fffff800111c4a00 stream      0      0                0 fffff800111c4b00                0                0
fffff800111c4b00 stream      0      0                0 fffff800111c4a00                0                0
fffff8001e7fdc00 stream      0      0                0 fffff8001e7fde00                0                0
fffff8001e7fde00 stream      0      0                0 fffff8001e7fdc00                0                0
fffff80011d8a200 stream      0      0                0                0                0                0
fffff8001e0ca100 stream      0      0 fffff8001e442000                0                0                0 /var/run/devd.pipe
fffff8001e0ced00 dgram       0      0                0 fffff8001e7f9100                0 fffff8001e496600
fffff8001e496600 dgram       0      0                0 fffff8001e7f9100                0                0
fffff8001e7f9100 dgram       0      0 fffff8001e44e5a0                0 fffff8001e0ced00                0 /var/run/logpriv
fffff8001e7f9200 dgram       0      0 fffff8001e44e780                0                0                0 /var/run/log
fffff8001e0ca000 seqpac      0      0 fffff8001e44bd20                0                0                0 /var/run/devd.seqpacket.pipe
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10723      172.16.10.3.5201       ESTABLISHED
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       ESTABLISHED
tcp4       0    176 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001e5b1700 stream      0      0 fffff800330cb000                0                0                0 /tmp/tmux-1001/default
fffff800111c4a00 stream      0      0                0 fffff800111c4b00                0                0
fffff800111c4b00 stream      0      0                0 fffff800111c4a00                0                0
fffff8001e7fdc00 stream      0      0                0 fffff8001e7fde00                0                0
fffff8001e7fde00 stream      0      0                0 fffff8001e7fdc00                0                0
fffff80011d8a200 stream      0      0                0                0                0                0
fffff8001e0ca100 stream      0      0 fffff8001e442000                0                0                0 /var/run/devd.pipe
fffff8001e0ced00 dgram       0      0                0 fffff8001e7f9100                0 fffff8001e496600
fffff8001e496600 dgram       0      0                0 fffff8001e7f9100                0                0
fffff8001e7f9100 dgram       0      0 fffff8001e44e5a0                0 fffff8001e0ced00                0 /var/run/logpriv
fffff8001e7f9200 dgram       0      0 fffff8001e44e780                0                0                0 /var/run/log
fffff8001e0ca000 seqpac      0      0 fffff8001e44bd20                0                0                0 /var/run/devd.seqpacket.pipe
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10723      172.16.10.3.5201       ESTABLISHED
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       ESTABLISHED
tcp4       0    176 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001e5b1700 stream      0      0 fffff800330cb000                0                0                0 /tmp/tmux-1001/default
fffff800111c4a00 stream      0      0                0 fffff800111c4b00                0                0
fffff800111c4b00 stream      0      0                0 fffff800111c4a00                0                0
fffff8001e7fdc00 stream      0      0                0 fffff8001e7fde00                0                0
fffff8001e7fde00 stream      0      0                0 fffff8001e7fdc00                0                0
fffff80011d8a200 stream      0      0                0                0                0                0
fffff8001e0ca100 stream      0      0 fffff8001e442000                0                0                0 /var/run/devd.pipe
fffff8001e0ced00 dgram       0      0                0 fffff8001e7f9100                0 fffff8001e496600
fffff8001e496600 dgram       0      0                0 fffff8001e7f9100                0                0
fffff8001e7f9100 dgram       0      0 fffff8001e44e5a0                0 fffff8001e0ced00                0 /var/run/logpriv
fffff8001e7f9200 dgram       0      0 fffff8001e44e780                0                0                0 /var/run/log
fffff8001e0ca000 seqpac      0      0 fffff8001e44bd20                0                0                0 /var/run/devd.seqpacket.pipe
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10723      172.16.10.3.5201       ESTABLISHED
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       ESTABLISHED
tcp4       0    176 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*                    
Active UNIX domain sockets
Address          Type   Recv-Q Send-Q            Inode             Conn             Refs          Nextref Addr
fffff8001e5b1700 stream      0      0 fffff800330cb000                0                0                0 /tmp/tmux-1001/default
fffff800111c4a00 stream      0      0                0 fffff800111c4b00                0                0
fffff800111c4b00 stream      0      0                0 fffff800111c4a00                0                0
fffff8001e7fdc00 stream      0      0                0 fffff8001e7fde00                0                0
fffff8001e7fde00 stream      0      0                0 fffff8001e7fdc00                0                0
fffff80011d8a200 stream      0      0                0                0                0                0
fffff8001e0ca100 stream      0      0 fffff8001e442000                0                0                0 /var/run/devd.pipe
fffff8001e0ced00 dgram       0      0                0 fffff8001e7f9100                0 fffff8001e496600
fffff8001e496600 dgram       0      0                0 fffff8001e7f9100                0                0
fffff8001e7f9100 dgram       0      0 fffff8001e44e5a0                0 fffff8001e0ced00                0 /var/run/logpriv
fffff8001e7f9200 dgram       0      0 fffff8001e44e780                0                0                0 /var/run/log
fffff8001e0ca000 seqpac      0      0 fffff8001e44bd20                0                0                0 /var/run/devd.seqpacket.pipe
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)    
tcp4       0      0 172.16.30.2.10890      172.16.10.3.5201       TIME_WAIT  
tcp4       0    176 172.16.30.2.22         172.16.30.1.41395      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN     
tcp6       0      0 *.22                   *.*                    LISTEN     
udp4       0      0 *.514                  *.*                    
udp6       0      0 *.514                  *.*
Doesn't look like much is changing on that front.

As far as memory goes:
Firewall has 64GB total, 62GB free without any visible changes
HostA has 32GB total, 30GB free without any visible changes
HostB has 64GB total, 53GB free without any visible changes

Some additional info in case it helps, I tried to keep rc.conf and loader.conf as bare bones as possible:
Code:
firewall> cat /etc/rc.conf
hostname="firewall.local"

dumpdev="AUTO"

gateway_enable="YES"

sshd_enable="YES"
zfs_enable="YES"
smartd_enable="YES"

ifconfig_igb0="inet 172.16.10.1/24"
ifconfig_igb1="inet 172.16.20.1/24"
ifconfig_cxgbe0="inet 172.16.30.1/24"
Code:
firewall> cat /boot/loader.conf
autoboot_delay=2

kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"

hw.nvme.use_nvd="0"
hw.vmm.amdvi.enable="1"

aesni_load="YES"
amdtemp_load="YES"
if_cxgbe_load="YES"
ipmi_load="YES"
nmdm_load="YES"
opensolaris_load="YES"
t4fw_cfg_load="YES"
vmm_load="YES"
zfs_load="YES"
Code:
hosta> cat /etc/rc.conf
hostname="hosta.local"

dumpdev="NO"

zfs_enable="YES"
sshd_enable="YES"

ifconfig_igb0="inet 172.16.20.2/24"
#ifconfig_cxgbe0="inet 172.16.30.2/24"

static_routes="default"
route_default="default 172.16.20.1"
#route_default="default 172.16.30.1"
Code:
hosta> cat /boot/loader.conf
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
vfs.zfs.min_auto_ashift=12
net.link.tap.up_on_open="1"

zfs_load="YES"
aesni_load="YES"
if_cxgbe_load="YES"
t4fw_cfg_load="YES"
coretemp_load="YES"
vmm_load="YES"
nmdm_load="YES"

Forgot to add, I'd also appreciate it if anyone could just chime in with a "hey this should 100% be working, something is definitely wrong". I don't mind digging into this but it sure makes it easier if I know that I might find success!

thanks!
 
The sendbuf_max and recvbuf_max should be 4x that for 10gb. Likewise sendbuf_inc/recvbuf_inc should be about 8x. If that helps, you can dial it up or down.

Edit: set initcwnd_segments to 44 to match the TCP window scaling.
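
To make the suggestion above concrete, here is a small Python sketch that applies those multipliers to the values posted earlier in the thread and prints them in sysctl.conf form (name=value). The 4x and 8x factors and the initcwnd_segments value of 44 are taken straight from this reply; they are a starting point to experiment with, not verified tuning. The helper name is made up for illustration.

```python
# Current values as posted for both the firewall and HostA.
CURRENT = {
    "net.inet.tcp.sendbuf_max": 2097152,
    "net.inet.tcp.recvbuf_max": 2097152,
    "net.inet.tcp.sendbuf_inc": 8192,
    "net.inet.tcp.recvbuf_inc": 16384,
}

# Multipliers suggested in the reply: 4x for the *_max knobs, 8x for *_inc.
MULTIPLIERS = {
    "net.inet.tcp.sendbuf_max": 4,
    "net.inet.tcp.recvbuf_max": 4,
    "net.inet.tcp.sendbuf_inc": 8,
    "net.inet.tcp.recvbuf_inc": 8,
}


def suggested_sysctls(current):
    """Return sysctl.conf-style lines with the suggested multipliers applied."""
    lines = [f"{name}={value * MULTIPLIERS[name]}" for name, value in current.items()]
    # Suggested in the same reply; the thread's posted value is 10.
    lines.append("net.inet.tcp.initcwnd_segments=44")
    return lines


for line in suggested_sysctls(CURRENT):
    print(line)
```

Each printed line can be tried at runtime with sysctl name=value before committing anything to /etc/sysctl.conf, which makes it easy to dial the values up or down.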
 
Chelsio needs to be tuned for best performance.
Just turning off LSO (shown as TSO4/TSO6 in FreeBSD's ifconfig output) can get you a 3x to 10x increase.
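
As a quick way to see which interfaces currently have segmentation or receive offloads on, the following Python sketch parses ifconfig output like the listings posted earlier and prints a candidate command for each affected interface. The sample text is an abbreviated copy of the firewall's ifconfig output from this thread; the function name is hypothetical. LRO is included because it is commonly advised against on hosts that forward packets, but whether disabling either offload helps here is untested.

```python
import re

# Offload capabilities of interest, as they appear in FreeBSD ifconfig output.
OFFLOAD_FLAGS = {"TSO4", "TSO6", "LRO"}

# Abbreviated from the firewall's ifconfig output posted above.
SAMPLE = """\
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
cxgbe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=2ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWRXTSTMP>
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
"""


def enabled_offloads(ifconfig_text):
    """Map interface name -> set of TSO/LRO flags listed on its options= line."""
    result = {}
    current = None
    for line in ifconfig_text.splitlines():
        header = re.match(r"^(\w+): flags=", line)
        if header:
            current = header.group(1)
            result[current] = set()
            continue
        # Only the interface "options=" line counts, not "nd6 options=".
        opts = re.match(r"options=[0-9a-f]+<([^>]*)>", line.strip())
        if opts and current is not None:
            result[current] = OFFLOAD_FLAGS & set(opts.group(1).split(","))
    return result


for name, flags in enabled_offloads(SAMPLE).items():
    if flags:
        # Runtime-only change to experiment with; it does not persist across reboots.
        print(f"{name}: {', '.join(sorted(flags))} enabled -> try: ifconfig {name} -tso -lro")
```

Running the printed ifconfig commands on the firewall and then repeating the HostA to HostB iperf3 test would show whether the offloads are involved; to undo the change, use ifconfig with tso and lro (without the leading dash).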
 
In case your investigations and the suggestions of the other guys here (who I believe know much more than I do) yield no result, you might also consider the possibility that the igb cards or their driver have a problem.

Years ago I had a computer with a 4-port igb card which had crawlingly slow transfer rates, even worse than the cheapest Realtek card.
I tried everything and was unable to find a solution.
On bugs.freebsd.org I found a number of PRs regarding poor igb performance on some particular installations for which apparently nobody knew a cause.
My router currently has an igb card, and the performance seems fine. So I guess the problem is maybe a bad power supply, demons, karma, randomness or whatever.

So I ended up replacing the igb card in the former computer with an older Intel card using another driver, and the problem was gone.
 
Hello all,

To simplify my testing, I now have the following network topology:

Code:
[ HostA (FreeBSD 12.2)                       ]
[     Chelsio T420 | cxgbe0 172.16.30.2/24   ]
             |         |
             |  10GBE  |
             |         |
[     Chelsio T440 | cxgbe0 172.16.30.1/24   ]
[ Firewall (FreeBSD 12.2)                    ]
[     Intel Pro1000 | igb0 172.16.10.1/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                        ]


Some iperf3 measurements:

HostA to Firewall:
[ 6] 0.00-10.00 sec 8.20 GBytes 7.05 Gbits/sec 933 sender

Firewall to HostA:
[ 6] 0.00-10.00 sec 10.8 GBytes 9.30 Gbits/sec 0 sender

For both of these tests, CPU usage was the limiting factor but as you can see it's very near the 10gb/sec anyway. This is fine, I'm not interested in saturating 10gb/sec at the firewall anyway.
This looks good, but I do not see Firewall to HostB. It looks like the problem is in this branch.
 
Hey gang, sorry for the delay, had to step away from this to recollect my sanity. Also to enjoy some of the snow we got :)

So, some progress! I spent a few late nights doing tuning but I was seeing really odd results. Sometimes I'd make a change that wouldn't work, so I'd revert it (and reboot everything to be sure) and after a reboot perf would be even worse! Testing results seemed to not be following any clear pattern, so I turned to investigating hardware.

I tried 4 different DAC cables of various brands: no difference, and the randomness was still there. I tried moving all of the 10gbe cards around between hosts and PCI slots, no real difference either. Frustrated, I returned everything back to how it was at the start of this thread, just to re-validate my starting point. This was the real shocker: speeds were even worse. I was seeing 400mb/sec between HostA and the Firewall!

During all of this, something else had been bugging me. The firewall felt slow even to use. For example, mkdir ~/benchmarks would sometimes just sit for 3 seconds before returning. Sometimes I'd have to wait 4+ seconds for the password prompt to show up after entering my username. I started watching system metrics while using the machine and noticed my NVME drive was behaving very strangely. gstat was showing that the drive was doing very little in terms of reads and writes but it was doing a ton of work. Touching the filesystem would peg the device at 100% busy but very little throughput was happening (like 10kbps sometimes). I installed bonnie++ to test the drive with an actual heavy workload and the system was completely unresponsive while the test ran. I could barely switch virtual consoles while it was running.

So to satisfy my curiosity I removed the NVME and instead did a fresh install of the same 12.2 build onto a fairly old SSD. This made a huge difference in terms of how snappy the machine felt. With the SSD I'm not seeing any delays. Logging in, touching files, they all happen instantly. This was a huge surprise to me.

With this, I re-ran my benchmarks, and things have definitely improved there. The speeds aren't at full line rate yet, but they're much more stable. Before I was seeing a lot of dips in speeds, even ping tests would show random spikes in latency (200ms and up from sub 1ms) which is now all gone. Speeds are fairly consistent and ping times stay less than 1ms the whole time.

To me this sounds like the NVME is DOA (it's a brand new Seagate FireCuda 520), but I've also never tried to use NVME drives w/ ZFS and FreeBSD before. Thoughts on this? I'm thinking of just RMAing it and seeing if a replacement has the same problem.

I'm still hitting some limit between HostA and HostB: B->A can sustain 943mb/sec all day, but A->B hovers around 650mb/sec. I'll keep tuning things. Now that the system is more stable, I'm hoping progress will be a bit easier here.
 