Hello all,
I'm trying to set up a little router lab to play around with, but I've hit a hurdle pretty early on. I have an M11SDV-8CT-LN4F board from Supermicro. It has an AMD EPYC 3201, so not the fastest CPU, but it is quite modern and I've found it to be really well supported in FreeBSD 12.2-RELEASE so far.
But I'm hitting a pretty serious issue that has so far left me dead in the water. My goal was simple: connect my 1GbE uplink to igb0 and use a 10GbE card as the trunk connection to my L3 switch. VLAN routing duties would all be performed on the switch itself, with the FreeBSD host just running PF for firewall and NAT duties. Unfortunately I'm seeing abysmal performance. Clients only see about 200 Mbit/s down through the firewall. Oddly enough, upload manages to saturate the 1 Gbit/s uplink just fine.
To simplify my testing, I now have the following network topology:
Code:
[ HostA (FreeBSD 12.2)                   ]
[ Chelsio T420  | cxgbe0 172.16.30.2/24  ]
          |           |
          |   10GBE   |
          |           |
[ Chelsio T440  | cxgbe0 172.16.30.1/24  ]
[ Firewall (FreeBSD 12.2)                ]
[ Intel Pro1000 | igb0 172.16.10.1/24    ]
          |           |
          |   1GBE    |
          |           |
[ Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                    ]
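For anyone wanting to reproduce this layout, the firewall side could be configured roughly as follows. This is only a sketch: the addresses and interface names are taken from the diagram, but the rc.conf lines themselves are an assumption and were not copied from the actual machine.

```sh
# /etc/rc.conf on the firewall (assumed, for illustration only)
ifconfig_cxgbe0="inet 172.16.30.1/24"   # 10GbE link to HostA
ifconfig_igb0="inet 172.16.10.1/24"     # 1GbE link to HostB
gateway_enable="YES"                    # turns on IPv4 forwarding at boot
pf_enable="NO"                          # PF left off while testing
```

HostA and HostB each also need a route to the other subnet via the firewall's address on their side.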
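So HostA and the firewall use the 10GbE pipe for all traffic, and the firewall and HostB use the 1GbE uplink. I've completely removed the L3 switch and my ISP from the equation. Further, I've completely disabled PF, so there is zero NAT happening, just to rule that out.
Some iperf3 measurements: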
HostA to Firewall:
Code:
hosta> iperf3 -c 172.16.30.1
Connecting to host 172.16.30.1, port 5201
[ 6] local 172.16.30.2 port 57159 connected to 172.16.30.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 6] 0.00-1.00 sec 767 MBytes 6.43 Gbits/sec 119 719 KBytes
[ 6] 1.00-2.00 sec 843 MBytes 7.08 Gbits/sec 65 1.05 MBytes
[ 6] 2.00-3.00 sec 850 MBytes 7.13 Gbits/sec 67 1.32 MBytes
[ 6] 3.00-4.00 sec 849 MBytes 7.13 Gbits/sec 149 837 KBytes
[ 6] 4.00-5.00 sec 852 MBytes 7.15 Gbits/sec 67 1.15 MBytes
[ 6] 5.00-6.00 sec 849 MBytes 7.12 Gbits/sec 134 491 KBytes
[ 6] 6.00-7.00 sec 851 MBytes 7.14 Gbits/sec 63 955 KBytes
[ 6] 7.00-8.00 sec 856 MBytes 7.18 Gbits/sec 66 1.24 MBytes
[ 6] 8.00-9.00 sec 832 MBytes 6.98 Gbits/sec 139 652 KBytes
[ 6] 9.00-10.00 sec 852 MBytes 7.15 Gbits/sec 64 1.03 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 6] 0.00-10.00 sec 8.20 GBytes 7.05 Gbits/sec 933 sender
[ 6] 0.00-10.00 sec 8.20 GBytes 7.04 Gbits/sec receiver
Firewall to HostA:
Code:
hosta> iperf3 -c 172.16.30.1 -R
Connecting to host 172.16.30.1, port 5201
Reverse mode, remote host 172.16.30.1 is sending
[ 6] local 172.16.30.2 port 43906 connected to 172.16.30.1 port 5201
[ ID] Interval Transfer Bitrate
[ 6] 0.00-1.00 sec 986 MBytes 8.27 Gbits/sec
[ 6] 1.00-2.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 2.00-3.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 3.00-4.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 4.00-5.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 5.00-6.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 6.00-7.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 7.00-8.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 8.00-9.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 6] 9.00-10.00 sec 1.10 GBytes 9.41 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 6] 0.00-10.00 sec 10.8 GBytes 9.30 Gbits/sec 0 sender
[ 6] 0.00-10.00 sec 10.8 GBytes 9.30 Gbits/sec receiver
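For both of these tests, CPU usage was the limiting factor, but as you can see it's very near 10 Gbit/s anyway. This is fine; I'm not interested in saturating 10 Gbit/s at the firewall.
Here's where it gets interesting.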
HostA to HostB:
Code:
hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[ 6] local 172.16.30.2 port 51615 connected to 172.16.10.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 6] 0.00-1.00 sec 78.1 MBytes 655 Mbits/sec 0 208 KBytes
[ 6] 1.00-2.00 sec 79.7 MBytes 668 Mbits/sec 0 208 KBytes
[ 6] 2.00-3.00 sec 85.9 MBytes 720 Mbits/sec 0 208 KBytes
[ 6] 3.00-4.00 sec 76.0 MBytes 638 Mbits/sec 0 208 KBytes
[ 6] 4.00-5.00 sec 82.9 MBytes 696 Mbits/sec 0 208 KBytes
[ 6] 5.00-6.00 sec 77.4 MBytes 649 Mbits/sec 0 208 KBytes
[ 6] 6.00-7.00 sec 83.0 MBytes 696 Mbits/sec 0 208 KBytes
[ 6] 7.00-8.00 sec 84.3 MBytes 707 Mbits/sec 0 208 KBytes
[ 6] 8.00-9.00 sec 78.1 MBytes 655 Mbits/sec 0 208 KBytes
[ 6] 9.00-10.00 sec 83.4 MBytes 699 Mbits/sec 0 208 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 6] 0.00-10.00 sec 809 MBytes 678 Mbits/sec 0 sender
[ 6] 0.00-10.00 sec 809 MBytes 678 Mbits/sec receiver
Not great! The CPU was 99.6% idle the whole time. But it gets worse.
HostB to HostA:
Code:
hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[ 6] local 172.16.30.2 port 43155 connected to 172.16.10.3 port 5201
[ ID] Interval Transfer Bitrate
[ 6] 0.00-1.00 sec 39.8 MBytes 333 Mbits/sec
[ 6] 1.00-2.00 sec 31.9 MBytes 268 Mbits/sec
[ 6] 2.00-3.00 sec 34.5 MBytes 289 Mbits/sec
[ 6] 3.00-4.00 sec 33.6 MBytes 282 Mbits/sec
[ 6] 4.00-5.00 sec 32.8 MBytes 275 Mbits/sec
[ 6] 5.00-6.00 sec 36.4 MBytes 305 Mbits/sec
[ 6] 6.00-7.00 sec 33.4 MBytes 280 Mbits/sec
[ 6] 7.00-8.00 sec 35.3 MBytes 296 Mbits/sec
[ 6] 8.00-9.00 sec 34.6 MBytes 290 Mbits/sec
[ 6] 9.00-10.00 sec 34.5 MBytes 290 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 6] 0.00-10.00 sec 347 MBytes 291 Mbits/sec sender
[ 6] 0.00-10.00 sec 347 MBytes 291 Mbits/sec receiver
Honestly, that's just sad. I will note, however, that if I up the number of connections in iperf3 I can saturate the 1 Gbit/s pipe, but a single connection should be able to reach 1 Gbit/s without breaking a sweat.
Now, if I change the configuration so that the connection between HostA and the firewall is just a 1GbE pipe (on igb1), it looks like this:
Code:
[ HostA (FreeBSD 12.2)                   ]
[ Intel Pro1000 | igb0 172.16.20.2/24    ]
          |           |
          |   1GBE    |
          |           |
[ Intel Pro1000 | igb1 172.16.20.1/24    ]
[ Firewall (FreeBSD 12.2)                ]
[ Intel Pro1000 | igb0 172.16.10.1/24    ]
          |           |
          |   1GBE    |
          |           |
[ Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                    ]
From HostA to HostB:
Code:
hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[ 6] local 172.16.20.2 port 48036 connected to 172.16.10.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 6] 0.00-1.00 sec 113 MBytes 950 Mbits/sec 0 208 KBytes
[ 6] 1.00-2.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 2.00-3.00 sec 113 MBytes 950 Mbits/sec 0 208 KBytes
[ 6] 3.00-4.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 4.00-5.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 5.00-6.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 6.00-7.00 sec 113 MBytes 950 Mbits/sec 0 208 KBytes
[ 6] 7.00-8.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 8.00-9.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
[ 6] 9.00-10.00 sec 113 MBytes 949 Mbits/sec 0 208 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 6] 0.00-10.00 sec 1.11 GBytes 949 Mbits/sec 0 sender
[ 6] 0.00-10.00 sec 1.11 GBytes 949 Mbits/sec receiver
From HostB to HostA:
Code:
hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[ 6] local 172.16.20.2 port 54198 connected to 172.16.10.3 port 5201
[ ID] Interval Transfer Bitrate
[ 6] 0.00-1.00 sec 112 MBytes 940 Mbits/sec
[ 6] 1.00-2.00 sec 112 MBytes 942 Mbits/sec
[ 6] 2.00-3.00 sec 112 MBytes 942 Mbits/sec
[ 6] 3.00-4.00 sec 112 MBytes 942 Mbits/sec
[ 6] 4.00-5.00 sec 112 MBytes 942 Mbits/sec
[ 6] 5.00-6.00 sec 112 MBytes 942 Mbits/sec
[ 6] 6.00-7.00 sec 112 MBytes 942 Mbits/sec
[ 6] 7.00-8.00 sec 112 MBytes 942 Mbits/sec
[ 6] 8.00-9.00 sec 112 MBytes 942 Mbits/sec
[ 6] 9.00-10.00 sec 112 MBytes 942 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 6] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec sender
[ 6] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec receiver
To me this proves that the firewall is capable of forwarding 1 Gbit/s just fine; however, something on the 10GbE side seems to greatly degrade overall performance.
I have tried two different cards in the firewall: my Chelsio T440 as well as a 40GbE Intel XL710 (using a 4x10GbE breakout cable). I have tried tweaking various net.inet.tcp.* tunables as well as trying the cc_htcp congestion control algorithm, with basically zero difference from the above. I can post these if it would be helpful.
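For reference, changes of the kind described here would look something like the following on FreeBSD. The specific tunables and values are placeholders picked for illustration; the post does not say which ones were actually changed or to what.

```sh
# Load the H-TCP congestion control module and make it the default.
kldload cc_htcp
sysctl net.inet.tcp.cc.algorithm=htcp

# List the registered algorithms to confirm the module is available.
sysctl net.inet.tcp.cc.available

# Example net.inet.tcp.* tunables (values are illustrative only).
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
```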
I'm hoping someone here has an idea of something else to try, as I've hit a wall. Given that this occurs across different hardware, I'm worried I'm somehow hitting the limit of the CPU or motherboard itself, but I just don't see how that's possible given that symmetric 1 Gbit/s is easily achieved over the igb interfaces. Still, I'd appreciate it if someone with more experience could tell me whether what I want is impossible for some reason that I'm not understanding.
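One back-of-the-envelope check that may help anyone reading the numbers: in both HostA to HostB runs iperf3 reports a flat 208 KBytes Cwnd with zero retransmits. If one assumes the flow is limited by that window (an assumption; the Cwnd column is the sender's congestion window and may not be the real cap), the throughput implies a round-trip time of window divided by rate. The helper name `implied_rtt_ms` below is made up for this sketch.

```sh
#!/bin/sh
# implied_rtt_ms WINDOW_KBYTES RATE_MBITS
# Prints the round-trip time (in ms) at which a flow with WINDOW_KBYTES
# in flight would run at RATE_MBITS. iperf3 KBytes are 1024 bytes and
# Mbits/sec are 10^6 bits per second.
implied_rtt_ms() {
    awk -v w="$1" -v r="$2" \
        'BEGIN { printf "%.2f\n", (w * 1024 * 8) / (r * 1000000) * 1000 }'
}

implied_rtt_ms 208 949   # 1GbE on both sides of the firewall: prints 1.80
implied_rtt_ms 208 678   # 10GbE on the HostA side: prints 2.51
```

Under that assumption the 10GbE path would only need about 0.7 ms of extra round-trip latency to account for the drop, which would be consistent with the CPU sitting idle. It is a guess from the posted figures, not a diagnosis.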
In the meantime I'll likely try swapping around the 10GBE cards and cables between HostA and the Firewall to see if it's just some failing hardware or something.