Hello all,
I'm trying to set up a little router lab to play around with, but I've hit a hurdle pretty early on. I have an M11SDV-8CT-LN4F board from Supermicro. It has an AMD Epyc 3201, so not the fastest CPU, but it is quite modern and I've found it to be really well supported in FreeBSD 12.2-RELEASE so far.
But I'm hitting a pretty serious issue that has so far left me dead in the water. My goal was simple: connect my 1GbE uplink to igb0 and use a 10GbE card as the trunk connection to my L3 switch. VLAN routing duties would all be performed on the switch itself, with the FreeBSD host just running PF for firewall and NAT duties. Unfortunately I'm seeing abysmal performance. Clients only see about 200 Mb/s down through the firewall. Oddly enough, upload manages to saturate the 1 Gb/s uplink just fine.
To simplify my testing, I now have the following network topology:
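Code:

[ HostA (FreeBSD 12.2)                       ]
[     Chelsio T420 | cxgbe0 172.16.30.2/24   ]
             |         |
             |  10GBE  |
             |         |
[     Chelsio T440 | cxgbe0 172.16.30.1/24   ]
[ Firewall (FreeBSD 12.2)                    ]
[     Intel Pro1000 | igb0 172.16.10.1/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                        ]
So HostA and the firewall use the 10GbE link for all traffic, and the firewall and HostB use the 1GbE uplink. I've completely removed the L3 switch and my ISP from the equation. Further, I've completely disabled PF, so there is zero NAT happening, just to rule that out.
Some iperf3 measurements: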
HostA to Firewall:
Code:

hosta> iperf3 -c 172.16.30.1
Connecting to host 172.16.30.1, port 5201
[  6] local 172.16.30.2 port 57159 connected to 172.16.30.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   767 MBytes  6.43 Gbits/sec  119    719 KBytes       
[  6]   1.00-2.00   sec   843 MBytes  7.08 Gbits/sec   65   1.05 MBytes       
[  6]   2.00-3.00   sec   850 MBytes  7.13 Gbits/sec   67   1.32 MBytes       
[  6]   3.00-4.00   sec   849 MBytes  7.13 Gbits/sec  149    837 KBytes       
[  6]   4.00-5.00   sec   852 MBytes  7.15 Gbits/sec   67   1.15 MBytes       
[  6]   5.00-6.00   sec   849 MBytes  7.12 Gbits/sec  134    491 KBytes       
[  6]   6.00-7.00   sec   851 MBytes  7.14 Gbits/sec   63    955 KBytes       
[  6]   7.00-8.00   sec   856 MBytes  7.18 Gbits/sec   66   1.24 MBytes       
[  6]   8.00-9.00   sec   832 MBytes  6.98 Gbits/sec  139    652 KBytes       
[  6]   9.00-10.00  sec   852 MBytes  7.15 Gbits/sec   64   1.03 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  8.20 GBytes  7.05 Gbits/sec  933             sender
[  6]   0.00-10.00  sec  8.20 GBytes  7.04 Gbits/sec                  receiver
Firewall to HostA:
Code:

hosta> iperf3 -c 172.16.30.1 -R
Connecting to host 172.16.30.1, port 5201
Reverse mode, remote host 172.16.30.1 is sending
[  6] local 172.16.30.2 port 43906 connected to 172.16.30.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec   986 MBytes  8.27 Gbits/sec                  
[  6]   1.00-2.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   2.00-3.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   3.00-4.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   4.00-5.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   5.00-6.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   6.00-7.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   7.00-8.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   8.00-9.00   sec  1.10 GBytes  9.41 Gbits/sec                  
[  6]   9.00-10.00  sec  1.10 GBytes  9.41 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  10.8 GBytes  9.30 Gbits/sec    0             sender
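[  6]   0.00-10.00  sec  10.8 GBytes  9.30 Gbits/sec                  receiver
For both of these tests, CPU usage was the limiting factor, but as you can see throughput gets reasonably close to 10 Gb/s anyway (7.05 and 9.30 Gb/s). This is fine, since I'm not interested in saturating 10 Gb/s at the firewall.
Here's where it gets interesting.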
HostA to HostB:
Code:

hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[  6] local 172.16.30.2 port 51615 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec  78.1 MBytes   655 Mbits/sec    0    208 KBytes       
[  6]   1.00-2.00   sec  79.7 MBytes   668 Mbits/sec    0    208 KBytes       
[  6]   2.00-3.00   sec  85.9 MBytes   720 Mbits/sec    0    208 KBytes       
[  6]   3.00-4.00   sec  76.0 MBytes   638 Mbits/sec    0    208 KBytes       
[  6]   4.00-5.00   sec  82.9 MBytes   696 Mbits/sec    0    208 KBytes       
[  6]   5.00-6.00   sec  77.4 MBytes   649 Mbits/sec    0    208 KBytes       
[  6]   6.00-7.00   sec  83.0 MBytes   696 Mbits/sec    0    208 KBytes       
[  6]   7.00-8.00   sec  84.3 MBytes   707 Mbits/sec    0    208 KBytes       
[  6]   8.00-9.00   sec  78.1 MBytes   655 Mbits/sec    0    208 KBytes       
[  6]   9.00-10.00  sec  83.4 MBytes   699 Mbits/sec    0    208 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec   809 MBytes   678 Mbits/sec    0             sender
[  6]   0.00-10.00  sec   809 MBytes   678 Mbits/sec                  receiver
Not great! The CPU was 99.6% idle the whole time. But it gets worse.
HostB to HostA:
Code:

hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[  6] local 172.16.30.2 port 43155 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec  39.8 MBytes   333 Mbits/sec                  
[  6]   1.00-2.00   sec  31.9 MBytes   268 Mbits/sec                  
[  6]   2.00-3.00   sec  34.5 MBytes   289 Mbits/sec                  
[  6]   3.00-4.00   sec  33.6 MBytes   282 Mbits/sec                  
[  6]   4.00-5.00   sec  32.8 MBytes   275 Mbits/sec                  
[  6]   5.00-6.00   sec  36.4 MBytes   305 Mbits/sec                  
[  6]   6.00-7.00   sec  33.4 MBytes   280 Mbits/sec                  
[  6]   7.00-8.00   sec  35.3 MBytes   296 Mbits/sec                  
[  6]   8.00-9.00   sec  34.6 MBytes   290 Mbits/sec                  
[  6]   9.00-10.00  sec  34.5 MBytes   290 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-10.00  sec   347 MBytes   291 Mbits/sec                  sender
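[  6]   0.00-10.00  sec   347 MBytes   291 Mbits/sec                  receiver
Honestly, that's just sad. I will note, however, that if I increase the number of parallel connections in iperf3 I can saturate the 1 Gb/s link, but I should be able to get a single connection to 1 Gb/s without breaking a sweat.
Now, if I change the configuration so that the connection between HostA and the firewall is just a 1GbE link (on igb1), it looks like this: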
Code:

[ HostA (FreeBSD 12.2)                       ]
[     Intel Pro1000 | igb0 172.16.20.2/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | igb1 172.16.20.1/24    ]
[ Firewall (FreeBSD 12.2)                    ]
[     Intel Pro1000 | igb0 172.16.10.1/24    ]
              |      |
              | 1GBE |
              |      |
[     Intel Pro1000 | 172.16.10.3/24         ]
[ Host B (Windows 10)                        ]
From HostA to HostB:
Code:

hosta> iperf3 -c 172.16.10.3
Connecting to host 172.16.10.3, port 5201
[  6] local 172.16.20.2 port 48036 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   1.00-2.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   2.00-3.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   3.00-4.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   4.00-5.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   5.00-6.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   6.00-7.00   sec   113 MBytes   950 Mbits/sec    0    208 KBytes       
[  6]   7.00-8.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   8.00-9.00   sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
[  6]   9.00-10.00  sec   113 MBytes   949 Mbits/sec    0    208 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  1.11 GBytes   949 Mbits/sec    0             sender
[  6]   0.00-10.00  sec  1.11 GBytes   949 Mbits/sec                  receiver
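One possible sanity check on the two HostA to HostB runs, offered as a rough sketch rather than a diagnosis: both report the same 208 KByte Cwnd, so if a single stream is window-limited (an unverified assumption), throughput is roughly window / RTT, and the bitrate difference would correspond to a difference in effective round-trip time. The Python below works out the implied RTT for each run. The helper names are made up for illustration, and it assumes iperf3's KBytes are 1024-byte units.

```python
# Rough window/RTT sanity check. Assumes (unverified) that a single TCP
# stream is window-limited, so throughput ~= window / RTT.

KIB = 1024  # assumption: iperf3 "KBytes" are 1024-byte units


def implied_rtt_ms(window_kbytes: float, bitrate_mbps: float) -> float:
    """RTT (ms) at which the given window would yield the given bitrate."""
    window_bits = window_kbytes * KIB * 8
    return window_bits / (bitrate_mbps * 1e6) * 1e3


def window_needed_kbytes(bitrate_mbps: float, rtt_ms: float) -> float:
    """Window (KBytes) needed to sustain a bitrate at a given RTT."""
    bits_in_flight = bitrate_mbps * 1e6 * (rtt_ms / 1e3)
    return bits_in_flight / 8 / KIB


if __name__ == "__main__":
    # Cwnd (208 KBytes) and sender bitrates copied from the two
    # HostA -> HostB runs in this post.
    runs = {"via cxgbe0 (10GbE)": 678.0, "via igb1 (1GbE)": 949.0}
    for label, mbps in runs.items():
        print(f"{label}: implied RTT {implied_rtt_ms(208, mbps):.2f} ms")
```

Comparing these implied figures against a plain ping between HostA and HostB would show whether the window is plausibly the bottleneck. If the measured RTT is far smaller than the implied one, something else is limiting the stream.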
From HostB to HostA:
Code:

hosta> iperf3 -c 172.16.10.3 -R
Connecting to host 172.16.10.3, port 5201
Reverse mode, remote host 172.16.10.3 is sending
[  6] local 172.16.20.2 port 54198 connected to 172.16.10.3 port 5201
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-1.00   sec   112 MBytes   940 Mbits/sec                  
[  6]   1.00-2.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   2.00-3.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   3.00-4.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   4.00-5.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   5.00-6.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   7.00-8.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   8.00-9.00   sec   112 MBytes   942 Mbits/sec                  
[  6]   9.00-10.00  sec   112 MBytes   942 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  6]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  sender
[  6]   0.00-10.00  sec  1.10 GBytes   942 Mbits/sec                  receiver
To me this proves that the firewall is capable of forwarding 1 Gb/s just fine; however, something on the 10GbE side seems to greatly degrade overall performance.
I have tried two different cards in the firewall: my Chelsio T440 as well as a 40GbE Intel XL710 (using a 4x10GbE breakout cable). I have tried tweaking various net.inet.tcp.* tunables as well as trying the cc_htcp congestion control algorithm, with basically zero difference from the above. I can post these if it would be helpful.
I'm hoping someone here has an idea of something else to try, as I've hit a wall. Given that this occurs across different hardware, I'm worried I'm somehow hitting the limit of the CPU/motherboard itself, but I just don't see how that's possible given that symmetric 1 Gb/s is easily achieved over the igb interfaces. Still, I'd appreciate it if someone with more experience could tell me whether what I want is impossible for some reason I'm not understanding.
In the meantime I'll likely try swapping around the 10GbE cards and cables between HostA and the firewall to see if it's just some failing hardware.
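Since swapping cards and cables means repeating the same runs several times, a small harness could make the single-stream versus multi-stream comparison repeatable. This is a minimal sketch, not something used for the numbers above: the function names and the stream counts are hypothetical, and it assumes iperf3's `-J` JSON report exposes `end.sum_received.bits_per_second` for TCP tests, which is worth checking against the installed version.

```python
import json
import subprocess


def iperf3_cmd(server, streams=1, reverse=False, seconds=10):
    """Build an iperf3 client command line that emits a JSON report."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"]
    if reverse:
        cmd.append("-R")  # remote host sends, as in the -R runs above
    return cmd


def received_mbps(report):
    """Receiver-side aggregate bitrate in Mbit/s from a parsed JSON report.

    Assumption: TCP reports carry end.sum_received.bits_per_second.
    """
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def measure(server, streams, reverse):
    """Run one iperf3 test and return the received bitrate in Mbit/s."""
    result = subprocess.run(
        iperf3_cmd(server, streams, reverse),
        check=True, capture_output=True, text=True,
    )
    return received_mbps(json.loads(result.stdout))


if __name__ == "__main__":
    server = "172.16.10.3"  # HostB in the topology above
    for reverse in (False, True):
        direction = "HostB -> HostA" if reverse else "HostA -> HostB"
        for streams in (1, 2, 4, 8):  # hypothetical stream counts
            mbps = measure(server, streams, reverse)
            print(f"{direction}, {streams} stream(s): {mbps:.0f} Mbit/s")
```

Running it from HostA once per hardware combination would give directly comparable numbers for each card and cable.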