FreeBSD 14.3 throughput tanking when a Linux client or Linux router is involved.

This has been a really difficult case for me to solve, but what I am seeing sounds ... too weird to believe. I really don't know where the fault lies, but any advice would be amazing.

What I see is that when a Linux router is on the path to or from a FreeBSD server or client, bandwidth collapses. I am also seeing situations where Linux as a client performs worse when connected to a FreeBSD server.

More to the point, the moment a Linux router is involved, the throughput tanks *hard*.

The reason I noticed this is that I have a FreeBSD server on a VPS. Downloading from the FreeBSD machine to another machine in the same DC, I see 480MB/s. From FreeBSD over the internet to my UXG I see 30MB/s - this maxes out my line rate and is what I expect.

But from the FreeBSD machine to *anything* behind the UXG (a Linux-based router) I see 5MB/s, and it fluctuates wildly.

On top of this, others who consume the services of that FreeBSD server reported the same throughput collapse - Ubiquiti, TP-Link, OpenWrt - all seemed to be affected.

I've spent a lot of time trying out various tunings - the worst was enabling RACK, which caused throughput to sink to the 200kb/s range. htcp instead of cubic improves things somewhat, but not much. Buffer tunings etc. don't seem to have much effect.
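
For anyone wanting to reproduce the tuning attempts, this is roughly how the congestion control and TCP stacks were switched between runs (module names as shipped in 14.3; the buffer values below are only illustrative, the exact numbers varied):

Code:
# switch congestion control (cubic is the 14.x default)
kldload cc_htcp
sysctl net.inet.tcp.cc.algorithm=htcp
sysctl net.inet.tcp.cc.available           # confirm what is loaded

# switch to the RACK TCP stack (ships as a module in 14.x)
kldload tcp_rack
sysctl net.inet.tcp.functions_default=rack
# revert: sysctl net.inet.tcp.functions_default=freebsd

# buffer tuning was along these lines (values illustrative)
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216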


To isolate this I have tried to break a reproducer down into as few components as possible. This will be a long post, but I'd rather be thorough than not.

The reproduction environment is as follows:

I am using virtual machines hosted on a FreeBSD 14.3 server with an Intel Xeon Silver 4310 (24 threads) @ 2.10GHz and 128GB of RAM.

The network adapter is an Intel 10GbE adapter using the ix driver, with LRO/TSO disabled. ifconfig output below.

Code:
ix0: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=4e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
    ether 64:9d:99:b1:a4:78
    media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
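
For completeness, this is roughly how LRO/TSO are kept disabled on the parent NIC (persistently via rc.conf, or live with ifconfig):

Code:
# /etc/rc.conf (persistent across reboots)
ifconfig_ix0="up -lro -tso"

# or at runtime
ifconfig ix0 -lro -tso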

On top of this are VLAN interfaces, which are then connected to bridges. As an example:

Code:
ix0.20: flags=1008943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=4200001<RXCSUM,RXCSUM_IPV6,MEXTPG>
    ether 64:9d:99:b1:a4:78
    groups: vlan
    vlan: 20 vlanproto: 802.1q vlanpcp: 0 parent interface: ix0
    media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

bridge20: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
    options=0
    ether 58:9c:fc:00:31:42
    id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    member: tap15 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 33 priority 128 path cost 2000000
    ...
    member: ix0.20 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
            ifmaxaddr 0 port 15 priority 128 path cost 2000
    groups: bridge vm-switch viid-8286e@
    nd6 options=9<PERFORMNUD,IFDISABLED>

I am using bhyve + vm-bhyve to manage the machines. Each machine has 2 vCPUs and 2GB of RAM.

All machines are completely stock - no sysctl tuning or anything like that.

In the examples I will show iperf3 output - keep in mind I am also testing with curl against web servers, and the results are nearly identical. Since this post will already be very long, for brevity I will only show the iperf3 results, as they align with the other tests performed.
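
For reference, the iperf3 runs are plain single-stream tests along these lines (the 30-second duration is illustrative; -R reverses the direction where needed):

Code:
# on the machine acting as the iperf3 server
iperf3 -s

# on the client
iperf3 -c <server-ip> -t 30
iperf3 -c <server-ip> -t 30 -R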

As a baseline, I establish localhost throughputs:

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to SELF         : 53.6 Gbits/sec
FreeBSD 14.3  to SELF         : 33.9 Gbits/sec

These values set the "upper limit". I would assume that even though a discrepancy exists here, 10Gbit is achievable for both instances. The software is *NOT* the bottleneck.

Within the same bridge interface I see the following throughput:

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to OpenSUSE 15.6: 3.71 Gbits/sec
OpenSUSE 15.6 to FreeBSD 14.3:  2.80 Gbits/sec
FreeBSD 14.3 to FreeBSD 14.3:   1.73 Gbits/sec
FreeBSD 14.3 to OpenSUSE 15.6:   964 Mbits/sec

We can already see a stark difference here: from a FreeBSD server to a Linux client, there is a significant reduction in throughput even when a router is not involved.

Now, to establish the bandwidth between the router and the clients.

The router is a Ubiquiti UXG Pro (10GbE), connected with 10GbE DACs to a Ubiquiti Aggregation switch with 10G SFPs.

Since the path goes from the server, via the switch, to the router, and back along the same return path, there should be no other variables.

These numbers are now *crossing* that router.

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to OpenSUSE 15.6: 1.40 Gbits/sec
OpenSUSE 15.6 to FreeBSD 14.3:   706 Mbits/sec
FreeBSD 14.3 to FreeBSD 14.3:    304 Mbits/sec
FreeBSD 14.3 to OpenSUSE 15.6:   534 Mbits/sec

So we can see that the "upper" limit is 1.4Gbit, but every other combination yields far lower results, the worst being FreeBSD to FreeBSD when crossing the UXG.

There seems to be something fundamental at play here - even in the direct L2 tests, FreeBSD as a server to an OpenSUSE client achieves barely more than half the throughput of any other combination. FreeBSD to FreeBSD also seems to be experiencing some kind of issue.

Again, all the values above are *stock* installs with no tunings.

The most significant evidence I have found comes from Wireshark captures of the connections: from Linux to Linux, I see a burst of 10 to 16 packets followed by a single ACK, rinse and repeat. Whereas FreeBSD will receive 10 packets and send 3 to 4 ACKs, and then the connection devolves into single data frame, single ACK.
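
For anyone who wants to look at the same thing, the captures are taken roughly like this on each endpoint (vtnet0 being the guest's virtio interface in my VMs; the small snap length keeps just the headers):

Code:
# capture the iperf3 traffic (default port 5201)
tcpdump -i vtnet0 -s 128 -w capture.pcap port 5201
# then open capture.pcap in Wireshark and compare the data-to-ACK
# cadence, plus any SACK / duplicate-ACK / retransmission markers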

So there is some interaction here that I do not understand.

What should I look at next? What should I test? Is this a bug?

Any advice would be really appreciated.
 
What I see is that when a Linux router is on the path to or from a FreeBSD server or client, bandwidth collapses. I am also seeing situations where Linux as a client performs worse when connected to a FreeBSD server.
I've no experience with a configuration as complex as yours, but I can say that this isn't the normal behaviour in simple set-ups. I just tried - in my homelab - interposing a Linux (bookworm) router between a FreeBSD 15.0-RELEASE box [edit: and another bookworm box], and it made no difference to iperf3 performance.

This is all cheap 1Gbps h/w - Netgear switch, Raspberry Pi 4.

I see about 750Mb/s between FreeBSD and Linux whether or not a second Linux system is in between them. I see about 950Mb/s between the two Linux systems. That ~20% difference has been normal in my experience.

The only source I know of for complex routing set-ups is here in the wiki - and you've already turned off LRO and TSO.
 
Always keep in mind that iperf(3) is heavily CPU-restricted (and single-threaded) and for anything above 1Gbps you might run into performance limits not related to the network stack.

From my experience, if you see degraded throughput from/to a host that may not be present when pushing packets *through* that host (or vice versa), MTU mismatches are often (part of) the cause. This is especially true if VLANs and/or tunnel interfaces are involved. FreeBSD usually does the 'right thing' and just works without touching anything - but I've also had to deal with MikroTik gear, which is completely inconsistent when it comes to adjusting MTU/frame sizes and has even switched behaviour randomly at runtime (i.e. after a few days we saw a lot of packet loss in one direction because the MTU suddenly dropped and path MTU discovery was completely broken on the MikroTik...)
 
Re CPU - this is why I did the localhost tests, to establish upper bounds. When testing inside the same bridge, or between VLANs, I see ~20% CPU load. So I am happy to eliminate a CPU bottleneck as a cause, as the hypervisor is also only lightly loaded.

It's not jumbo frames - all interfaces on every VM are MTU 1500, as are the switch interfaces, hypervisor interfaces, and router interfaces. So unless there is some interaction I'm missing between the bridges and VMs, I'm happy to rule this out too.
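
For anyone wanting to double-check that, a quick don't-fragment probe end to end confirms that full 1500-byte frames make it through unfragmented (1472 bytes of ICMP payload + 28 bytes of IP/ICMP headers = 1500):

Code:
# from FreeBSD (-D sets the don't-fragment bit)
ping -D -s 1472 <host-behind-the-router>

# Linux equivalent
ping -M do -s 1472 <freebsd-server>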

This is why I'm so stumped: all the "obvious" issues seem not to be a factor, but there is clearly some interaction here that I don't understand.
 
I'm seeing the same thing or something similar (but my current issue is FreeBSD 13.5 sending a large backup file to FreeBSD 14.3).

The first situation was copying backup files from a FreeBSD 14.3 VM to OpenBSD. Turning off LRO/TSO on the FreeBSD VM seemed to resolve it. The main change in terms of infrastructure was moving the backup machine behind a (Linux) PPPoE router (not one that I control). Before I moved the machines there were no performance concerns.

But now I'm hitting an issue where a FreeBSD server (13.5) on real hardware was moved from one rack to another, and copying a backup dump file is taking a lot longer. Tweaking LRO/TSO made no difference. It's gone from 90 minutes to eight and a half hours. No OpenBSD in the mix this time, FreeBSD server to FreeBSD server. I've got to chase up the hosting company to find out exactly what happened during the rack move and what might have changed in terms of networking.

Unfortunately I was upgrading from 13.x to 14.x (on the backup machine) at around the same time, so there's more than one moving part here. But going by file timestamps, the drop happened when the server was moved, so I suspect the new rack it was moved to has something different in the network mix.

I have only just started trying to figure out what is going wrong, so my wording is hopelessly vague, but your situation is definitely ringing a bell.
 
It seems that whatever the issue is, it's specific to FreeBSD 14.X

I did a test using 15.0-RELEASE machines. Again, all of these are fresh installs with no tunings.

localhost throughputs:

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to SELF         : 53.6 Gbits/sec
FreeBSD 14.3  to SELF         : 33.9 Gbits/sec
--
FreeBSD 15.0  to SELF         : 35.8 Gbits/sec

Within the same bridge interface I see the following throughput:

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to OpenSUSE 15.6: 3.71 Gbits/sec
OpenSUSE 15.6 to FreeBSD 14.3:  2.80 Gbits/sec
FreeBSD 14.3 to FreeBSD 14.3:   1.73 Gbits/sec
FreeBSD 14.3 to OpenSUSE 15.6:   964 Mbits/sec
--
FreeBSD 15.0 to FreeBSD 15.0:   2.68 Gbits/sec
FreeBSD 15.0 to OpenSUSE 15.6:  4.83 Gbits/sec

Crossing the router:

Code:
SERVER       to  CLIENT       : THROUGHPUT
OpenSUSE 15.6 to OpenSUSE 15.6: 1.40 Gbits/sec
OpenSUSE 15.6 to FreeBSD 14.3:   706 Mbits/sec
FreeBSD 14.3 to FreeBSD 14.3:    304 Mbits/sec
FreeBSD 14.3 to OpenSUSE 15.6:   534 Mbits/sec
--
FreeBSD 15.0 to FreeBSD 15.0:    538 Mbits/sec
FreeBSD 15.0 to OpenSUSE 15.6:  1.59 Gbits/sec


Just be aware of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=289769 before upgrading - I think for now I plan to wait, but it's good to know that this will be resolved through an update :)
 
It seems that whatever the issue is, it's specific to FreeBSD 14.X
OK, my issue is different, then. Two FreeBSD 13.5 installs in two different data centres, copying the same file to a third FreeBSD machine (14.3) in a third data centre. The machine that wasn't moved takes 0.7 seconds to copy the file; the machine that was moved and is now a lot slower (so something network-related changed at the DC) takes 5.9 seconds to copy the same file. Copying between the two FreeBSD 13.5 installs takes 0.4 seconds. So something new was inserted between the now-slow machine and the third DC that somehow causes an issue at the third DC. But this all seems unrelated to what you are reporting, so I'll finish here (though I will post if I manage to find an answer/resolution).

EDIT1: upgrading the performing-OK machine from 13.5 to 14.3, no issues, the file copy takes 0.7 seconds.
 
[attached screenshots of throughput results]
 
Those throughput values are all terrible, including the Linux ones :mad:
This.
That's why I'd check for a more basic, underlying problem...

Yet again - iperf3 is basically just a single-core CPU benchmark nowadays. You *might* get away with saturating a 10G link with it, but for anything above that it's just worthless... Especially on newer CPUs with their cripple-core architecture, where the iperf3 process might end up on one of the 'cost-efficient' cores (cost-efficient for the vendor, that is...)

I think this could be a jumbo frames issue. All the devices have to be able to accept the same size mtu end to end.
You can usually just enable jumbo frames on all switches, regardless of what the devices on the network are capable of. Especially if you do routing on your switches, just let that run on max MTU. They will handle fragmentation automagically and proper switches do it on the ASIC, so there's zero penalty for fragmentation/reassembly.
I usually set interfaces where only 'real' systems talk to each other to the max MTU (9198), and interfaces that also talk to e.g. Windows boxes or other 'consumer gear' stay at the default (1500) or are specifically set to the MSS the client can handle.
Of course that's only for 10G and faster interfaces - there is pretty much no advantage in large MTUs on slow 1Gbit links, so I don't bother changing anything there (except for links with lower MTU and/or where gear with broken path MTU discovery is involved).

firstyear what NIC are you using exactly? Any chance this is some off-brand NIC with their own firmware? I've seen various oddities with those over the years... Disabling various offloading (RX/TXCSUMs, LRO/TSO) might help with broken firmware.
You also mentioned DACs - any chance you can test with either AOCs or proper transceivers?
Also check the interface stats. The input/output rates, throttles, drops, and collisions are of particular interest, but just post the full output of whatever the equivalent of "show interface N" is on Ubiquiti (if such stats exist at all).
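
On the FreeBSD side, the rough equivalents would be something like this (the exact counter names under dev.ix.0 vary a bit between driver versions):

Code:
netstat -i                    # per-interface packet/error/drop counters
netstat -I ix0 -hw 1          # live per-second rates for ix0
netstat -s -p tcp | grep -Ei 'retrans|sack|dup'   # TCP-level symptoms
sysctl dev.ix.0 | grep -Ei 'drop|err'             # ix(4) hardware counters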
 
That's why I'd check for a more basic, underlying problem...
It's not just that they are bad, but the *degradation* is so steep.

As mentioned I'm not just testing iperf3 - I'm testing with webservices/scp/samba. It's just that all of them are showing similar numbers, so for brevity I only posted the iperf3 throughput values. I'm not so silly as to only test a single thing; I know benchmarks are just that - benchmarks.

firstyear what NIC are you using exactly?
Code:
ix0@pci0:81:0:0:    class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10fb subvendor=0x8086 subdevice=0x0006
    vendor     = 'Intel Corporation'
    device     = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
    class      = network
    subclass   = ethernet

No weird off-brand/custom firmware - it came in legitimate Intel packaging etc.


Anyway, further testing today showed that using BBR instead of the freebsd or RACK stacks can help a lot in some conditions - notably an internet-facing FreeBSD server whose traffic traverses a Linux router. Similarly, FreeBSD 15.0 still helps a lot, especially with BBR.
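
For anyone else wanting to try it, switching stacks to BBR is done roughly like this (tcp_bbr ships as a module in 14.x; only connections opened after the change pick up the new stack):

Code:
kldload tcp_bbr
sysctl net.inet.tcp.functions_available    # should now list bbr
sysctl net.inet.tcp.functions_default=bbr

# revert to the stock stack
sysctl net.inet.tcp.functions_default=freebsd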
 
Anyway, further testing today showed that using BBR instead of the freebsd or RACK stacks can help a lot in some conditions - notably an internet-facing FreeBSD server whose traffic traverses a Linux router. Similarly, FreeBSD 15.0 still helps a lot, especially with BBR.
All of that should only influence throughput marginally. You should still consistently get >90% of the maximum throughput regardless of the congestion control used.

As mentioned I'm not just testing iperf3 - I'm testing with webservices/scp/samba.
Higher-level protocols aren't suitable for benchmarking or testing for fundamental problems. SMB is particularly bad at pretty much everything, even at lower link speeds, and should never be considered a "benchmark" (or even used at all if it can be avoided...)
To actually rule out iperf3's limitations, try using multiple parallel streams (-P). This is pretty much mandatory for links above 1G to get realistic results. From ~8 streams upwards you should see consistently >9Gbits/s without much deviation, i.e. varying only by one or two figures at the second decimal place.
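
Something like this, in both directions:

Code:
iperf3 -c <server> -P 8 -t 30
iperf3 -c <server> -P 8 -t 30 -R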

If you still see those abysmal figures (i.e. anything below 7-8Gbps) with multiple streams:
Can you get interface statistics from that Ubiquiti switch, and are there any errors/warnings in its logs?
Are throughput numbers from/to other systems also that bad? If yes, check the load on the switch; if all connections are rather bad, check for broadcast storms that might be dragging the switch down.

Again: even the numbers you get between the "good" systems are abysmally bad for 10G links.
 
All of that should only influence throughput marginally. You should still consistently get >90% of the maximum throughput regardless of the congestion control used.
And yet, the evidence shows otherwise ... :(

Higher-level protocols aren't suitable for benchmarking or testing for fundamental problems. SMB is particularly bad at pretty much everything, even at lower link speeds, and should never be considered a "benchmark" (or even used at all if it can be avoided...)
Sure, but these higher-level protocols are also what is suffering, and that is what impacts real workloads.

Again: even the numbers you get between the "good" systems are abysmally bad for 10G links.
The "good" system numbers are not even leaving the hypervisor - it's a 14.3 FreeBSD bhyve vm with a bridge. So the "good" numbers being so bad indicates something else going on in the FreeBSD machine I would think .... Again why I'm so confused about why those numbers are so bad, it really shouldn't be.

I'll do the -P test with iperf3 and see how it goes.

The only time the traffic crosses the Ubiquiti switch/router is in the "bad" tests; the other devices and tests don't show any degradation. That's why I structured the tests the way I did, to remove as many factors from the equation as possible. Even the fact that the "good" numbers are so poor really points to something in the hypervisor/guests, I would think, since that traffic never leaves the machine.
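
Alongside the -P test I'll also compare the guest's TCP counters before and after a run, to see whether the bridge-only path is actually dropping or retransmitting anything - a rough sketch:

Code:
# inside the FreeBSD guest
netstat -s -p tcp > before.txt
iperf3 -c <peer-on-same-bridge> -t 30
netstat -s -p tcp > after.txt
diff before.txt after.txt | grep -Ei 'retrans|sack|out-of-order'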
 