Poor throughput with FreeBSD 13.2 in VMware VM using VMXNET3

Hi all,

I am currently using OPNsense, which is based on FreeBSD 13.2, in a VMware VM with VMXNET3 adapters.

I can't get over 2.7 Gbit/s with iperf3 or with routed traffic. As far as I can see, this is clearly a FreeBSD problem, because it is easy to reach nearly 10 Gbit/s between physical workstations and VMs running other guest OSes.

Are there any solutions for this problem?
 
Thanks. This is not a derivative-specific topic; it is a FreeBSD topic. The vmxnet3 problem seems to be located in the kernel or in the driver itself.

I noticed and understand the rest of the points listed in your links.
 
Install plain FreeBSD, then check whether you have the same problem. If not, then it's not FreeBSD-specific.
 
Maybe, but then iperf throughput should improve with more CPUs or with parallel iperf3 streams. However, I can't get over 2.7 Gbit/s with either.

And routing throughput should not be affected.
 
Install plain FreeBSD, then check whether you have the same problem. If not, then it's not FreeBSD-specific.

Thanks for this hint. I tested it with a plain 13.2 install, and I get 6-7 Gbit/s with plain FreeBSD.


In that case, I will switch back to the OPNsense community. :)
 
Yeah, keep in mind that although OPNsense is based on FreeBSD, they have quite a bit of customization and changes.
 
I would like to keep this topic warm, because I did a lot of testing. The detailed results are described here:


It's clear that OPNsense has very poor throughput in a VMware VM with a vmxnet3 adapter. But plain FreeBSD 13.2 throughput is also not as good as it could be.

These are the results for the fresh, plain FreeBSD 13.2 installation:

I installed a fresh, plain FreeBSD 13.2 with iperf3, using the same VM settings as the matured OPNsense (8 vCPUs, 8 GB RAM, vmxnet3 adapter) on the same ESXi host. The results are:

~5 Gbit/s from client to FreeBSD, 6 Gbit/s in reverse mode, both with a single iperf3 stream,
~8.5 Gbit/s from client to FreeBSD, 7.3 Gbit/s in reverse mode, both with two parallel iperf3 streams,
~9.2 Gbit/s from client to FreeBSD, 8 Gbit/s in reverse mode, both with three parallel iperf3 streams.

With three streams, CPU utilization of the VM is at 42% from client to FreeBSD and only 18% in reverse mode. I don't know whether this unequal utilization is an iperf3 or a FreeBSD issue.
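In case someone wants to reproduce this: the runs correspond to iperf3 invocations roughly like these (192.0.2.10 stands in for the FreeBSD VM's address, and the duration is just an example):

# On the FreeBSD VM
iperf3 -s

# On the client: single stream, client -> FreeBSD, then reverse mode
iperf3 -c 192.0.2.10 -t 30
iperf3 -c 192.0.2.10 -t 30 -R

# Two and three parallel streams, both directions
iperf3 -c 192.0.2.10 -t 30 -P 2
iperf3 -c 192.0.2.10 -t 30 -P 3
iperf3 -c 192.0.2.10 -t 30 -P 3 -R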


Compared with Ubuntu and Windows VMs, this single-stream performance is not very good. I can easily see 9-10 Gbit/s with an Ubuntu or Windows VM on the same hardware.

And I don't understand why CPU utilization is 42% in one direction and only 18% in the other.
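One way to see whether a single core or queue is the limit would be to watch the guest during a run, e.g. with something like this (vmx0 assumed as the vmxnet3 interface name):

# Per-CPU load, refreshed every second
top -P -s 1

# Interrupt counts, including the vmxnet3 queue interrupts
vmstat -i

# Per-second packet/byte counters on the interface
netstat -w 1 -I vmx0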
 
IIRC ESXi has LRO and TSO disabled by default.
If you aren't bound to ESXi, I'd recommend switching the host to something that supports the bhyve hypervisor, i.e. FreeBSD or SmartOS (illumos-based). This is a much more modern hypervisor without layers and layers of decades-old cruft. Also, virtio (i.e. vtnet) should always be the first choice before other (proprietary) virtualized devices/drivers, as it offers by far the best performance.
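Whether the guest side actually has TSO/LRO active can be checked on the FreeBSD VM, roughly like this (assuming the vmxnet3 interface shows up as vmx0):

# Look for TXCSUM/RXCSUM/TSO4/TSO6/LRO in the options line
ifconfig vmx0

# Enable TSO and LRO if they are missing (disable again with -tso / -lro)
ifconfig vmx0 tso lro

# Driver tunables/statistics, if the driver exposes them
sysctl dev.vmx.0 | head -40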

As for iperf: the send stream requires generating data on the fly; try using pre-created test data, e.g. by dumping /dev/random into a file of 10 GB or more, and use that file for iperf.
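A rough sketch of that (size and path are just examples):

# Pre-generate ~10 GB of random test data once
dd if=/dev/random of=/var/tmp/iperf.dat bs=1m count=10240

# Let iperf3 send the file instead of generating data on the fly
iperf3 -c 192.0.2.10 -F /var/tmp/iperf.dat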

Another caveat might be MTU and fragmentation. For >1 Gbit links I always use jumbo frames on the host and switches, and only set a lower MTU within VMs/jails if required/sensible.
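For example (vmx0 and the target address are placeholders; the vSwitch and the physical switch have to allow the larger frames as well):

# Jumbo frames on the guest interface
ifconfig vmx0 mtu 9000

# Verify the path really passes 9000-byte frames without fragmentation
# (8972 bytes payload + 8 ICMP + 20 IP header = 9000)
ping -D -s 8972 192.0.2.10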

Regarding switches: which switch are you using? This can also have a huge impact on actual throughput. With MikroTik, for example, I wasn't able to get anywhere near line speed, especially with VLANs; 7-8 Gbit/s was the absolute maximum for a *single* stream over the switch in a single direction. Multiple streams completely wrecked performance for all streams and caused a lot of packet loss. I tried to solve this with the (practically non-existent) MikroTik support and got the lame excuse of "we can confirm this, but don't know how to solve it"...
The Catalyst 3750X stacks here and in my homelab can handle full line speed even with PBR.
 
Changing the hypervisor could possibly solve this issue, but it's not an option for me, because the products you mentioned are also far from perfect, and because I am a VMware VMUG member and use VMware products professionally.

Iperf with pre-dumped files is worth a try, but that should not be the reason for such poor performance.

Jumbo frames are enabled on my physical switch, on the VMware vSwitch, and on some server VMs (for the iSCSI and NFS networks). But enabling jumbo frames on client systems is always a bad idea. It's a workaround for slow hardware or for non-optimized OSes and drivers. Come on, we live in 2023 and every modern processor can deal with 10G non-firewalled throughput.

If an OS and its drivers are optimized, there is no need for jumbo frames. Today's hardware is fast enough to deal with high throughput at the default MTU. These are the results I got today with MTU 1500 between my Windows workstation and VMs with the following OSes installed (single-flow connections only, from the Windows workstation to the VM):

Fresh FreeBSD 13.2 install: 5 Gbit/s
Matured OPNsense 23.7.4 installation (based on FreeBSD 13.2): 2.7 Gbit/s
Fresh OPNsense 23.7.4 installation with pf enabled and an allow-all rule: 2.8 Gbit/s
Fresh OPNsense 23.7.4 installation with pf disabled: 3.2 Gbit/s
Ubuntu 22.04 LTS: over 9 Gbit/s
OmniOS: 3.5 Gbit/s

You can clearly see that this is not a hypervisor topic. ESXi is fast enough and Ubuntu can nearly reach line speed (it would be the same with a Windows VM). The reason the other OSes are slow is a lack of OS and driver optimization. For instance: https://www.illumos.org/issues/15907

I am using a Cisco CBS350-8MGP-2X. My home is cabled with CAT7, and I can't see any frame or packet retransmissions or drops on the switch or in the hypervisor. You can also clearly see (from the Ubuntu result) that the network is not the reason; line speed is possible here.
 
Just to let you know:

OPNsense deactivates all NIC hardware acceleration in a default installation. A user pointed out to me that I should try turning hardware acceleration on, and this lifted the throughput to the same level as a fresh FreeBSD 13.2 installation. This is a big improvement over the previous numbers.
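For anyone who runs into the same thing: whether the offloads are really active after changing the setting can be checked from a shell on the firewall, e.g. (vmx0 assumed as the interface name):

# TXCSUM/RXCSUM/TSO4/LRO should now show up in the options line;
# if they are missing, hardware acceleration is still disabled
ifconfig vmx0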

Now we can discuss the plain FreeBSD topic again, because I am still missing four to five Gbit/s on a single stream. ;-)

I will do some more tests in the next couple of days.
 