WireGuard site-to-site performance really slow

TL;DR: Throughput through the wg tunnel is about 40x slower than our baseline internet transfers. Why?

Our setup seems standard to me. Two test sites (Oregon & California). Each site has a modest FreeBSD bhyve host running a dozen or so VMs of various flavors. The bhyve configuration uses if_bridge and tap interfaces. The upstream phy link is an Intel I350 quad gigabit NIC using the igb driver. VMs receive a VirtIO interface (vtnet device).
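For reference, the host-side wiring is the usual if_bridge/tap arrangement; a minimal sketch of what that looks like in rc.conf (interface names are placeholders, and in practice the taps are created per-VM by the bhyve tooling):

  # /etc/rc.conf (sketch only; names are placeholders)
  cloned_interfaces="bridge0 tap0"
  ifconfig_igb0="up"
  ifconfig_bridge0="addm igb0 addm tap0 up"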

Recently, I added a new VM at each site to create a site-to-site WireGuard link. The VPN is up, and test hosts at each site are routing both IPv4 and IPv6 through the new tunnel. Pings and traceroutes (4 and 6) all work as expected, with latency typically 20ms through the tunnel, just about 1ms slower than outside it. pf is installed inside the VM and used to control access between the sites. The goal is to possibly replace the L2TP/IPsec tunnels between sites that currently burden network appliances.
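For context, the tunnel config on each end is the standard site-to-site shape; roughly this, with keys, endpoints, and subnets as placeholders:

  # wg0.conf on the Oregon VM (all values below are placeholders)
  [Interface]
  Address = 10.88.0.1/30
  PrivateKey = <oregon-private-key>
  ListenPort = 51820
  MTU = 1420

  [Peer]
  # California site
  PublicKey = <california-public-key>
  Endpoint = ca.example.net:51820
  # the remote site's IPv4 and IPv6 subnets are routed through the tunnel
  AllowedIPs = 10.20.0.0/16, fd20::/48
  PersistentKeepalive = 25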

Both sites have 1G/1G network offerings (different providers). Direct-plugged tests into those ISPs get good performance in the local region. Transfers between VMs on the same host (through the taps and bridge) sustain rates around 300-400 Mbps (curl -> bridge -> https -> bridge -> http). Site-to-site transfers (California to Oregon) directly over the internet currently baseline at about 131 Mbps. My blunt test is to curl a 500 MB file of random bytes from a webserver VM over HTTPS.
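The test itself is nothing fancy; roughly this, with paths and URL as placeholders:

  # on the webserver VM: generate a 500 MB file of random bytes
  dd if=/dev/urandom of=/usr/local/www/testfile.bin bs=1m count=500

  # from the far side: pull it and let curl report the average throughput
  curl -o /dev/null https://server.example.org/testfile.bin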

When I run these same transfer tests through the WireGuard tunnel, the transfer rate is an abysmal 3.5 Mbps. Transfers over IPv6 have the same performance characteristics as the IPv4 transfers. Most of the prior discussions about WireGuard performance I've read seem related to fragmentation. I've tuned my MTU up and down; performance is uniform and the link is stable with an MTU of 1420. There seems to be plenty of compute capacity on the hosts; none of the CPU cores max out when testing through the tunnel.
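For what it's worth, the MTU experiments were along these lines (the peer address is a placeholder):

  # set the tunnel MTU
  ifconfig wg0 mtu 1420

  # confirm packets up to the tunnel MTU pass with DF set
  # (1392 bytes of payload + 8 ICMP + 20 IP = 1420)
  ping -D -s 1392 10.88.0.2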

I'm looking for suggestions.

Some of my suspicions:
  • I'm overlooking something obvious
  • Maybe the pf filter is mangling performance of the tunnel
  • Maybe netgraph would do better than if_bridge & tap
  • Maybe WireGuard isn't great on FreeBSD and I should try with Linux guest VMs
  • Maybe the hairpin turns running wg host and workload hosts on the same hardware isn't ideal, interrupt timing, etc.
  • My test using random bytes is cruel and unusual; one public site serving files full of null bytes was way faster
  • Maybe my network isn't coping with UDP WireGuard packets well; the baseline tests were direct TCP transfers
 
That seemed like a reasonable idea, so I read up some on PMTUD. I also tried my tests with all firewall filter rules on the wg hosts disabled, and it made no difference.
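For the no-filter test I simply toggled pf off on the wg hosts rather than editing rules:

  pfctl -d     # disable pf for the A/B test
  # ... rerun the curl transfer ...
  pfctl -e     # re-enable pf
  pfctl -sr    # sanity-check that the ruleset is still loaded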

I ran packet captures on all interfaces of both WireGuard hosts during a test transfer. It doesn't seem like any PMTUD packets are getting blocked; in fact, there was no ICMP at all during my capture period. ICMPv6 is being passed and looks fine.
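The ICMP check was just a capture filter on each interface while a transfer ran, e.g.:

  # watch for PMTUD-related ICMP during a transfer (repeat per interface)
  tcpdump -ni wg0 'icmp or icmp6'
  tcpdump -ni vtnet0 'icmp or icmp6'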

However, I did notice on the receiving end, where I'm running curl to perform the download, that there are a number of out-of-order TCP frames and duplicate ACKs. This is on the wg0 interface on the receiving end. Nothing looks off on the other captured interfaces, although I don't expect much would really show in a Wireshark analysis of the opaque WireGuard UDP packets.
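If it helps, the same thing the Wireshark analysis showed can be pulled out of the wg0 capture on the command line, roughly (the capture filename is a placeholder):

  # count out-of-order segments, duplicate ACKs and retransmissions in the capture
  tshark -r wg0-receiver.pcap \
    -Y 'tcp.analysis.out_of_order || tcp.analysis.duplicate_ack || tcp.analysis.retransmission' | wc -l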

Any ideas or suggestions to determine where this traffic is getting put through such a meat grinder?
 

Attachments

  • Bad_flow.png (509.9 KB)
Is this on FreeBSD 14.4 or 15.0?

Before anything else I would first plan for network hiccups that might affect VM workloads. Then confirm out-of-band (IPMI) access in case the network card ends up in a weird state; it can happen. After that I would get a baseline with iperf3.
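Something like this, run both outside and inside the tunnel, gives you comparable numbers (addresses are placeholders):

  # far end
  iperf3 -s

  # near end: TCP baseline, then UDP near line rate, then the reverse direction
  iperf3 -c 10.88.0.2 -t 30
  iperf3 -c 10.88.0.2 -u -b 900M -t 30
  iperf3 -c 10.88.0.2 -R -t 30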

I understand the new, improved bridge is more than capable of providing what you need. Here are some things to experiment with before considering netgraph.

Some might cause interface flapping or might require a full reboot; there's a consolidated sketch after the list.
- interface flags to consider, if they apply to your case (-rxcsum -txcsum -lro -tso -tso6 -vlanhwtso -vlanhwcsum -vlanhwtag)
- vtnet loader tunables (hw.vtnet.tso_disable="1", hw.vtnet.lro_disable="1", hw.vtnet.csum_disable="1")
- change the congestion control algorithm (tcp_bbr_load="YES")
- maybe just for funzies, also run WireGuard in a vnet jail instead of a bhyve VM, just to get some test data (plus RAM savings)
- also a quick Debian VM to get more test data (net.ipv4.tcp_congestion_control = bbr, net.ipv4.ip_forward = 1, and the appropriate offloads (tx, sg, tso) turned off on the interface with ethtool)
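A consolidated sketch of the FreeBSD-side knobs above (interface names are placeholders; apply one change at a time):

  # /boot/loader.conf (reboot required)
  tcp_bbr_load="YES"              # loads the BBR TCP stack module
  hw.vtnet.tso_disable="1"
  hw.vtnet.lro_disable="1"
  hw.vtnet.csum_disable="1"

  # /etc/sysctl.conf -- the BBR module only takes effect once the default stack is switched
  net.inet.tcp.functions_default=bbr

  # runtime: strip hardware offloads on the physical NIC and bridge members
  ifconfig igb0 -rxcsum -txcsum -lro -tso -tso6 -vlanhwtso -vlanhwcsum -vlanhwtag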
 
I have this problem with my WireGuard personal VPN too. In my case, I resolved the issue using TCP BBR congestion control, enabled in loader.conf.local with:

tcp_bbr_load="YES"

And this sysctl:

net.inet.tcp.functions_default=bbr

With these changes I get full bandwidth on my connection (nearly 400 Mbps); before the change, I barely obtained 120-150 Mbps.
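For reference, the two pieces end up roughly here, and you can verify the active stack afterwards:

  # /boot/loader.conf.local
  tcp_bbr_load="YES"

  # /etc/sysctl.conf
  net.inet.tcp.functions_default=bbr

  # verify the stack is available and selected
  sysctl net.inet.tcp.functions_available
  sysctl net.inet.tcp.functions_default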
 