ZFS Send / Recv halts after only a few gigabytes

I'm trying to sync a rather large initial snapshot (16 TByte), and after 2 - 3 GByte the transfer just halts. I can't really figure out what's going on.
Initially I was using syncoid to manage this (other snapshots from other ZFS volumes on the same server work fine), but it always seems to halt around the 2 - 3 GByte mark.

ssh sourcehostip zfs send -v zroot/www@autosnap_2025-11-08_initial | mbuffer -q -s 128k -m 1G | zfs receive -v rusty/backup/www

(both sides are on FreeBSD 14.3-RELEASE)

I think that's pretty simple and ought to work. I have googled around and tried a bunch of other things: over netcat, without compression, tweaking mbuffer, and so on. But it keeps halting and the root cause is still unclear.

Anybody have any tips for investigating this?
 
It seems related to these congestion control settings:

tcp_bbr_load="YES"
net.inet.tcp.functions_default=bbr

If I disable these, the zfs send does not halt at 2 - 3 GByte and continues as expected, but with very low transfer speeds: 4 - 20 MByte/sec instead of the full gigabit pipe.
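To switch back and forth for testing I do roughly the following (the functions_available sysctl and the "freebsd" stack name are from memory, so double-check on your version):

# list the available TCP stacks and show the current default
sysctl net.inet.tcp.functions_available
sysctl net.inet.tcp.functions_default
# fall back to the stock stack without rebooting
sysctl net.inet.tcp.functions_default=freebsd
# and back to BBR (tcp_bbr_load="YES" in /boot/loader.conf so the module is loaded)
sysctl net.inet.tcp.functions_default=bbr

As far as I know this only affects new connections, so I restart the transfer after switching.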

I'd like to see the full gigabit pipe used; I don't want to wait two weeks for the initial snapshot to be received.
 
Which network interface controller is used for the traffic? Please show us the output of pciconf -lv | grep -B3 network and ifconfig.
The sender has:
igb0@pci0:1:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1028 subdevice=0x06e2
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet

For the receiver I tried three different machines, one with an MT27710 Family [ConnectX-4 Lx] and another with an Ethernet Controller I225-V.
 
We had a similar problem and became very confused until we realized that we were zfs send-ing between pools that had different -o compatibility settings. Make sure the sending and receiving pools have the same feature sets enabled.
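If it helps, this is roughly how we ended up comparing the two sides (using the OP's pool names as an example, and assuming an OpenZFS version that has the compatibility pool property):

# show the compatibility setting of each pool
zpool get compatibility zroot
zpool get compatibility rusty
# compare which feature flags are enabled/active on each side
zpool get all zroot | grep feature@
zpool get all rusty | grep feature@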
 
I doubt that's related, because rsync also halts after a while when using BBR.
 
When I limit the sync bandwidth to 30 MByte/sec it does not halt, so far; at 40 MByte/sec it does (again after 2 - 3 GByte). Something gets flooded or overrun? I'm unsure what to investigate or which settings to try.
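For reference, the limiting is just the same pipeline with a rate cap bolted on; as far as I recall -R is mbuffer's write-side rate limit (pv -L 30M in the pipe should do the same):

ssh sourcehostip zfs send -v zroot/www@autosnap_2025-11-08_initial | mbuffer -q -s 128k -m 1G -R 30M | zfs receive -v rusty/backup/www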
 
iperf shows similar behaviour and performance. When BBR is configured it starts fast, at the full gigabit, but around the 4th second it grinds to 0 bytes/sec; with FreeBSD's default settings it is unstable, bouncing between 10 - 200 Mbit.
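For anyone who wants to reproduce the test, it was nothing more than a plain TCP run, something like this (written here for iperf3, with a placeholder hostname):

# on the receiving machine
iperf3 -s
# on the sending machine, 30 seconds with per-second reporting
iperf3 -c receiverhostip -t 30 -i 1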

Also, about my previous statement: 30 MByte/sec is quite stable, but as soon as there is some other transfer (total traffic exceeds 30 MByte/sec) it affects the running zfs send/receive and makes it drop to 0 bytes/sec too.
 
  • Do you experience that issue just for zfs send/recv?
  • What happens if you transfer 100 GB using scp (e.g. with something like the sketch below)?
  • Do you lose complete network access on the sender/receiver side, so SSH hangs as well? Or is it just that single connection?
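A quick sketch of such a test (file name, path and size are just examples):

# on the sender: create a ~100 GB test file (zeroes are fine for a raw throughput test)
dd if=/dev/zero of=/tmp/testfile bs=1m count=102400
# push it to the receiver and watch whether it stalls the same way zfs send does
scp /tmp/testfile receiverhostip:/tmp/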
 
FWIW (it was not the NIC), and maybe useful for someone else running into similar performance issues: the ISP sold the server as 1 Gbit unmetered etc., but it turned out to be capped at 300 Mbit, and BBR apparently did not like the way they do traffic shaping. Very frustrating, especially since I spent quite some time investigating this.
 