ZFS Send / Recv halts after only a few gigabytes

I'm trying to sync a rather large initial snapshot (16 TByte), and after 2 - 3 GByte the transfer just halts. I can't really figure out what's going on.
Initially I was using syncoid to manage this (other snapshots from other ZFS volumes on the same server work fine), but it always seems to halt around the 2 - 3 GByte mark.

ssh sourcehostip zfs send -v zroot/www@autosnap_2025-11-08_initial | mbuffer -q -s 128k -m 1G | zfs receive -v rusty/backup/www

(both sides are on FreeBSD 14.3-RELEASE)

I think that's pretty simple and ought to work. I have googled around and tried a bunch of other things: over netcat, without compression, tweaking mbuffer, and so on. But it keeps halting and the root cause is still unclear.

Anybody have any tips for investigating this?
 
It seems related to these congestion control settings:

tcp_bbr_load="YES"
net.inet.tcp.functions_default=bbr

If I disable these, the zfs send does not halt at 2 - 3 GByte and continues as expected, but with very low transfer speeds: 4 - 20 MByte/sec instead of the full gigabit pipe.
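To switch back and forth for testing I do roughly the following (the functions_available sysctl and the "freebsd" stack name are from memory, so double-check on your version):

# list the available TCP stacks and show the current default
sysctl net.inet.tcp.functions_available
sysctl net.inet.tcp.functions_default
# fall back to the stock stack without rebooting
sysctl net.inet.tcp.functions_default=freebsd
# and back to BBR (tcp_bbr_load="YES" in /boot/loader.conf so the module is loaded)
sysctl net.inet.tcp.functions_default=bbr

As far as I know this only affects new connections, so I restart the transfer after switching.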

I'd like to see the full gigabit pipe used; I don't want to wait two weeks for the initial snapshot to be received.
 
Which network interface controller is used for the traffic? Please show us the output of pciconf -lv | grep -B3 network and ifconfig.
The sender has:
igb0@pci0:1:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x1028 subdevice=0x06e2
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet

For the receiver I tried three different machines, one with an MT27710 Family [ConnectX-4 Lx] and another with an Ethernet Controller I225-V.
 
We had a similar problem and became very confused until we realized that we were zfs send-ing between pools that had different -o compatibility settings. Make sure the sending and receiving pools have the same feature sets enabled.
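If it helps, this is roughly how we ended up comparing the two sides (using the OP's pool names as an example, and assuming an OpenZFS version that has the compatibility pool property):

# show the compatibility setting of each pool
zpool get compatibility zroot
zpool get compatibility rusty
# compare which feature flags are enabled/active on each side
zpool get all zroot | grep feature@
zpool get all rusty | grep feature@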
 
I doubt that's related, because rsync also halts after a while when using BBR.
 
When I limit the sync bandwidth to 30 MByte/sec it does not halt, so far; at 40 MByte/sec it does (again after 2 - 3 GByte). Something gets flooded or overrun? I'm unsure what to investigate or which settings to try.
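For reference, the limiting is just the same pipeline with a rate cap bolted on; as far as I recall -R is mbuffer's write-side rate limit (pv -L 30M in the pipe should do the same):

ssh sourcehostip zfs send -v zroot/www@autosnap_2025-11-08_initial | mbuffer -q -s 128k -m 1G -R 30M | zfs receive -v rusty/backup/www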
 
iperf shows similar behaviour and performance. When BBR is configured it starts fast, at the full gigabit, but around the 4th second it grinds to 0 bytes/sec; with FreeBSD's default settings it is unstable, bouncing between 10 - 200 Mbit.
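For anyone who wants to reproduce the test, it was nothing more than a plain TCP run, something like this (written here for iperf3, with a placeholder hostname):

# on the receiving machine
iperf3 -s
# on the sending machine, 30 seconds with per-second reporting
iperf3 -c receiverhostip -t 30 -i 1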

Also, about my previous statement: 30 MByte/sec is quite stable, but as soon as there is some other transfer (total traffic exceeds 30 MByte/sec) it affects the running zfs send/receive and makes it drop to 0 bytes/sec too.
 
  • Do you experience that issue just for zfs send/recv?
  • What happens if you transfer 100 GB using scp (e.g. with something like the sketch below)?
  • Do you lose complete network access on the sender/receiver side, so SSH hangs as well? Or is it just that single connection?
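A quick sketch of such a test (file name, path and size are just examples):

# on the sender: create a ~100 GB test file (zeroes are fine for a raw throughput test)
dd if=/dev/zero of=/tmp/testfile bs=1m count=102400
# push it to the receiver and watch whether it stalls the same way zfs send does
scp /tmp/testfile receiverhostip:/tmp/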
 
FWIW (it was not the NIC), and maybe useful for someone else running into similar performance issues: the ISP sold the server as 1 Gbit unmetered etc., but it turned out to be capped at 300 Mbit, and BBR apparently did not like the way they do traffic shaping. Very frustrating, especially since I spent quite some time investigating this.
 