[Solved] zxfer (zfs send) fails with "out of space" error after upgrading to 13.1

I recently upgraded from 12.1 to 13.1, and my backup process (zfs-auto-snapshot + zxfer) has started failing with "out of space" errors.

Code:
Aug  8 06:26:49 salus root[43373]: Sending zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-01h00 to backup/venus/zroot/iocage/jails/minerva/root.
Aug  8 06:26:49 salus root[43373]: cannot receive new filesystem stream: out of space
Aug  8 06:26:49 salus root[43373]: Error when zfs send/receiving.

zxfer sends snapshots from the file server (venus/zroot) to the backup server (salus/backup). Similar snapshots transfer without error during the same backup run, and there should be plenty of room in the destination pool (see the space check after the listings). The pools look like this after the failure:

Code:
ccammack@venus:~ $ zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot  3.62T  1.72T  1.90T        -         -     0%    47%  1.00x    ONLINE  -

Code:
ccammack@salus:~ $ zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
backup  3.62T  1.71T  1.91T        -         -     0%    47%  1.00x    ONLINE  -
zroot    117G  18.6G  98.4G        -         -    22%    15%  1.00x    ONLINE  -
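
For what it's worth, pool-level free space isn't the whole story: a quota, refquota, or reservation on the destination dataset can produce the same out of space error even when zpool list shows room. Something like this (dataset path assumed from the log above) should rule that out:

Code:
# Per-dataset space accounting, including what snapshots consume
zfs list -o space -r backup/venus

# Any quota or reservation that could cap the receive
zfs get -r quota,refquota,reservation,refreservation backup/venus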

As I'm writing this, the root@zfs-auto-snap_hourly-2022-08-07-01h00 snapshot has already been deleted, so I don't know what it looked like. The data doesn't change much, though, so the hourly snaps all look mostly the same (dry-run size check below):

Code:
ccammack@venus:~ $ zfs list -t snapshot | grep -E 'minerva.+hourly'
zroot/iocage/jails/minerva@zfs-auto-snap_hourly-2022-08-08-08h00    0B      -      108K  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-09h00  340K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-10h00  340K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-11h00  340K      -     1.71T  -
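
Since the hourlies expire before I can inspect them, a dry-run send at least shows how big a given incremental really is. A sketch, using snapshot names from the listing above:

Code:
# -n: dry run (send no data), -v: print the estimated stream size
zfs send -n -v -i @zfs-auto-snap_hourly-2022-08-07-09h00 \
    zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-10h00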

A missing snapshot produces a "dataset does not exist" error during the transfer, so I don't think missing snapshots are what's triggering the new out of space errors.

Code:
Aug  7 21:45:17 salus root[74987]: Sending zroot/var/audit@zfs-auto-snap_frequent-2022-08-07-21h30 to backup/salus/zroot/var/audit.
Aug  7 21:45:17 salus root[74987]:   (incremental to zroot/var/audit@zfs-auto-snap_hourly-2022-08-07-21h00.)
Aug  7 21:45:17 salus root[74987]: cannot open 'zroot/var/audit@zfs-auto-snap_frequent-2022-08-07-21h30': dataset does not exist
Aug  7 21:45:17 salus root[74987]: cannot receive: failed to read from stream
Aug  7 21:45:17 salus root[74987]: Error when zfs send/receiving.

After upgrading, I did have to manually patch the zxfer script to ignore some of the new ZFS properties before it would run at all. Both the source and destination pools are GELI-encrypted rather than natively ZFS-encrypted.

Code:
$ diff /usr/local/sbin/zxfer.old /usr/local/sbin/zxfer
181c181
< userrefs"
---
> userrefs,objsetid,keylocation,keyformat,pbkdf2iters,special_small_blocks"
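
For comparison, the hardcoded list in zxfer has to match the properties the running ZFS actually reports; something like this (one name per line, easy to diff against the script's comma-separated list) shows what 13.1 exposes:

Code:
# -H: scripted output, -o property: print only the property names
zfs get -H -o property all zroot | sort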

Is there anything that changed between 12.1 and 13.1 that could cause a conflict between zfs-auto-snapshot and zxfer, making zfs send | zfs receive think the destination is out of space? Or did I misconfigure something after upgrading?

The obvious answer is that it's actually out of space, but I just don't see how.
 
Okay, I disabled snapshots yesterday by renaming /usr/local/sbin/zfs-auto-snapshot to .old and let the backup run as normal; it completed without error. I've just re-enabled snapshots and started the backup again to see what it looks like tomorrow. More info as I get it.
 
Re-enabling snapshots broke it again.

Code:
Aug 11 03:06:49 salus root[91797]: Sending zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-01h00 to backup/venus/zroot/iocage/jails/minerva/root.
Aug 11 03:06:49 salus root[91797]: cannot receive new filesystem stream: out of space
Aug 11 03:06:49 salus root[91797]: Error when zfs send/receiving.

The problem snap is no longer present on the system.

Code:
ccammack@venus:~ $ zfs list -t snapshot | grep -E 'minerva.+hourly'
zroot/iocage/jails/minerva@zfs-auto-snap_hourly-2022-08-11-06h00         0B      -      108K  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-04h00  344K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-05h00  336K      -     1.71T  -

It definitely thinks it's trying to back up something large (size-check sketch below). With snapshots off, the backup-sleep-backup loop ran 31 times over ~16 hours, but with snapshots on it only managed to run twice.

Code:
Aug 10 03:03:57 salus root[14341]: backup.sh: venus backed up 30/31 attempts
[...]
Aug 11 03:21:49 salus root[92141]: backup.sh: venus backed up 0/2 attempts
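
To find which incremental balloons, a loop like this (a rough, untested sketch; dataset name taken from the logs above) would print the estimated stream size between each pair of consecutive snapshots:

Code:
#!/bin/sh
# Walk one dataset's snapshots in creation order and print the estimated
# incremental stream size for each consecutive pair (dry run, no data sent).
DS=zroot/iocage/jails/minerva/root
prev=""
zfs list -H -d 1 -t snapshot -o name -s creation "$DS" | while read snap; do
    if [ -n "$prev" ]; then
        printf '%s -> %s: ' "$prev" "$snap"
        zfs send -n -v -i "$prev" "$snap" 2>&1 | tail -1
    fi
    prev="$snap"
done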

Weird. It worked well for two years, but suddenly something is tripping it up. For now, I'll run snapshots on odd days and backups on even days (crontab sketch below) until I have time to debug it properly. I'd appreciate hearing suggestions if anyone has any to offer.
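
The temporary split would look something like this in root's crontab (times and the backup script path are placeholders, and the snapshot arguments are elided; day-of-month ranges with steps work in FreeBSD's cron):

Code:
# Odd days of the month: take snapshots as usual (existing args elided)
0 * 1-31/2 * * /usr/local/sbin/zfs-auto-snapshot ...
# Even days: run the backup instead
0 3 2-30/2 * * /path/to/backup.sh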
 