[Solved] zxfer (zfs send) fails with "out of space" error after upgrading to 13.1

I recently upgraded from 12.1 to 13.1, and my backup process (zfs-auto-snapshot + zxfer) has started failing with "out of space" errors.

Code:
Aug  8 06:26:49 salus root[43373]: Sending zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-01h00 to backup/venus/zroot/iocage/jails/minerva/root.
Aug  8 06:26:49 salus root[43373]: cannot receive new filesystem stream: out of space
Aug  8 06:26:49 salus root[43373]: Error when zfs send/receiving.

zxfer sends snapshots from the file server (venus/zroot) to the backup server (salus/backup). Similar snapshots transfer without error during the same backup run, and there should be plenty of room in the destination pool (see the space check after the listings). The pools look like this after the failure:

Code:
ccammack@venus:~ $ zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zroot  3.62T  1.72T  1.90T        -         -     0%    47%  1.00x    ONLINE  -

Code:
ccammack@salus:~ $ zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
backup  3.62T  1.71T  1.91T        -         -     0%    47%  1.00x    ONLINE  -
zroot    117G  18.6G  98.4G        -         -    22%    15%  1.00x    ONLINE  -
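
For what it's worth, pool-level free space isn't the whole story: a quota, refquota, or reservation on the destination dataset can produce the same out of space error even when zpool list shows room. Something like this (dataset path assumed from the log above) should rule that out:

Code:
# Per-dataset space accounting, including what snapshots consume
zfs list -o space -r backup/venus

# Any quota or reservation that could cap the receive
zfs get -r quota,refquota,reservation,refreservation backup/venus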

As I'm writing this, the root@zfs-auto-snap_hourly-2022-08-07-01h00 snapshot has already been deleted, so I don't know what it looked like. The data doesn't change much, though, so the hourly snaps all look mostly the same (dry-run size check below):

Code:
ccammack@venus:~ $ zfs list -t snapshot | grep -E 'minerva.+hourly'
zroot/iocage/jails/minerva@zfs-auto-snap_hourly-2022-08-08-08h00    0B      -      108K  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-09h00  340K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-10h00  340K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-11h00  340K      -     1.71T  -
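
Since the hourlies expire before I can inspect them, a dry-run send at least shows how big a given incremental really is. A sketch, using snapshot names from the listing above:

Code:
# -n: dry run (send no data), -v: print the estimated stream size
zfs send -n -v -i @zfs-auto-snap_hourly-2022-08-07-09h00 \
    zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-07-10h00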

A missing snapshot produces a "dataset does not exist" error during the transfer, so I don't think missing snapshots are what's triggering the new out of space errors.

Code:
Aug  7 21:45:17 salus root[74987]: Sending zroot/var/audit@zfs-auto-snap_frequent-2022-08-07-21h30 to backup/salus/zroot/var/audit.
Aug  7 21:45:17 salus root[74987]:   (incremental to zroot/var/audit@zfs-auto-snap_hourly-2022-08-07-21h00.)
Aug  7 21:45:17 salus root[74987]: cannot open 'zroot/var/audit@zfs-auto-snap_frequent-2022-08-07-21h30': dataset does not exist
Aug  7 21:45:17 salus root[74987]: cannot receive: failed to read from stream
Aug  7 21:45:17 salus root[74987]: Error when zfs send/receiving.

After upgrading, I did have to manually patch the zxfer script to ignore some of the new ZFS properties before it would run at all. Both the source and destination pools are GELI-encrypted rather than natively ZFS-encrypted.

Code:
$ diff /usr/local/sbin/zxfer.old /usr/local/sbin/zxfer
181c181
< userrefs"
---
> userrefs,objsetid,keylocation,keyformat,pbkdf2iters,special_small_blocks"
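
For comparison, the hardcoded list in zxfer has to match the properties the running ZFS actually reports; something like this (one name per line, easy to diff against the script's comma-separated list) shows what 13.1 exposes:

Code:
# -H: scripted output, -o property: print only the property names
zfs get -H -o property all zroot | sort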

Is there anything that changed between 12.1 and 13.1 that could cause a conflict between zfs-auto-snapshot and zxfer, making zfs send | zfs receive think the destination is out of space? Or did I misconfigure something after upgrading?

The obvious answer is that it's actually out of space, but I just don't see how.
 
Okay, I disabled snapshots yesterday by renaming /usr/local/sbin/zfs-auto-snapshot to .old and let the backup run as normal; it completed without error. I've just re-enabled snapshots and started the backup again to see what it looks like tomorrow. More info as I get it.
 
Re-enabling snapshots broke it again.

Code:
Aug 11 03:06:49 salus root[91797]: Sending zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-01h00 to backup/venus/zroot/iocage/jails/minerva/root.
Aug 11 03:06:49 salus root[91797]: cannot receive new filesystem stream: out of space
Aug 11 03:06:49 salus root[91797]: Error when zfs send/receiving.

The problem snap is no longer present on the system.

Code:
ccammack@venus:~ $ zfs list -t snapshot | grep -E 'minerva.+hourly'
zroot/iocage/jails/minerva@zfs-auto-snap_hourly-2022-08-11-06h00         0B      -      108K  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-04h00  344K      -     1.71T  -
zroot/iocage/jails/minerva/root@zfs-auto-snap_hourly-2022-08-09-05h00  336K      -     1.71T  -

It definitely thinks it's trying to back up something large (size-check sketch below). With snapshots off, the backup-sleep-backup loop ran 31 times over ~16 hours, but with snapshots on it only managed to run twice.

Code:
Aug 10 03:03:57 salus root[14341]: backup.sh: venus backed up 30/31 attempts
[...]
Aug 11 03:21:49 salus root[92141]: backup.sh: venus backed up 0/2 attempts
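
To find which incremental balloons, a loop like this (a rough, untested sketch; dataset name taken from the logs above) would print the estimated stream size between each pair of consecutive snapshots:

Code:
#!/bin/sh
# Walk one dataset's snapshots in creation order and print the estimated
# incremental stream size for each consecutive pair (dry run, no data sent).
DS=zroot/iocage/jails/minerva/root
prev=""
zfs list -H -d 1 -t snapshot -o name -s creation "$DS" | while read snap; do
    if [ -n "$prev" ]; then
        printf '%s -> %s: ' "$prev" "$snap"
        zfs send -n -v -i "$prev" "$snap" 2>&1 | tail -1
    fi
    prev="$snap"
done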

Weird. It worked well for two years, but suddenly something is tripping it up. For now, I'll run snapshots on odd days and backups on even days (crontab sketch below) until I have time to debug it properly. I'd appreciate hearing suggestions if anyone has any to offer.
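
The temporary split would look something like this in root's crontab (times and the backup script path are placeholders, and the snapshot arguments are elided; day-of-month ranges with steps work in FreeBSD's cron):

Code:
# Odd days of the month: take snapshots as usual (existing args elided)
0 * 1-31/2 * * /usr/local/sbin/zfs-auto-snapshot ...
# Even days: run the backup instead
0 3 2-30/2 * * /path/to/backup.sh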
 