ZFS WARNING: Do not reboot when free space drops below 10 GB. ZFS suicide.

Hello,

this topic is merely a warning to others so you don't run into the same problem, because I doubt it will be fixed.

When you try to reboot a system that boots from ZFS after its free space has dropped below 10 GB, ZFS refuses to boot and just gives you the following error message:

ZFS: i/o error - all block copies unavailable

The only solution is to boot the machine from a USB rescue stick and delete enough snapshots that ZFS is in the mood to let you boot again. For me it happened when my free space dropped below 10 GB; I deleted 50 GB of snapshots and after a reboot everything worked again without any problems.
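Roughly, the cleanup from the rescue stick looks like this (a sketch only: the pool name zroot and the snapshot name are placeholders, and with GELI-encrypted providers you have to geli attach them first):

Code:
# import the pool under an alternate root without mounting any datasets
zpool import -f -N -R /mnt zroot
# list the snapshots oldest first, together with the space they hold
zfs list -t snapshot -o name,used -s creation
# destroy enough old snapshots to free up space
zfs destroy zroot@2020_10_01
# cleanly export the pool again before rebooting
zpool export zroot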

This adventure left a deep dent in my belief that ZFS is a bulletproof and reliable FS, and it also earned me some big laughs from some non-ZFS users.
 
Picky or not, it is a critical thing everyone must keep in mind, especially because you don't get any warning when you reboot, and hardly anyone would expect problems while there are still 10 GB of free space left.
 
If you are making snapshots it is a good idea to auto-purge old ones so things stay balanced; it is never good to let free space get too low anyway. Something like the sketch below would do.
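A minimal sh sketch of such a purge, to be run from cron (the date-based snapshot naming and the retention period are assumptions; adjust to your own scheme):

Code:
#!/bin/sh
# Prune date-named snapshots (dataset@YYYY_MM_DD) older than KEEP_DAYS days.
KEEP_DAYS=30
cutoff=$(date -v-${KEEP_DAYS}d +%Y_%m_%d)
zfs list -H -t snapshot -o name | while read -r snap; do
        stamp=${snap##*@}
        case "$stamp" in
        [0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9])
                # YYYY_MM_DD sorts lexicographically, so a plain string compare works
                if [ "$stamp" \< "$cutoff" ]; then
                        zfs destroy "$snap"
                fi
                ;;
        esac
done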

With that said I applaud you for making the post because if someone else has the problem they can now google the solution.
 
Not something you want to have to sort out where availability matters.

Dunno if it's 10 GB or some percentage (or similar); I reckon it'll be the latter, but for your setup that worked out to 10 GB.

For your setup you could script it: if free space is less than 15 GB, remove N snapshots and/or email you so you are aware.
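A minimal sketch of such a check (pool name, threshold and mail recipient are made-up placeholders):

Code:
#!/bin/sh
# Warn when the pool's free space drops below a threshold; run it from cron.
POOL=zroot
MIN_FREE_GB=15
free=$(zpool list -Hp -o free "$POOL")
min=$((MIN_FREE_GB * 1024 * 1024 * 1024))
if [ "$free" -lt "$min" ]; then
        printf 'Pool %s is down to %s bytes free\n' "$POOL" "$free" |
            mail -s "ZFS free space low on $(hostname)" root
        # or destroy the oldest snapshots here instead of (or as well as) mailing
fi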
 
While I'm not doubting that this is what you saw, I don't believe that this is a generally true statement. I'm sure I've seen reboots with ZFS file systems that had nearly zero bytes free (bytes, not gigabytes). I suspect there is more complexity to this story.

It might help if you explain what the state of the system was when it was happy, and what the last thing you did before it went down.
 
Are you sure it was because of that? I spawned the VM and did a test:

Code:
fbsd12:(~)# dd if=/dev/urandom of=/blob00 bs=1024k
dd: /blob00: No space left on device
9044+0 records in
9043+1 records out
9482665984 bytes transferred in 93.245012 secs (101696228 bytes/sec)
fbsd12:(~)#

fbsd12:(~)# zpool list rpool
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool  17.5G  17.0G   560M        -         -    65%    96%  1.00x  ONLINE  -
fbsd12:(~)#

fbsd12:(~)# df -m /
Filesystem      1M-blocks Used Avail Capacity  Mounted on
rpool/ROOT/12.2      9472 9472     0   100%    /
fbsd12:(~)#

fbsd12:(~)# reboot
And the VM came back just fine.
 
OP, more information is needed: does this failure mode only happen for (zfs) root file systems? If so, spell it out please.
 
There might be a very simple solution to the OP's issue (assuming I'm right about the issue at hand, of course): the reservation property, see zfs(8). This setting can guarantee available disk space for file systems which need it to keep the overall system functional (think of /var/log and/or /var/spool).

For example:

Code:
$ zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
zroot                  36.8G   108G  13.5G  /
zroot/home              679M   108G   679M  /home
zroot/jails             217M   807M   217M  /opt/jails
zroot/swap             6.19G   108G  6.04G  -
zroot/tmp               214K   108G   214K  /tmp
zroot/var               452M   112G  43.1M  /var
zroot/var/db            408M   112G   358M  /var/db
Notice how zroot and most of the other file systems have access to 108 GB worth of storage, whereas /var has access to 112 GB?

Code:
$ zfs get reservation zroot/var
NAME       PROPERTY     VALUE   SOURCE
zroot/var  reservation  5G      local
This setting ensures that if my system gets completely filled up there will always be at least 5 GB of space available for /var, so that log files and such can still be written. Which I'd say is crucial for me, so that I can learn what caused the problem.
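For completeness, setting this up only takes a moment (using the dataset and size from the example above):

Code:
# guarantee 5 GB for zroot/var and everything below it
zfs set reservation=5G zroot/var
# verify
zfs get reservation zroot/var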
 
OP, more information is needed: does this failure mode only happen for (zfs) root file systems? If so, spell it out please.

I do not know how I could give you a satisfactory answer to your question other than by posting comprehensive information about the system which caused the trouble:

FreeBSD 12.1-RELEASE-p9 GENERIC

# gpart list
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 468862087
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
Mediasize: 209715200 (200M)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 20480
Mode: r0w0e0
efimedia: HD(1,GPT,7460c9ec-5290-11ea-b2d2-f44d306e2eaf,0x28,0x64000)
rawuuid: 7460c9ec-5290-11ea-b2d2-f44d306e2eaf
rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
label: efiboot0
length: 209715200
offset: 20480
type: efi
index: 1
end: 409639
start: 40
2. Name: ada0p2
Mediasize: 239846031360 (223G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 210763776
Mode: r1w1e1
efimedia: HD(2,GPT,7466e1c1-5290-11ea-b2d2-f44d306e2eaf,0x64800,0x1bebf800)
rawuuid: 7466e1c1-5290-11ea-b2d2-f44d306e2eaf
rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
label: zfs0
length: 239846031360
offset: 210763776
type: freebsd-zfs
index: 2
end: 468860927
start: 411648
Consumers:
1. Name: ada0
Mediasize: 240057409536 (224G)
Sectorsize: 512
Mode: r1w1e2

Geom name: ada1
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 351651847
first: 40
entries: 128
scheme: GPT
Providers:
1. Name: ada1p1
Mediasize: 209715200 (200M)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 20480
Mode: r0w0e0
efimedia: HD(1,GPT,746fa745-5290-11ea-b2d2-f44d306e2eaf,0x28,0x64000)
rawuuid: 746fa745-5290-11ea-b2d2-f44d306e2eaf
rawtype: c12a7328-f81f-11d2-ba4b-00a0c93ec93b
label: efiboot1
length: 209715200
offset: 20480
type: efi
index: 1
end: 409639
start: 40
2. Name: ada1p2
Mediasize: 179834978304 (167G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 210763776
Mode: r1w1e1
efimedia: HD(2,GPT,74759f89-5290-11ea-b2d2-f44d306e2eaf,0x64800,0x14ef8000)
rawuuid: 74759f89-5290-11ea-b2d2-f44d306e2eaf
rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
label: zfs1
length: 179834978304
offset: 210763776
type: freebsd-zfs
index: 2
end: 351651839
start: 411648
Consumers:
1. Name: ada1
Mediasize: 180045766656 (168G)
Sectorsize: 512
Mode: r1w1e2



# geli list
Geom name: ada0p2.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 256
Crypto: hardware
Version: 7
UsedKey: 0
Flags: BOOT, GELIBOOT
KeysAllocated: 56
KeysTotal: 56
Providers:
1. Name: ada0p2.eli
Mediasize: 239846027264 (223G)
Sectorsize: 4096
Mode: r1w1e1
Consumers:
1. Name: ada0p2
Mediasize: 239846031360 (223G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 210763776
Mode: r1w1e1

Geom name: ada1p2.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 256
Crypto: hardware
Version: 7
UsedKey: 0
Flags: BOOT, GELIBOOT
KeysAllocated: 42
KeysTotal: 42
Providers:
1. Name: ada1p2.eli
Mediasize: 179834974208 (167G)
Sectorsize: 4096
Mode: r1w1e1
Consumers:
1. Name: ada1p2
Mediasize: 179834978304 (167G)
Sectorsize: 512
Stripesize: 0
Stripeoffset: 210763776
Mode: r1w1e1



# zpool list (of course after deleting the snaps)
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zroot   167G   109G  58.1G        -         -    51%    65%  1.00x  ONLINE  -



# zfs list -t all (this shows around 1300 snapshots; I doubt the details will matter here)

NAME              USED  AVAIL  REFER  MOUNTPOINT
zroot             109G  52.9G    88K  /zroot
zroot@2020_10_01     0      -    88K  -
zroot@2020_10_10     0      -    88K  -
zroot@2020_10_19     0      -    88K  -
zroot@2020_10_20     0      -    88K  -
zroot@2020_10_21     0      -    88K  -
zroot@2020_10_22     0      -    88K  -
zroot@2020_10_23     0      -    88K  -
zroot@2020_10_24     0      -    88K  -
zroot@2020_10_25     0      -    88K  -
zroot@2020_10_26     0      -    88K  -
zroot@2020_10_27     0      -    88K  -
zroot@2020_10_28     0      -    88K  -
zroot@2020_10_29     0      -    88K  -
zroot@2020_10_30     0      -    88K  -
zroot@2020_10_31     0      -    88K  -
zroot@2020_11_01     0      -    88K  -
zroot@2020_11_02     0      -    88K  -
zroot@2020_11_03     0      -    88K  -
zroot@2020_11_04     0      -    88K  -
zroot@2020_11_05     0      -    88K  -
zroot@2020_11_06     0      -    88K  -
...



# zfs get -t filesystem reservation
NAME                PROPERTY     VALUE  SOURCE
zroot               reservation  none   default
zroot/ROOT          reservation  none   default
zroot/ROOT/default  reservation  none   default
zroot/tmp           reservation  none   default
zroot/usr           reservation  none   default
zroot/usr/home      reservation  none   default
zroot/usr/ports     reservation  none   default
zroot/usr/src       reservation  none   default
zroot/var           reservation  none   default
zroot/var/audit     reservation  none   default
zroot/var/crash     reservation  none   default
zroot/var/log       reservation  none   default
zroot/var/mail      reservation  none   default
zroot/var/tmp       reservation  none   default


Of course it would be awesome if this turned out to be behaviour that can be changed, so it won't happen to anyone again.
 