First, a quick warning to beadm users:
If you have
- upgraded to 14.0,
- and you boot from ZFS,
- and you ran zpool upgrade on your root ZFS pool,
- and you then beadm activate a pre-14.0 boot environment,
then your next boot will fail with "ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2", because the old boot loader doesn't know about the upgraded pool's features.
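If you are not sure whether you have already run the upgrade on a pool, you can check without changing anything; a minimal sketch, assuming the usual pool name zroot:
Code:
# with no arguments, zpool upgrade only *lists* pools that don't have every supported feature enabled
zpool upgrade
# per-feature view for the root pool
zpool get all zroot | grep feature@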
How did I get here:
My FreeBSD system uses UEFI, I boot from ZFS, and I use beadm, just in case I have to roll back when things go wrong.
Last week I updated to FreeBSD 14.0-RELEASE via freebsd-update, pkg upgrade'ed everything, performed several reboots, and everything seemed to be wonderful... until today, when I discovered that my TeamSpeak server was quietly crashing, without a core dump or error message.
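For reference, the update was the standard freebsd-update major-version procedure, roughly the following (a sketch of the documented steps, not a transcript of my terminal):
Code:
freebsd-update -r 14.0-RELEASE upgrade
freebsd-update install        # installs the new kernel
shutdown -r now
# after the reboot:
freebsd-update install        # installs the new userland
pkg-static install -f pkg     # rebuild pkg itself against the new ABI
pkg upgrade
freebsd-update install        # final pass, removes stale old libraries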
I have been using beadm for years, so the solution was obvious: roll back, then look for answers.
So, a quick beadm activate 13.2-RELEASE-p9_xxx followed by a reboot, and then... crickets :-(
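(For anyone who hasn't used beadm: the rollback itself is normally just a matter of activating an older boot environment, something like the following sketch.)
Code:
beadm list                            # 'N' marks the environment booted now, 'R' the one active on reboot
beadm activate 13.2-RELEASE-p9_xxx    # flag the old boot environment for the next boot
shutdown -r now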
Unfortunately I did not see the notice about not running zpool upgrade on your boot pool, so I rammed straight into the "ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2" problem which has been reported here so often.
In hindsight this is obvious: the 13.2 boot loader doesn't know about the newly upgraded ZFS features.
I managed to repair the boot partition enough to boot again (a rough sketch follows the list), mostly by following tips such as:
- Answer: update boot codes before rebooting
- Upgrading from Previous Releases of FreeBSD
- Boot Loader Changes
- and finally calling beadm activate again, so that it boots back into the live 14.0 environment.
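For anyone else stuck at the loader with the same message, the generic fix from those links boils down to refreshing the boot bits from a 14.0 environment. This is only a sketch: the partition indices match my gpart output further down, so double-check yours, and the UEFI loader may live at /efi/freebsd/loader.efi on the ESP instead of /efi/boot/bootx64.efi.
Code:
# refresh the legacy ZFS boot blocks (harmless even on a UEFI-only machine)
gpart bootcode -p /boot/gptzfsboot -i 2 ada0
# copy the new UEFI loader onto the EFI system partition
mount -t msdosfs /dev/ada0p1 /mnt
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt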
My problem:
zpool is no longer able to import my data pool.
This may be unrelated to booting 13.2 in a 14.0 environment, but my data pool is the one I use daily, and it was working just fine until I unwittingly rolled back to 13.2.
My machine is an old Intel NUC with a 60 GB internal SSD (zroot) and two external USB 3.0 SSDs for data (zreserve).
The data pool (zreserve) is striped over the two external SSDs, completely separate from the internal boot SSD, but they were still attached while I went through the steps above. (Yes, striping is fragile, and yes, I have backups.)
The symptoms:
zpool doesn't see a problem, and says that I can import my pool:
Code:
# zpool import
   pool: zreserve
     id: 15131527649563311145
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        zreserve              ONLINE
          gpt/zdomus-4tb-13t  ONLINE
          gpt/zdomus-4tb-19v  ONLINE
... and then fails when I actually try to:
Code:
# zpool import zreserve
cannot import 'zreserve': one or more devices is currently unavailable
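Variants I haven't dared to run yet; opinions on whether they are safe would be very welcome:
Code:
# read-only, and only looking at the GPT labels for devices
zpool import -d /dev/gpt -o readonly=on zreserve
# dry run: would discarding the last few transactions make the pool importable?
zpool import -F -n zreserve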
gpart shows the internal and external disks, all OK so far:
Code:
# gpart show -p
=>        40  117231328    ada0  GPT  (56G)
          40     532480  ada0p1  efi  (260M)
      532520       1024  ada0p2  freebsd-boot  (512K)
      533544        984          - free -  (492K)
      534528    4194304  ada0p3  freebsd-swap  (2.0G)
     4728832  112500736  ada0p4  freebsd-zfs  (54G)
   117229568       1800          - free -  (900K)

=>          40  7814037088    da0  GPT  (3.6T)  [CORRUPT]
            40        4056          - free -  (2.0M)
          4096  7759462400  da0p1  freebsd-zfs  (3.6T)
    7759466496    54570632          - free -  (26G)

=>          40  7814037088    da1  GPT  (3.6T)  [CORRUPT]
            40  7814037088  da1p1  freebsd-zfs  (3.6T)
The "CORRUPT" status has been following me for years. It always happens on this machine, but
a) zfs always worked fine, even if gpart said the partition tables were incomplete and
b)
gpart recover <device>
always fixed it.
Code:
# gpart recover da0
gpart: Input/output error
# gpart recover da1
da1 recovered
oops :-(
Even so, gpart status doesn't see any problems any more:
Code:
# gpart status
  Name  Status  Components
ada0p1      OK  ada0
ada0p2      OK  ada0
ada0p3      OK  ada0
ada0p4      OK  ada0
 da0p1      OK  da0
 da1p1      OK  da1
and glabel also looks normal (it has always shown N/A for the status):
Code:
# glabel status
                Name  Status  Components
        gpt/efiboot0     N/A  ada0p1
        gpt/gptboot0     N/A  ada0p2
           gpt/swap0     N/A  ada0p3
  gpt/zdomus-4tb-13t     N/A  da0p1
  gpt/zdomus-4tb-19v     N/A  da1p1
but zdb has got me worried:
Code:
# zdb -l da0
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2 <<---- shouldn't there be a ZFS label here?
failed to unpack label 3
# zdb -l da1
failed to unpack label 0
failed to unpack label 1
------------------------------------
LABEL 2 (Bad label cksum)
------------------------------------
version: 5000
state: 3
guid: 4620694731288109924
labels = 2
failed to unpack label 3
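(It occurs to me that the pool members are the GPT partitions, not the raw disks, so zdb -l da0 may simply be looking in the wrong place; I should probably repeat this against the partition devices:)
Code:
zdb -l /dev/gpt/zdomus-4tb-13t
zdb -l /dev/gpt/zdomus-4tb-19v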
and yet, it quite happily seems to be able to access the whole pool:
Code:
zdb -d zreserve | grep -v 'objects$' # grep out the full list of datasets - they all look OK to me :-)
Verified large_blocks feature refcount of 0 is correct
Verified large_dnode feature refcount of 0 is correct
Verified sha512 feature refcount of 0 is correct
Verified skein feature refcount of 0 is correct
Verified edonr feature refcount of 0 is correct
Verified userobj_accounting feature refcount of 649 is correct
Verified encryption feature refcount of 0 is correct
Verified project_quota feature refcount of 649 is correct
Verified redaction_bookmarks feature refcount of 0 is correct
Verified redacted_datasets feature refcount of 0 is correct
Verified bookmark_written feature refcount of 0 is correct
Verified livelist feature refcount of 0 is correct
Verified zstd_compress feature refcount of 0 is correct
Verified zilsaxattr feature refcount of 6 is correct
Verified blake3 feature refcount of 0 is correct
Verified device_removal feature refcount of 0 is correct
Verified indirect_refcount feature refcount of 0 is correct
but these messages are new:
Code:
# grep -w da0 /var/log/messages
Jan 25 23:50:24 domus kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 00 00 12 10 00 00 10 00
Jan 25 23:50:24 domus kernel: (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
Jan 25 23:50:24 domus kernel: (da0:umass-sim0:0:0:0): SCSI status: Check Condition
Jan 25 23:50:24 domus kernel: (da0:umass-sim0:0:0:0): SCSI sense: ABORTED COMMAND asc:0,0 (No additional sense information)
Jan 25 23:50:24 domus kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
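I suppose the next step is to ask the drive itself via SMART; a sketch, assuming sysutils/smartmontools is installed (USB-SATA bridges usually need the -d sat hint):
Code:
smartctl -a -d sat /dev/da0    # health status, reallocated/pending sectors, wear indicators
smartctl -a -d sat /dev/da1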
So, does this mean my SSD just died?
Are there any other things I could try to recover the pool?
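My tentative rescue plan, if anything imports at all; the backup host and target paths below are placeholders, and it assumes the pool's datasets mount under /zreserve:
Code:
# image the ailing disk at the block level first; recoverdisk (in base) retries bad ranges with smaller block sizes
recoverdisk /dev/da0 /backup/da0.img
# if a read-only import ever succeeds, copy everything off while it lasts
zpool import -o readonly=on zreserve
tar -cf - -C /zreserve . | ssh backuphost "tar -xf - -C /rescue/zreserve"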
zdb -c says it'll take 10 to 20 hours to complete... would it be worth running anyway, maybe even zdb -b as well, to get one last picture of the pool before it dies completely?
If you're still reading... thanks, I'm impressed and grateful for any tips!
