ZFS DEGRADED zpool mirror... how to fix this?

Hello everyone,

I hope I am not cooked, but here's my problem. Hopefully the ZFS champs will be able to help me...

I had a small ZFS mirrored setup running FreeBSD 14.x with two SSDs of 240GB each. One fine day the machine suddenly hung. When I checked at the time, one of the SSDs was having some issue, which I couldn't investigate fully as I am struggling to find time these days due to a family health emergency that takes up most of my time. As a quick fix, I just removed the SSD that was giving trouble, and everything was working fine... until I made a stupid mistake this week!

I upgraded the system on the remaining SSD to the latest FreeBSD 14.3 p6 version and thought maybe I would just put the old SSD (the other half of the mirror, which obviously was still running a lower version of FreeBSD 14.x) back in and see if I could resolve the issue by resilvering. I also started a scrub soon after, and suddenly the machine hung. I don't know exactly what went wrong, and I did a hard reboot after removing the older SSD.

Now my machine fails to boot up with the error below (even if I try booting using the Live USB and try to mount manually):

1000093368.jpg


When I try to import the zpool via Live USB, I get the following:

1000093367.png


My machine unfortunately is not allowing me to add another SSD despite having 4 SATA connections so that I can do a zpool replace. Now if I connect the old SSD, the machine just hangs - I can't even get into the BIOS setup. My only hope is that ada0p3 is showing ONLINE and zpool status is still DEGRADED and not FAULTED so perhaps I can still wriggle out safely with my data intact.

OK, so if there is anything I can do to get the system back working, or at least get my data out from ada0p3, please do let me know. I have a working Live USB of FreeBSD 14.3 p6 with me.

Looking forward to hearing from you folks.

Best regards,

Nitin
 
Assuming that disk is dead (or dead-ish; consumer SATA drives won't just shut up and die, but will block the system in various ways, as you discovered...), just disconnect it and forget about it.
You still have one healthy copy of your mirror and the pool is working (see the "action" comment), so just attach a new disk, partition it accordingly (e.g. by just running gpart backup | gpart restore from the healthy disk, and changing the gpart labels if applicable), and run a zpool-replace(8).
You could also first zpool-detach(8) the faulted (and no longer present) device from the mirror, then zpool-attach(8) the new device.
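As a rough sketch of those two routes: the script below only prints the commands unless DRY_RUN=0 is set, so they can be read over first. The pool name zroot, the new disk ada1, the partition index 3 on the new disk, and the OLD_GUID placeholder (the numeric GUID that zpool status shows for the missing device) are all assumptions, not details from this thread. Note that gpart restore -F destroys any existing partition table on the target disk.

```sh
#!/bin/sh
# Assumptions (not from the thread): pool "zroot", new disk "ada1",
# ZFS partition is index 3 on the new disk, as on the healthy ada0p3.
POOL="${POOL:-zroot}"
GOOD="${GOOD:-ada0}"
NEW="${NEW:-ada1}"
OLD_GUID="${OLD_GUID:-REPLACE_WITH_GUID}"   # placeholder, see zpool status

# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sh -c "$*"
    fi
}

# Route 1: copy the partition layout, then replace the missing device.
replace_plan() {
    run "gpart backup ${GOOD} | gpart restore -F ${NEW}"
    run "zpool replace ${POOL} ${OLD_GUID} ${NEW}p3"
    run "zpool status ${POOL}"
}

# Route 2: detach the missing device, then attach the new one to the mirror.
detach_attach_plan() {
    run "gpart backup ${GOOD} | gpart restore -F ${NEW}"
    run "zpool detach ${POOL} ${OLD_GUID}"
    run "zpool attach ${POOL} ${GOOD}p3 ${NEW}p3"
    run "zpool status ${POOL}"
}

replace_plan
```

gpart backup only carries the partition table, so if the pool is also the boot pool, the new disk will still need boot code written to it separately before it can boot on its own.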
 
Yes, you can (should) do all of that from some other bootable media or on another system and make sure to not mount any datasets (or readonly) until the pool is back in a healthy state.

Any chance that second disk is also failing? Or does that pool have checkpoints which may have blocked a successful remove/detach of the dead disk?

If the pool fails to import even with that dying disk disconnected, there may be some corrupted metadata. Try importing the pool without mounting anything (zpool import -FN); if that succeeds, you can try to zfs send the datasets holding important data to another pool. You can usually send off all datasets except the ones that are associated with the corrupted metadata. If any "pool-essential" metadata is corrupted, prepare to restore from backups.
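A minimal sketch of that rescue path, run from the Live USB with the dying disk disconnected. It only prints the commands unless DRY_RUN=0 is set. The pool name zroot, the dataset zroot/home, the snapshot name rescue, and the destination pool backup are hypothetical stand-ins; substitute whatever zpool import and zfs list actually report. Sending one dataset at a time means a dataset hit by the corruption does not abort the others.

```sh
#!/bin/sh
# Assumptions (not from the thread): pool "zroot", dataset "zroot/home",
# and a separate, already created destination pool called "backup".
POOL="${POOL:-zroot}"
DATASET="${DATASET:-${POOL}/home}"
SNAP="${SNAP:-rescue}"
DEST="${DEST:-backup/home}"

# Print each command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sh -c "$*"
    fi
}

rescue_plan() {
    # With no arguments, zpool import just lists the pools it can find.
    run "zpool import"
    # -F: recovery mode, -N: import without mounting any datasets.
    run "zpool import -F -N ${POOL}"
    # -v lists files or objects with known errors, if there are any.
    run "zpool status -v ${POOL}"
    # Snapshot one dataset and copy it off; -u keeps the copy unmounted.
    run "zfs snapshot ${DATASET}@${SNAP}"
    run "zfs send ${DATASET}@${SNAP} | zfs receive -u ${DEST}"
}

rescue_plan
```

If the import complains that the pool was last used by another system, adding -f to the import may be needed; repeat the snapshot and send steps for each dataset that matters.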

I've only ever encountered metadata corruption thanks to a failing HBA. A failing SATA disk has occasionally dragged a system down to an unusable state, but ZFS always survived unharmed.
 