ZFS Array Drive Melted

In looking over my Nas4Free system this morning, I noticed that one of the disks in my 6TB ZFS array was marked as "removed" even though it was plugged in and appeared to be functioning fine. I've had some power glitches around here in the past few weeks and I thought maybe that had thrown my drive into an error state so I figured I'd see if it came back on a reboot.

When I rebooted, my root filesystem (Nas4Free runs from a USB key) would not mount...

I was pretty confused, so the first thing I did was unplug all the drives except for the USB key and try again. It still failed, and at that point I figured the install on the USB key had somehow become corrupted (which proved true; after copying the img file back to the key, it boots fine).

However, when I re-plugged the drives in and booted up the computer, one of the SATA power cables immediately started spewing smoke and melted itself... and the connector on the hard drive, before I was able to pull the plug.


No clue what happened there, and although I suspect the disk itself is fine, the connector on the drive is completely ruined.

In any case, I pulled that drive, re-imaged the USB key, and booted up the system. At least my initial assumption was correct: the "missing" disk was found at startup, so the only disk absent from the system was the one that almost caught fire. So... I took a look at the ZFS pool, which looks like this:
Code:
zfsdata                   FAULTED  corrupted data
  raidz1-0                FAULTED  corrupted data
    12637692214261834096  FAULTED  corrupted data
    11306278499670812609  FAULTED  corrupted data
    9359699380247702151   FAULTED  corrupted data
    4520295435616108019   FAULTED  corrupted data
    3374229570764583106   FAULTED  corrupted data


I suspect that because the array already thought it was missing a disk, losing the second disk to the faulty power connection counts as a second drive failure, and I'm completely out of luck?

I do have a pretty recent backup but obviously I'd like to salvage the array if I can.
1) there's a bit of data on this array since the last backup that I'd be sad to lose.
2) I have over 4TB of data on this thing, and just copying that much data from a backup to a new array takes a ton of time.
 
With RAID-Z one disk can fail and the RAID set would still be available. However, if a second disk dies the whole set is lost.
 
Of course... but 4 of my 5 disks are still healthy.

What I'm unsure of is that one of those disks was previously showing up as "removed" in zpool status, even though it was still plugged in and healthy. I tried to reboot to clear that up, and *that* is when I lost a disk for real. I suspect the fact that the array "thought" there was a bad disk means the real disk failure results in really losing my data.
 
Yes, that's probably what happened. The loss of the first disk was never fully recovered (resilvered) before the second disk failed.
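That said, before giving up on the pool entirely, an import attempt in recovery mode might be worth a shot. A hedged sketch, assuming the pool name zfsdata from the status output above (run as root; whether this helps depends on how badly the labels are damaged):

```shell
# See which pools ZFS can detect on the attached disks
zpool import

# Try a normal import first
zpool import zfsdata

# If that fails with "corrupted data", -F tries to roll back to the
# last consistent transaction group, discarding the final few seconds
# of writes; -n first does a dry run to report whether it would work
zpool import -nF zfsdata
zpool import -F zfsdata
```

If even `-F` fails, the on-disk state is likely too inconsistent to recover without the melted drive, and the backup is the way forward.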
 
Since the "removed" disk was out of the array while the array was still live, it is essentially an out-of-date member and therefore should not be used in the live array. Even if you replaced the melted drive, any data that was touched in any way while that other drive was "removed" would be unrebuildable when the new drive was resilvered. Unfortunately, this is what backups are made for.
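For reference, if the pool ever becomes importable again, swapping in a replacement disk and watching the rebuild looks roughly like this (pool name and member GUID are taken from the status output above; /dev/ada5 is a hypothetical device name for the new disk):

```shell
# Replace the failed member, identified by its GUID from zpool status,
# with the new disk; this kicks off a resilver onto /dev/ada5
zpool replace zfsdata 12637692214261834096 /dev/ada5

# Watch resilver progress and any per-file errors
zpool status -v zfsdata
```

But as noted, with two members effectively gone from a raidz1, the resilver has nothing complete to rebuild from for anything written while the pool was degraded.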

I would be very wary of plugging the scorched drive back into a machine I care about. If you have a power supply lying around that you can live without, you can test it with that, but there isn't much about a SATA power cable itself that would cause what you describe. It's possible something in the drive shorted the cable and would do the same to the next one. I'm surprised it didn't pop your power supply.
 