Cloning a failing ZFS drive

Hi,

I've had a bit of a disaster with my ZFS pool. It's a 3-drive pool; one of the drives failed and was removed and sent for RMA. A few days later, a second drive in the pool failed.

I'm now trying all my best data-recovery tricks (the drive has trouble powering up, so I'm going to try freezing it). I'd like to clone it to another drive using something like Ghost, hoping that it will run long enough to finish and that ZFS will accept the cloned drive.

Has anyone done this before? Will I run into problems getting ZFS to accept the cloned drive as the original drive?

If not, is there anything else I should be trying?
 
I'm assuming by "3-drive pool" you mean RAIDZ? When one of the drives fails, you don't just remove it; you replace it.
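Replacing a failed member in place is done with zpool replace, which resilvers the new disk from the redundancy left on the other members. A minimal sketch, assuming a pool named tank with failed disk ada1 and new disk ada3 (all hypothetical names); the run helper is also hypothetical and only prints the commands when DRY_RUN is set, so you can check them before touching real disks.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# replace_disk POOL OLD NEW: replace a failed pool member with a new disk.
# ZFS then resilvers the new disk from the redundancy on the remaining members.
replace_disk() {
    run zpool replace "$1" "$2" "$3"
    # Shows resilver progress; rerun until it reports completion.
    run zpool status "$1"
}

# Hypothetical pool and device names; check them against `zpool status` first.
# DRY_RUN=1 replace_disk tank ada1 ada3
# replace_disk tank ada1 ada3
```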

Since there's now a second broken disk, it's time to check if your backup is working.
 
You would be better off cloning the failing drive with dd, like:

# dd if=/dev/old_drive of=/dev/new_drive bs=1m

You may wish to consult the dd(1) man page for options on how to skip unreadable blocks, etc. The new drive needs to have at least as many sectors as the old drive.
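For skipping unreadable blocks, the usual dd(1) options are conv=noerror (keep going after a read error) and conv=sync (pad the failed block with zeros so everything after it stays at the right offset). A minimal sketch wrapping that in a shell function; the function name is hypothetical, and the 64k block size is an assumption, a trade-off between speed and how much good data gets zeroed around each bad spot.

```shell
#!/bin/sh
# clone_noerror SRC DST: copy SRC to DST block by block, continuing past
# unreadable blocks instead of aborting.
#   conv=noerror: don't stop on a read error
#   conv=sync:    zero-pad a failed (or short) read to the full block size,
#                 so every later block still lands at the same offset
# A smaller bs zeroes less good data around each bad spot but is slower;
# 64k is an arbitrary compromise.
clone_noerror() {
    dd if="$1" of="$2" bs=64k conv=noerror,sync
}

# Example, using the placeholder device names from the post above:
# clone_noerror /dev/old_drive /dev/new_drive
```

Because conv=sync also pads a short final block, a source whose size is not a multiple of bs produces slightly more output than input; using the sector size as bs avoids that, at the cost of speed.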
 
In reply to danbi's dd suggestion:

Thanks, I think I will end up going this way, as Ghost/Acronis True Image were not able to see the drive properly.

The drive runs fine for about 30 minutes at a time, so I think using ddrescue and having it resume from a given sector may be the only way.
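What makes resuming work in ddrescue is its mapfile: it records which areas have been copied or found bad, and rerunning the same command with the same mapfile picks up where the previous run stopped. A rough sketch, assuming GNU ddrescue is installed; the function names and the mapfile path are hypothetical. -f allows writing to a device, -n skips the slow phase of picking at bad areas on the first pass, and -r sets the number of retry passes.

```shell
#!/bin/sh
# Assumes GNU ddrescue. The mapfile (3rd argument) records what has been
# copied and what was bad; rerunning with the same mapfile resumes.

# first_pass SRC DST MAP: grab the easy-to-read data first.
#   -f  allow overwriting DST (required when DST is a device)
#   -n  skip the slow scraping phase, so good areas get copied
#       before the drive drops out again
first_pass() {
    ddrescue -f -n "$1" "$2" "$3"
}

# retry_pass SRC DST MAP: go back for the areas skipped or found bad,
# with up to 3 retry passes over the remaining bad sectors.
retry_pass() {
    ddrescue -f -r3 "$1" "$2" "$3"
}

# Example (placeholder devices; keep the mapfile on a third, healthy disk):
# first_pass /dev/old_drive /dev/new_drive /root/rescue.map
#   ...drive dies; power-cycle or re-freeze it, then rerun the same command...
# retry_pass /dev/old_drive /dev/new_drive /root/rescue.map
```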
 
Hadn't seen recoverdisk(1) before. Not sure it'd be a good tool in some cases; when the rust is flaking off of a platter, rereading the bad spots multiple times could make it worse, or crash the heads.

The idea of dropping bad sectors from one drive of an array kind of makes me queasy. You'd hope the RAID would rebuild that data, but the array's contents could have changed while that drive was out of it: bad drive taken out of the array, dd-ed to a good drive, good drive put back in the array. Will the RAID controller figure that out correctly and do the right thing? Maybe...
 
In reply to wblock's concerns about recoverdisk(1) and dropped sectors:

Yes, further damaging the drive is a big concern here, but in this case I'm not sure there is much choice. And it isn't really any worse than dd(1), except that it goes back and keeps trying to read the bad blocks, which sometimes succeeds. You need to migrate the data somehow, so you might as well use something that tries harder to get a pristine image. The OP didn't say what kind of redundancy the pool has, so it's hard to say exactly what is needed. If it's a 3-way mirror, there's no problem: the third disk still has everything. If it's RAIDZ, unless you get a perfect copy there will be data loss, maybe even loss of the pool, depending on which data is bad. Given the OP's level of panic, I'd assume it's RAIDZ, so if it were my setup, I'd run recoverdisk(1).
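A sketch of how recoverdisk(1) might be run so it can resume after the drive drops out, based on the worklist options in the FreeBSD man page: -w names a file where the blocks still to be read are saved when the run is interrupted, and -r reads that list back on the next run. The function names, the worklist path, and the run dry-run helper are hypothetical, and option details may differ between FreeBSD releases, so check recoverdisk(1) on your system first.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_start SRC DST LIST: first run; -w saves the remaining work to LIST
# when interrupted (e.g. with Ctrl-C).
recover_start() {
    run recoverdisk -w "$3" "$1" "$2"
}

# recover_resume SRC DST LIST: later runs read the saved list back with -r
# and keep updating the same file with -w.
recover_resume() {
    run recoverdisk -r "$3" -w "$3" "$1" "$2"
}

# Example (placeholder devices, hypothetical worklist path):
# recover_start  /dev/old_drive /dev/new_drive /root/worklist
# recover_resume /dev/old_drive /dev/new_drive /root/worklist
```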

I did find a success story here:

http://robinbowes.com/article.php/20090420153906928
 
In principle, ZFS should be able to recover from a partially bad disk. It should even be able to reconstruct data from two or more partially damaged disks, as long as no block is damaged on more disks than the pool's redundancy can cover.
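Once the cloned disk is back in place, a scrub is how ZFS checks every block against its checksum and repairs what it can from the remaining redundancy; zpool status -v then lists files with errors it could not repair. A minimal sketch, assuming the pool (hypothetically named tank) is not currently imported; the run helper is a hypothetical dry-run wrapper.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_pool POOL: import the pool with the cloned disk attached, then have
# ZFS verify (and where possible repair) every block.
check_pool() {
    run zpool import "$1"
    run zpool scrub "$1"
    # The scrub runs in the background; rerun this until it reports completion.
    # -v lists files with permanent (unrepairable) errors.
    run zpool status -v "$1"
}

# Example (hypothetical pool name):
# DRY_RUN=1 check_pool tank
```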

However, if the disk tends to drop out after a while, you need a procedure that copies as much as you can before the disk wedges again. dd will retry reading failing sectors anyway (not dd itself, but the lower layers). With dd you can also copy as much data as you can before you lose the source disk, then continue from about where it stopped.
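Continuing with plain dd means passing the same block offset to skip= (input side) and seek= (output side), with the same bs as the interrupted run. A minimal sketch; the function name is hypothetical, and it assumes you take the block count from the N in dd's final "N+M records in" line. Backing up a few blocks before resuming costs little.

```shell
#!/bin/sh
# resume_clone SRC DST START: continue a dd copy at block START.
#   skip=START: start reading START blocks into SRC
#   seek=START: start writing START blocks into DST, so offsets line up
#   notrunc:    don't truncate DST (matters when DST is an image file)
#   noerror,sync: as before, keep going past read errors, zero-padding them
# bs must match the block size used by the interrupted run.
resume_clone() {
    dd if="$1" of="$2" bs=64k skip="$3" seek="$3" conv=noerror,sync,notrunc
}

# Example: the last run printed "12345+0 records in"; back up a bit and resume.
# resume_clone /dev/old_drive /dev/new_drive 12300
```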
 