Clone failed drive

I have two 6-disk RAIDZ2 vdevs in a pool and lost 4 drives from one vdev to water damage, so the pool is no longer functioning. I'm fairly sure just the PCBs were fried and the platters are perfectly fine, so I was wondering if it's possible to have a recovery service clone the failed drives onto new drives to get the pool working again. The problem is that I know ZFS stores extra info about the specific drives themselves, and I'm not sure it would work even if the drives are perfect clones. Is anybody familiar with this kind of scenario? I looked around and found a similar post (viewtopic.php?t=45057), but it doesn't seem like it quite worked there.
 
Not to rub salt in your wounds, but this is a perfect example of why RAID isn't a substitute for good backups.

I don't think ZFS is bound to a specific drive. If it's actually a perfect clone of the data, I would expect it to work. But I'm definitely not a ZFS expert, so I could be wrong.
 
Ha, yeah, I know it's not a backup, which is why the important files are backed up online. I'm in the process of downloading those, but I would like to recover all the data if it's possible and not too expensive. If not, oh well, I can live with just the important files.

Thanks for the input; that's what I would normally think too, but the recovery service will most likely cost $600-800, and I want to make sure before I proceed.
 
Baraoic said:
Thanks for the input; that's what I would normally think too, but the recovery service will most likely cost $600-800, and I want to make sure before I proceed.
Yeah, those services cost an arm and a leg.

I'm not sure how representative it would be, but you could try setting up a basic RAIDZ2 using a couple of mdconfig(8) vnode files. Then dd(1) one or two of them to different files and see if you can replace the originals with the copies.
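A minimal sketch of that kind of test on FreeBSD might look like this (pool name, file paths, and sizes are all illustrative, and it needs root; a 3-disk RAIDZ is used here rather than RAIDZ2 to keep it small):

```shell
# Create three backing files and attach them as vnode memory disks
truncate -s 1g /tmp/disk0 /tmp/disk1 /tmp/disk2
md0=$(mdconfig -a -t vnode -f /tmp/disk0)   # mdconfig prints the unit name, e.g. md0
md1=$(mdconfig -a -t vnode -f /tmp/disk1)
md2=$(mdconfig -a -t vnode -f /tmp/disk2)

# Build a throwaway RAIDZ pool and put some data on it
zpool create testpool raidz $md0 $md1 $md2
cp /boot/kernel/kernel /testpool/

# Export, clone one member at the block level, and swap the clone in
zpool export testpool
dd if=/tmp/disk2 of=/tmp/clone2 bs=1m
mdconfig -d -u $md2
mdconfig -a -t vnode -f /tmp/clone2

# If ZFS only cares about the on-disk labels, the import should succeed
zpool import testpool
zpool status testpool
```

Clean up afterwards with zpool destroy testpool and mdconfig -d -u for each remaining unit.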
 
SirDice said:
Baraoic said:
Thanks for the input; that's what I would normally think too, but the recovery service will most likely cost $600-800, and I want to make sure before I proceed.
Yeah, those services cost an arm and a leg.

I'm not sure how representative it would be, but you could try setting up a basic RAIDZ2 using a couple of mdconfig(8) vnode files. Then dd(1) one or two of them to different files and see if you can replace the originals with the copies.

I thought about trying it with a virtual system, but I wanted to see if anybody had real experience first. I'd rather have it tried with physical drives, as who knows whether there's something quirky about doing this with physical disks as opposed to virtual ones. I just don't have the spare disks to test it myself.
 
Well, I tried something similar in a VM tonight and ran into problems exactly where I figured I would: I can't figure out how to bring the pool online after attaching the cloned drive.

I created a RAIDZ with 3 disks (da1-3) and copied a file to the pool. Then I rebooted to ensure everything was synced and added a drive. I used dd to clone da3 to the new drive, then shut down and removed da2-3. When I booted back up it says the pool is UNAVAIL:
Code:
There are insufficient replicas for the pool to continue functioning.
Though when I do a zpool status it shows
Code:
  pool: pool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        pool                    UNAVAIL      0     0     0
          da1                   ONLINE       0     0     0
          14431679127374845991  UNAVAIL      0     0     0  was /dev/da2
          da2                   ONLINE       0     0     0
So it does see the cloned drive and knows it should be da3. When I do a zdb -l /dev/da2 it shows
Code:
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'pool'
    state: 0
    txg: 4
    pool_guid: 10065632748344913159
    hostid: 2280388727
    hostname: 'nas4free.local'
    top_guid: 7829240125557966298
    guid: 7829240125557966298
    vdev_children: 3
    vdev_tree:
        type: 'disk'
        id: 2
        guid: 7829240125557966298
        path: '/dev/da3'
        phys_path: '/dev/da3'
        whole_disk: 1
        metaslab_array: 34
        metaslab_shift: 23
        ashift: 9
        asize: 1069023232
        is_log: 0
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
Which looks right except for the paths, so I added another disk to force the clone to come up as da3, but I still can't bring the pool online. Any zpool command I try just gives me
Code:
cannot open 'pool': pool is unavailable
I even tried exporting and forcing a reimport, but that doesn't work either. So I'm out of ideas for now; if anybody has any, please let me know.
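For what it's worth, zpool device commands also accept a member's GUID from the status output in place of a device path. An untested sketch, using the GUID from the status output above:

```shell
# Refer to the detached member by its GUID rather than a device node
zpool online pool 14431679127374845991
```

Though if the pool itself is UNAVAIL rather than DEGRADED, no per-device command will bring it back.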
 
I created a RAIDZ with 3 disks (da1-3) and copied a file to the pool. Then I rebooted to ensure everything was synced and added a drive. I used dd to clone da3 to the new drive, then shut down and removed da2-3. When I booted back up it says the pool is UNAVAIL: "There are insufficient replicas for the pool to continue functioning."
Though when I do a zpool status it shows

Maybe you haven't described exactly what you did, but here's my interpretation of the above:

  1. You created a ZFS pool out of three disks.
    Now, according to your status output, this is *NOT* a RAIDZ. It is a simple striped pool with no redundancy.
  2. You then cloned da3 to a new disk.
  3. Rebooted the system and removed both da2 and da3.
  4. On startup, ZFS found your cloned disk fine, now called da2 by the OS.
  5. The disk that was originally called da2 has also been removed, so it is not available. The non-redundant pool is now unavailable.

ZFS should have no problems with cloned disks. Everything it needs to identify a disk is in that label. This also includes the disk GUID, which ZFS uses to identify its disks, completely independently of their OS device name. It wouldn't be possible to move ZFS pools between Linux/FreeBSD and OpenSolaris based systems otherwise.
 
usdmatt said:
I created a RAIDZ with 3 disks (da1-3) and copied a file to the pool. Then I rebooted to ensure everything was synced and added a drive. I used dd to clone da3 to the new drive, then shut down and removed da2-3. When I booted back up it says the pool is UNAVAIL: "There are insufficient replicas for the pool to continue functioning."
Though when I do a zpool status it shows

Maybe you haven't described exactly what you did, but here's my interpretation of the above:

  1. You created a ZFS pool out of three disks.
    Now, according to your status output, this is *NOT* a RAIDZ. It is a simple striped pool with no redundancy.
  2. You then cloned da3 to a new disk.
  3. Rebooted the system and removed both da2 and da3.
  4. On startup, ZFS found your cloned disk fine, now called da2 by the OS.
  5. The disk that was originally called da2 has also been removed, so it is not available. The non-redundant pool is now unavailable.

ZFS should have no problems with cloned disks. Everything it needs to identify a disk is in that label. This also includes the disk GUID, which ZFS uses to identify its disks, completely independently of their OS device name. It wouldn't be possible to move ZFS pools between Linux/FreeBSD and OpenSolaris based systems otherwise.

I thought I did a RAIDZ, but, you know, it was late and maybe I didn't. I will redo it tonight and update the post. Thank you.
 
I did mess up the first test. I redid it, this time as an actual RAIDZ, and it worked. I did have to export and then reimport the pool to get it working again, but after that the data showed up just fine. I'm going to proceed with having the recovery service image the dead drives and hope they work.
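For anyone repeating this, the sequence that got the clone recognized was roughly (pool name from my test, may differ on yours):

```shell
zpool export testpool    # release the pool so the labels get re-read
zpool import testpool    # import finds the clone by its label GUID
zpool status testpool    # the cloned member should now show ONLINE
```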
 
I just wanted to update and say I got three of the drives cloned by the data recovery service and the pool is 100% recovered. I didn't have to do anything special either; I just plugged the three drives back in and it recognized them. Then I plugged in a new blank one for the fourth missing drive and did a zpool replace to swap the missing drive for the new one, and it resilvered, which included verification of the cloned drives. No errors were reported after it was done, and I have spot-checked some files, which are all good.
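For reference, the replace step looked roughly like this (the pool name, device name, and GUID value here are all hypothetical stand-ins):

```shell
zpool status tank                     # note the GUID of the UNAVAIL member
MISSING_GUID=14431679127374845991     # hypothetical value from the status output
zpool replace tank $MISSING_GUID da6  # swap in the blank disk; resilver starts automatically
zpool status tank                     # monitor resilver progress
```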
 
Cool. Glad you managed to get the pool back and the cost of the cloning didn't end up being a waste. Given all the posts on here from people who've had problems getting ZFS to rebuild a failed disk, it's also good to hear from someone who had to go as far as having disks sent away and cloned, and still had the pool come back up 100% again.
 
Given all the posts on here from people who've had problems getting ZFS to rebuild a failed disk, it's also good to hear from someone who had to go as far as having disks sent away and cloned, and still had the pool come back up 100% again.

Huh? I've rebuilt my RAIDZ multiple times already because of failing disks, and I've never noticed any problems. It's simple and fast to do.

Maybe the people for whom it works just don't talk about it, because it's what ZFS is supposed to do and no big deal.
 