ZFS failed drive recovery

budric · Dec 29, 2008

Hi,
if I configured a pool of drives in ZFS without redundancy (like JBOD) and one of the drives failed, would I still be able to somehow repair the file system and get back the files that weren't stored on the failed drive?

For convenience I'd like to merge different-sized disks together into one big volume. I don't want redundancy because this volume holds backups and unimportant data, and I don't want to pay for it with extra space. But I still don't want to lose everything just because one drive failed. I've searched Google but haven't found anyone talking about this situation, only about failures when you're running mirror or raid-z. Is this possible at all with any free/open source file system out there?

Thanks!

hedwards · Dec 30, 2008

There is a very big difference between raid-z, mirror and JBOD. With the former two there's redundancy (extra copies or parity); with the latter there's just one copy of the data. Furthermore, the filesystem is spread across all of the disks, meaning that you can lose data without knowing ahead of time which data it will be.

As for how to recover the files, I really don't know. But it's worth explaining why this is so much harder to deal with.

Edit: If you've got an extra disk of the appropriate size, you might consider using dd to recover as much of the failing disk as possible and then replacing the original disk in the pool with the new one.
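A minimal sketch of the dd idea above, assuming the failing disk is still partly readable. The function name rescue_copy and the device names in the usage comment are hypothetical. conv=noerror,sync tells dd to carry on past read errors and pad each unreadable block with zeros so offsets stay aligned. Whether ZFS then accepts the copy in place of the original is not something this thread confirms.

```shell
#!/bin/sh
# rescue_copy SRC DST: copy SRC to DST block by block, continuing past
# read errors. Unreadable blocks come out as zeros of the same size, so
# everything after them keeps its original offset.
# Note: with "sync", a short final block is padded up to the block size.
rescue_copy() {
    dd if="$1" of="$2" bs=65536 conv=noerror,sync
}

# Hypothetical usage (device names are assumptions, check them twice,
# since dd overwrites the target without asking):
#   rescue_copy /dev/ad4 /dev/ad6
```

A smaller block size loses less data around each bad sector but makes the copy slower.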

budric · Dec 30, 2008

Hi hedwards,
I was considering a "basic pool" without mirroring (http://dlc.sun.com/osol/docs/content/ZFSADMIN/gaypw.html). All of the troubleshooting in the administration guide discusses how to deal with a degraded pool. I want to know what can be done with a faulted pool, and whether all the data would be lost. I don't actually have this setup yet; I'm researching it for a FreeBSD file server, so I was hoping someone with experience could help out.

I'm not an expert on file systems, but the description of ext2 on Wikipedia says that the file system is organized into block groups, and that each group has a superblock copy, an inode bitmap, etc. If ext2 could span multiple drives, it seems possible to recover the file system if one of the drives failed. Most of the information would still be available: you have the superblock copies, so you could iterate through the inodes and see which data blocks are accessible. In practice I don't know how (if at all) this would be done. So I'm asking whether ZFS can do something like that and recover some of the file system after a failure.

Djn · Dec 31, 2008

The easiest way to find out would probably be to test it - set up ZFS on a few temporary devices (ramdisks should be ideal), then yank one of them out from underneath it and see what it does.

Actually, let me test that ...

Update:
Testing this on a 32-bit, 1GB ram x86 box not tuned for ZFS might not have been a good idea (for "kernel panic"-values of "not good").
Trying again with a smaller filesystem.

Djn · Dec 31, 2008

Right, this is what I have so far.

Parts of this are from memory, but it should be correct:

Code:

# dd if=/dev/zero of=disk1 bs=1M count=100
# dd if=/dev/zero of=disk2 bs=1M count=100
# mdconfig -a -t vnode -f disk1
md0
# mdconfig -a -t vnode -f disk2
md1
# zpool create testpool md0 md1
# cp -R /usr/src/sys /testpool/
# zpool export testpool
# cp disk2 disk2.copy
# dd if=/dev/zero of=disk2 bs=1M count=100
# zpool import testpool
cannot import "testpool": one or more devices is currently unavailable
# dd if=disk2.copy of=disk2 bs=1M count=1
# zpool import testpool
# zpool scrub testpool
(slightly later)
# zpool status -v testpool
  pool: testpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed after 0h0m with 3598 errors on Wed Dec 31 03:31:01 2008
config:

        NAME        STATE     READ WRITE CKSUM
        testpool    ONLINE       0     0 3.51K
          md0       ONLINE       0     0     0
          md1       ONLINE       0     0 9.03K  1.46M repaired

errors: Permanent errors have been detected in the following files:

        /testpool/sys/boot/pc98/boot2/dinode.h
        /testpool/sys/compat/svr4/svr4_acl.h
        /testpool/sys/dev/acpica/acpi_pcib_pci.c
        /testpool/sys/dev/ce/if_ce.c
        /testpool/sys/dev/pccard/pccarddevs
        /testpool/sys/dev/usb/if_axe.c
        /testpool/sys/fs/procfs/procfs_note.c
        /testpool/sys/amd64/include/ptrace.h
        /testpool/sys/boot/pc98/boot2/disk.c
        /testpool/sys/compat/svr4/svr4_dirent.h
        /testpool/sys/contrib/altq/altq/altq_priq.c
        /testpool/sys/dev/acpica/acpi_pcibvar.h
        /testpool/sys/dev/ce/ng_ce.h
        /testpool/sys/dev/mem/memdev.c
        /testpool/sys/dev/pccard/pccardreg.h
        /testpool/sys/dev/usb/if_axereg.h
        /testpool/sys/fs/procfs/procfs_regs.c
(etc)

Basically, it won't even try to import the zpool if it can't find enough devices to cover the dataset. With disk2 completely zeroed, it wasn't recognized, so zpool import refused to acknowledge it. It might be possible to tell it to replace the missing device with this seemingly new, blank one; I didn't try.

With the first MB of disk2 restored, it was recognized, so I guess there's a header at the start that's important. It correctly notices that it's been seriously manhandled, and lists which files this affects.

All in all, it found 3598 errors after I nuked the last 99 of 100 MB of the second disk.
Copying out the file tree gives me the correct 7336 files, but diff finds that 3527 of them differ from the originals. As far as I can tell, all those files give I/O errors when read from ZFS, so the copies are 0 bytes; this matches what Sun says will happen in this case.
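The comparison described above can be sketched roughly as follows. The function names and the paths in the usage comment are hypothetical, and the second helper assumes (as observed in this test) that unreadable files end up as empty copies. Files that were already empty in the original tree will show up there too.

```shell
#!/bin/sh
# count_differing ORIG COPY: number of entries that diff reports as
# different or missing between two directory trees.
count_differing() {
    # diff exits non-zero when the trees differ, so ignore its status.
    { diff -rq "$1" "$2" || true; } | wc -l | tr -d ' '
}

# list_empty_files DIR: regular files of size zero under DIR.
list_empty_files() {
    find "$1" -type f -size 0
}

# Hypothetical usage:
#   cp -R /testpool/sys /tmp/sys.rescued
#   count_differing /usr/src/sys /tmp/sys.rescued
#   list_empty_files /tmp/sys.rescued
```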

edogawaconan · Jan 1, 2009

Djn said:
With the first MB of disk2 restored, it was recognized, so I guess there's a header at the start that's important. It correctly notices that it's been seriously manhandled, and lists which files this affects.

This has happened to me once, and by deleting the files with errors I got a working pool again (minus the deleted files).
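A rough sketch of how the damaged paths could be pulled out of `zpool status -v` before deleting them. The function name list_damaged_files is hypothetical, and the parsing assumes the output layout shown earlier in this thread (an "errors: Permanent errors" line followed by indented absolute paths). Entries that are not plain paths are skipped.

```shell
#!/bin/sh
# list_damaged_files: read `zpool status -v` output on stdin and print
# the absolute paths listed under "errors: Permanent errors ...".
list_damaged_files() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }
        in_list && /^[ \t]*\// {
            sub(/^[ \t]+/, "")   # strip the leading indentation
            print
        }
    '
}

# Hypothetical usage: review the list first, then remove the files.
#   zpool status -v testpool | list_damaged_files > damaged.txt
#   while IFS= read -r f; do rm -- "$f"; done < damaged.txt
```

Writing the list to a file first gives a chance to check it before anything is deleted.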

budric · Jan 4, 2009

Thanks, Djn, for the test.
