ZFS resilience to temporary hardware failure

Hi,

I moved four of the Dell MD1000's I have from Linux and LSI hardware RAID to FreeBSD 8.3 and software RAID using ZFS, to take advantage of compression. This storage is used for backup purposes, and it has been very stable so far. I have a few questions on potential issues if I start using ZFS on a larger scale in the future. Hopefully somebody on the forum can help me with these.

Right now some of the raidz2 sets cross enclosures (half on one JBOD, half on the next). What happens if one enclosure fails due to a power supply issue or a SAS expander failure? I assume the raidz2 sets would be marked as faulted. But when the JBOD comes back after the hardware issue is fixed, does ZFS detect that the offline drives have returned, or is a manual step such as a zpool import needed? The same issue could arise at a larger scale, e.g. a few SAS switches with JBODs hooked up to them, where a switch or JBOD fails due to a hardware problem or something as simple as a PDU failure. How resilient is ZFS on FreeBSD to these kinds of temporary hardware failures?
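In other words, after the enclosure is powered back on, would something along these lines be needed, or is it all automatic? (The pool and device names here are just placeholders.)

    # Check whether ZFS noticed the drives returning on its own:
    zpool status tank
    # If not, bring a returned disk back into the pool manually
    # and reset the error counters:
    zpool online tank da12
    zpool clear tank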

Thanks
 
If you have enough enclosures, the "best practice" is to use 1 disk from each enclosure to build each raidz vdev. That way, if an entire enclosure loses power or has a catastrophic hardware failure, you only lose 1 disk per vdev, and the pool continues on without any issues. And when the hardware is restored, the pool will happily resilver each disk in each vdev without issues.

It's the same "best practice" when creating vdevs across controllers: if possible, use only 1 disk per controller per vdev, for the same reason that losing an entire controller (which should be fairly rare) only results in the loss of 1 disk per vdev. Thus, the pool carries on.
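As a sketch, assuming four enclosures whose disks show up as da0-da7, da8-da15, da16-da23 and da24-da31 (the device names and pool name are just examples), the layout would look like:

    # One disk from each of the four enclosures per raidz2 vdev, so a
    # whole-enclosure failure costs each vdev only one disk (DEGRADED,
    # not FAULTED):
    zpool create tank \
        raidz2 da0 da8 da16 da24 \
        raidz2 da1 da9 da17 da25

Each vdev can then lose two members before faulting, so an entire enclosure going away still leaves a disk of margin per vdev.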

If you don't have enough controllers/enclosures to do that, then things get a bit tougher, but not impossible. If you lose an entire controller/enclosure, meaning you lose multiple disks per vdev, to the point where a vdev would become FAULTED instead of DEGRADED, you have to halt the system immediately and stop all writes to the pool. Don't do anything until the controller/enclosure is replaced and all the disks are available again. Then you can boot and import the pool, and things should carry on without issues.
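Roughly, the recovery steps afterwards are just (pool name again an example):

    # With the replacement hardware in place and all disks visible again:
    zpool import            # list pools available for import
    zpool import tank       # import; stale disks resilver automatically
    zpool status -v tank    # watch the resilver complete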

If you don't stop writes to the pool, the disks still in the pool drift too far out of sync with the disks that went offline, and you risk losing the entire pool.

You may be able to import the pool using the -F (rollback) option of zpool import, which "goes back in time" to find a transaction group where all the disks are in sync. You may lose a bit of recently written data, but the pool should import, and you can carry on.
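Something like this, again with a hypothetical pool name:

    # Dry run: -n reports whether the -F rollback would succeed,
    # without actually importing anything.
    zpool import -F -n tank
    # If that looks good, do the real recovery import:
    zpool import -F tank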

If things disappear and re-appear within a few seconds, everything should just carry on without issues.
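At most you might need to clear the leftover error counters once everything settles, e.g.:

    # Reset READ/WRITE/CKSUM error counts after a transient glitch:
    zpool clear tank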
 