Everyone,
I think I've stumbled into what appears to be a Catch-22 with ZFS on FreeBSD.
Testing out ZFS with an SiI3124 and eight drives across two SiI3726 port multipliers. So far performance is good with some tweaks using the siis driver (siis_load="YES" in /boot/loader.conf).
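For reference, the loader.conf side of this is nothing exotic; the only line that matters for this post is the siis one (anything else we have in there is unrelated to this problem):
Code:
# /boot/loader.conf
siis_load="YES"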
However, we're testing some disaster recovery scenarios here before deployment and have hit an issue where, once one drive is replaced, the array becomes permanently unavailable. The theory is that we should be able to replace a drive, resilver, and be back in business...
However, once we reboot the system with the new drive, the zpool is stuck in an UNAVAIL state. This confuses me on two counts:
1. A RAIDZ should be able to operate without one member and simply be in a DEGRADED state. So why are we in an UNAVAIL state?
2. Why can't we get out of this UNAVAIL state?
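For context, here is the sort of replacement sequence we assumed would just work once the pool came up DEGRADED (standard zpool commands, using our device names; nothing exotic intended):
Code:
monster# zpool status zp0            # expect DEGRADED with ada7 missing
monster# zpool replace zp0 ada7      # start resilvering onto the new disk in the same slot
monster# zpool status zp0            # watch the resilver progress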
Steps:
1. Shutdown FreeBSD 8.2-RELEASE (hot-swap does not work well on these port multipliers).
2. Remove physical drive and replace with new identical drive.
3. Boot system with FreeBSD 8.2-RELEASE.
4. All drives are discovered and available in /dev (ada0-ada7).
Code:
monster# zpool status
  pool: zp0
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        fvzp0       UNAVAIL      0     0     0  insufficient replicas
          ada0      ONLINE       0     0     0
          ada1      ONLINE       0     0     0
          ada2      ONLINE       0     0     0
          ada3      ONLINE       0     0     0
          ada4      ONLINE       0     0     0
          ada5      ONLINE       0     0     0
          ada6      ONLINE       0     0     0
          ada7      UNAVAIL      0     0     0  cannot open
Code:
monster# zpool online zp0 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool detach zp0 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool replace zp0 ada7 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool offline zp0 ada7
cannot open 'zp0': pool is unavailable
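We have not tried forcibly exporting and re-importing the pool yet, mainly because I'm not sure it's safe in this state; would something along these lines even be expected to help here?
Code:
monster# zpool export -f zp0
monster# zpool import zp0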
The new ada7 is functioning fine, and can even be exported on its own using istgt. However, since the zpool is in an unavailable state, we can't do *anything* to zpool zp0. The only way we've found to recover is to return the original drive to the slot occupied by ada7 and reboot.
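(For what it's worth, by "functioning fine" I just mean the disk responds to plain reads without errors and istgt can serve it; a quick sanity check of the sort below is all I'm claiming, and the count is only an example.)
Code:
monster# dd if=/dev/ada7 of=/dev/null bs=1m count=1000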
However, if a drive dies in production and won't come up, we need to be able to replace the drive and resilver...
Any help and thoughts are appreciated!