Everyone,
I think I've stumbled into what appears to be a Catch-22 with ZFS on FreeBSD.
Testing out ZFS with an SiI3124 and eight drives across two SiI3726 port multipliers. So far performance is good with some tweaks using the siis driver (siis_load="YES" in /boot/loader.conf).
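For reference, the loader.conf side of this is nothing exotic; the only line that matters for this post is the siis one (anything else we have in there is unrelated to this problem):
Code:
# /boot/loader.conf
siis_load="YES"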
However, we're testing some disaster recovery scenarios here before deployment and have hit an issue where, once one drive is replaced, the array becomes permanently unavailable. The theory is that we should be able to replace a drive, resilver, and be back in business...
However, once we reboot the system with the new drive, the zpool is stuck in an UNAVAIL state. This confuses me on two counts:
1. A RAIDZ should be able to operate without one member and simply be in a DEGRADED state. So why are we in an UNAVAIL state?
2. Why can't we get out of this UNAVAIL state?
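For context, here is the sort of replacement sequence we assumed would just work once the pool came up DEGRADED (standard zpool commands, using our device names; nothing exotic intended):
Code:
monster# zpool status zp0            # expect DEGRADED with ada7 missing
monster# zpool replace zp0 ada7      # start resilvering onto the new disk in the same slot
monster# zpool status zp0            # watch the resilver progress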
Steps:
1. Shutdown FreeBSD 8.2-RELEASE (hot-swap does not work well on these port multipliers).
2. Remove physical drive and replace with new identical drive.
3. Boot system with FreeBSD 8.2-RELEASE.
4. All drives are discovered and available in /dev (ada0-ada7).
Code:
monster# zpool status
  pool: zp0
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        fvzp0       UNAVAIL      0     0     0  insufficient replicas
          ada0      ONLINE       0     0     0
          ada1      ONLINE       0     0     0
          ada2      ONLINE       0     0     0
          ada3      ONLINE       0     0     0
          ada4      ONLINE       0     0     0
          ada5      ONLINE       0     0     0
          ada6      ONLINE       0     0     0
          ada7      UNAVAIL      0     0     0  cannot open
Code:
monster# zpool online zp0 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool detach zp0 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool replace zp0 ada7 ada7
cannot open 'zp0': pool is unavailable
Code:
monster# zpool offline zp0 ada7
cannot open 'zp0': pool is unavailable
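We have not tried forcibly exporting and re-importing the pool yet, mainly because I'm not sure it's safe in this state; would something along these lines even be expected to help here?
Code:
monster# zpool export -f zp0
monster# zpool import zp0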
The new ada7 is functioning fine, and can even be exported on its own using istgt. However, since the zpool is in an unavailable state, we can't do *anything* to zpool zp0. The only way we've found to recover is to return the original drive to the slot occupied by ada7 and reboot.
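(For what it's worth, by "functioning fine" I just mean the disk responds to plain reads without errors and istgt can serve it; a quick sanity check of the sort below is all I'm claiming, and the count is only an example.)
Code:
monster# dd if=/dev/ada7 of=/dev/null bs=1m count=1000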
However, if a drive dies in production and won't come up, we need to be able to replace the drive and resilver...
Any help and thoughts are appreciated!