ZFS root RAIDZ1 drive failing, set offline, no longer boots

I've got a FreeBSD 10 server with 4 disks in the above configuration (ZFS on root, RAIDZ1). Disk ada1 was failing, so I picked up a new disk, set ada1 offline, shut down the server and installed the new disk in its place. When I tried to boot, I got "ZFS: i/o error - all block copies unavailable" messages. I booted the FreeBSD 10 install disk and tried to import the pool, but it complained that a device was missing (which it was, since it had been replaced by the new drive). I shut the system back down, put the old failing drive back in, and got the same errors when trying to boot.

I booted the installer again, started a shell and successfully imported the zpool, and I'm now in the process of recovering config files and other data. That said, I'm not sure how to replace the failed disk or even fix this boot problem. I followed the ZFS guide on http://freebsd.org, which says to set the disk offline, swap in the new drive and then use zpool replace to get things moving again, which obviously hasn't worked. Any ideas, or a clear guide on how to recover from a failing disk in a ZFS RAIDZ1 root install?
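For the data-recovery step from the installer shell, here is a minimal sketch of importing the pool read-only under an alternate root, so files can be copied off without writing to the degraded disks. The function name `recovery_import` and the `/mnt` altroot are illustrative assumptions. By default it only prints the commands; set `RUN=` (empty) in the environment to actually execute them.

```shell
#!/bin/sh
# Sketch: import a ZFS root pool read-only from install media for data recovery.
# Dry run by default: commands are printed, not executed.
# Set RUN= (empty) in the environment to actually run them.
RUN=${RUN-echo}

recovery_import() {
    pool=${1:-zroot}      # pool name; zroot is the FreeBSD 10 installer default
    altroot=${2:-/mnt}    # hypothetical location to mount the recovered datasets

    # With no arguments, zpool import only lists pools available for import.
    $RUN zpool import
    # -f: the pool was last in use by the installed system, not the live CD
    # -o readonly=on: avoid writing to a pool that is already degraded
    # -R: mount everything under the alternate root instead of /
    $RUN zpool import -f -o readonly=on -R "$altroot" "$pool"
    $RUN zpool status -v "$pool"
}

recovery_import "$@"
```

A read-only import is only for copying data off; `zpool replace` needs a normal import. Running `zpool export zroot` before rebooting leaves the pool in a clean state for the next import.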
 
Re: ZFS root RAIDZ1 drive failing, set offline, no longer boots

So I spent some time trying to reproduce this problem, without success. I can only assume there was some severe corruption going on or that another disk is failing, although only ada1 was mentioned in the kernel messages. Precisely what I did was:

Code:
# zpool offline zroot {guid}
# halt

I then replaced the physical device and powered the system back up, at which point I was presented with the ZFS I/O error messages. I successfully recovered the data via the LiveCD, so at this point I'm going to write the OS off and start over with new disks. What I would like to know is where things went wrong, or more precisely, whether I did something wrong.
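One thing worth ruling out on a root pool is offlining or pulling a different physical disk than the one that is actually failing, or missing errors on a second member. A minimal pre-swap checklist sketch, assuming the failing disk is ada1 as in the original post; `camcontrol identify` prints the drive's serial number so it can be matched to the label on the physical drive. The function name `pre_swap_checks` is hypothetical, and it only prints the commands unless `RUN=` (empty) is set.

```shell
#!/bin/sh
# Sketch: checks to run before offlining a disk in a root pool.
# Dry run by default; set RUN= (empty) to execute.
RUN=${RUN-echo}

pre_swap_checks() {
    pool=$1    # e.g. zroot
    disk=$2    # e.g. ada1, the disk believed to be failing

    # Which vdev is FAULTED/DEGRADED, and do other members show READ/WRITE/CKSUM errors?
    $RUN zpool status -v "$pool"
    # Map gpt/gptid labels shown by zpool status back to adaN device names.
    $RUN glabel status
    # Partition layout of the disk about to be removed (boot, swap, zfs).
    $RUN gpart show "$disk"
    # Serial number, to match against the label on the physical drive.
    $RUN camcontrol identify "$disk"
}

pre_swap_checks "${1:-zroot}" "${2:-ada1}"
```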

For testing I created a virtual machine with 4 disks and a RAIDZ1 installation, removed a disk after the first system boot and did precisely this:

Powered down the VM and deleted one virtual disk. Powered the machine back up and confirmed the pool showed as DEGRADED. I then set the missing disk offline:

Code:
# zpool offline zroot {id number from zpool status}
# halt

I added a new disk to the VM and booted the system. Once it was up:

Code:
# gpart backup da1 | gpart restore da0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
# reboot
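The steps above can be wrapped in a small sketch that copies the partition table from a healthy pool member to the new disk and writes the boot code. Disk names follow the VM test (da1 healthy, da0 new). The default `-i 1` assumes the freebsd-boot partition is index 1, so it is worth checking `gpart show` first. The function name `prepare_disk` is hypothetical, and it only prints the commands unless `RUN=` (empty) is set.

```shell
#!/bin/sh
# Sketch: prepare a replacement disk by cloning a healthy member's layout.
# Dry run by default; set RUN= (empty) to execute.
RUN=${RUN-echo}

prepare_disk() {
    good=$1          # healthy pool member, e.g. da1
    new=$2           # blank replacement disk, e.g. da0
    bootidx=${3:-1}  # index of the freebsd-boot partition; confirm with gpart show

    $RUN gpart show "$good"
    # gpart restore expects a blank target; gpart restore -F would destroy an
    # existing table first, so only use that when sure $new is the new disk.
    # A pipeline can't simply be prefixed with $RUN, so print it in dry-run mode.
    if [ -n "$RUN" ]; then
        echo "gpart backup $good | gpart restore $new"
    else
        gpart backup "$good" | gpart restore "$new"
    fi
    # Boot code lives on each disk: protective MBR plus gptzfsboot.
    $RUN gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$bootidx" "$new"
    $RUN gpart show "$new"
}

prepare_disk "${1:-da1}" "${2:-da0}" "${3:-1}"
```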

I rebooted the machine so that the partitions would show up under /dev. Once the machine was back online:

Code:
# zpool replace zroot {id number from zpool status} da0p2
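A minimal sketch of running the replace and then polling until the resilver finishes, assuming `zpool status` reports a running resilver with the text "resilver in progress" on its scan line. The function names are hypothetical. The partition index (da0p2 here, as in the command above) depends on the layout the installer created, so confirm which index is freebsd-zfs with `gpart show` before running it.

```shell
#!/bin/sh
# Sketch: replace the offlined vdev, then poll until the resilver finishes.
# Dry run by default for the replace; set RUN= (empty) to execute.
RUN=${RUN-echo}

# Reads `zpool status` text on stdin; succeeds while a resilver is running.
resilver_running() {
    grep -q 'resilver in progress'
}

replace_and_wait() {
    pool=$1      # e.g. zroot
    old=$2       # GUID of the offlined vdev, from zpool status
    newpart=$3   # freebsd-zfs partition on the new disk

    $RUN zpool replace "$pool" "$old" "$newpart"
    # Dry run: nothing was started, so there is nothing to wait for.
    if [ -n "$RUN" ]; then
        return 0
    fi
    while zpool status "$pool" | resilver_running; do
        sleep 60
    done
    zpool status "$pool"
}
```

Usage, with the same placeholder as above: `replace_and_wait zroot {id number from zpool status} da0p2`.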

Once resilvering was done I rebooted and all was well. For future reference, am I missing anything important when replacing a disk? I noticed there are swap partitions and a freebsd-boot partition on each disk in the RAIDZ1 array. Are they mirrored, and are there extra steps I need to take in the restore process? This is a vanilla FreeBSD 10 ZFS-on-root install from the full install DVD for amd64.
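On the swap and freebsd-boot question: those partitions are not part of the pool, so a resilver only rebuilds the freebsd-zfs partition. The boot code is written to each disk separately, which is why the `gpart bootcode` step is needed on the new disk. Whether swap is mirrored depends on how the installer set it up, so `/etc/fstab` and `swapinfo` are the places to check. A minimal sketch of post-replacement checks, assuming the VM disk names (da1 healthy, da0 new); the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: post-replacement checks for the non-ZFS partitions.

# Reads `gpart show <disk>` output on stdin and prints "index type" for each
# partition, skipping the header line and free-space entries, so two disks'
# layouts can be compared regardless of disk name or size.
gpart_layout() {
    awk 'NR > 1 && NF >= 4 && $3 != "-" { print $3, $4 }'
}

# Succeeds if both disks have the same partition indexes and types.
same_layout() {
    a=$(gpart show "$1" | gpart_layout)
    b=$(gpart show "$2" | gpart_layout)
    [ "$a" = "$b" ]
}

post_replace_checks() {
    good=$1    # e.g. da1
    new=$2     # e.g. da0
    if same_layout "$good" "$new"; then
        echo "partition layout of $new matches $good"
    else
        echo "partition layout of $new differs from $good" >&2
    fi
    # Swap is outside the pool: list active swap devices and show how
    # /etc/fstab refers to them (device name vs. GPT label).
    swapinfo
    grep swap /etc/fstab
}
```

Usage: `post_replace_checks da1 da0`. If `/etc/fstab` names the old disk's swap partition, the new disk's swap partition can be enabled with `swapon` once the entry points at it.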
 