replacing disks in RAIDZ2 pool: resilvering completes, array remains degraded

brainsalad · Nov 11, 2009

I'm running FreeBSD 8.0-RC1. Two disks in my 12x500GB double-parity (RAIDZ2) ZFS array died, leaving the array in degraded mode.

I installed two new 500GB disks and used the zpool replace command twice, supplying the pool name, a failed device name, and a new device name each time.
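
For reference, the two invocations would have looked roughly like the sketch below. The old/new pairings are inferred from the zpool status output later in this post (ad14 replaced by ad38, the former ad6 replaced by ad34), so treat them as an assumption. The `run` wrapper is a hypothetical helper, not part of ZFS; it only prints each command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints the command unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# General form: zpool replace <pool> <old-device> <new-device>
# Device pairings below are inferred from the status output, not confirmed.
run zpool replace chunk ad14 ad38
run zpool replace chunk ad6 ad34
```

Printing first makes it easy to double-check which device is being replaced by which before anything touches the pool.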

Resilvering completed after about 15 hours. However, at this point, zpool status reported that the replace operations were still in progress. I subsequently scrubbed the pool successfully and cleared the error counts. The current output of zpool status:

Code:

pool: chunk
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        chunk                      DEGRADED     0     0     0
          raidz2                   DEGRADED     0     0     0
            replacing              DEGRADED     0     0     0
              ad14                 OFFLINE      0     0     0
              ad38                 ONLINE       0     0     0
            replacing              DEGRADED     0     0     0
              7415432913300468315  REMOVED      0     0     0  was /dev/ad6/old
              ad34                 ONLINE       0     0     0
            ad12                   ONLINE       0     0     0
            ad10                   ONLINE       0     0     0
            ad4                    ONLINE       0     0     0
            ad36                   ONLINE       0     0     0
            ad40                   ONLINE       0     0     0
            ad30                   ONLINE       0     0     0
            ad18                   ONLINE       0     0     0
            ad42                   ONLINE       0     0     0
            ad16                   ONLINE       0     0     0
            ad32                   ONLINE       0     0     0

errors: 194225 data errors, use '-v' for a list
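
A small sketch for pulling the members of every lingering "replacing" vdev out of status text like the above. The function name `stuck_replacements` is made up for illustration, and it assumes each "replacing" vdev has exactly two children (old device, then new device), as in this output.

```shell
# Print name and state of the two devices under each "replacing" vdev.
# Reads `zpool status` text on stdin.
stuck_replacements() {
    awk '
        $1 == "replacing" { depth = 2; next }   # next two rows are old and new device
        depth > 0         { print $1, $2; depth-- }
    '
}
```

Usage would be along the lines of `zpool status chunk | stuck_replacements`, which for the output above lists ad14, ad38, the numeric GUID entry, and ad34 with their states.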

Attempting to manually detach one of the failed devices does not succeed:

Code:

sudo zpool detach chunk ad14
cannot detach ad14: no valid replicas

Likewise, manually detaching one of the new devices also fails:

Code:

sudo zpool detach chunk ad38
cannot detach ad38: no valid replicas

However, if I boot the system with ad34 and ad38 disconnected, the array remains accessible in degraded mode.

Any ideas as to how I might fix this? I'm considering zeroing one of the new drives and attaching it to the array in the hope that, once resilvering completes, I will be able to detach one of the stale, failed devices.

Is the problem perhaps that I attempted to replace two drives in a double parity array simultaneously?

mix_room · Nov 11, 2009

If I understand correctly, losing two drives from a RAIDZ2 pool is like losing one disk from a two-disk mirror. You don't lose any information, but you lose the ability to self-heal data. Since you no longer have any data-redundancy the pool cannot know which data is correct and which isn't, and is therefore degraded.

Restore from backup.

brainsalad · Nov 12, 2009

mix_room said:
Since you no longer have any data-redundancy the pool cannot know which data is correct and which isn't, and is therefore degraded.

Restore from backup.

I version my data and maintain four duplicates of the repository in different physical locations, so I'm safe on that front.

It is not true that losing one device from a single parity array or two devices from a double parity RAIDZ array leaves the array in degraded mode until it is destroyed and rebuilt. I tested both of these scenarios when initially evaluating ZFS. ZFS stores a checksum for every block (the algorithm is user-selectable; I use SHA-256), so the integrity of a block is never unknown.
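
To illustrate why a per-block checksum makes corruption detectable even with no redundancy left, here is a toy sketch. It uses POSIX `cksum` (a CRC) purely as a stand-in; it does not model how ZFS actually stores or computes its checksums, and the function names are made up for the example.

```shell
# Toy model: record a checksum at write time, verify it at read time.
block_sum() { printf '%s' "$1" | cksum | cut -d' ' -f1; }

verify_block() {  # usage: verify_block <data> <stored-checksum>
    if [ "$(block_sum "$1")" = "$2" ]; then
        echo ok
    else
        echo corrupt
    fi
}

stored=$(block_sum "hello world")
verify_block "hello world" "$stored"   # prints "ok"
verify_block "hellp world" "$stored"   # prints "corrupt"
```

Detection needs only the stored checksum; it is repair that needs a second good copy or parity.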

However, I think you are on the right track. The zpool man page states that an array may be marked degraded if:

Code:

The number of checksum errors exceeds acceptable levels and the device is degraded as an indication that something may be wrong. ZFS continues to use the device as necessary.

Some 194k data errors are reported for the array; this may well exceed some limit, causing the array to be marked degraded indefinitely.

However, the man page does not indicate that the replace operations should also fail in this circumstance. The resilvering entailed by replacement completes, but the replacement itself is never marked complete. Perhaps the high number of errors prevents the replace operations from ever nominally completing; if so, this behavior seems to be undocumented.

I'm going to try pulling one of the other disks to see if the array remains accessible, and if so, I will try scrubbing the array to see what happens.

Obviously, my interest here is the behavior of ZFS/RAIDZ. Failure of an entire array has no impact on the online availability of our resources as we expect things such as file systems and operating systems to fail occasionally and are rarely disappointed in that regard.
