ZFS pool faulted after single disk failure

A few days ago I decided to convert my home server from Ubuntu Linux to FreeBSD 8.2 so that I could use ZFS for data integrity. I am aware that ZFS is available on Linux, but from what I've read the FreeBSD implementation is better.

I installed FreeBSD onto a mirrored ZFS pool, and created a raidz2 pool from five 2 TB drives (2 Seagate, 2 Western Digital, 1 Samsung) for my data. I deliberately mixed manufacturers to guard against the possibility of a bad batch from the same factory.
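
The data pool is a single raidz2 vdev across all five disks, created along these lines (the exact device names will of course depend on your controller and ports):

Code:
# zpool create tank raidz2 ad4 ad6 ad10 ad16 ad18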

It worked fine for about a day, then my system suddenly locked up and I was forced to do a hard power down. When I tried to restart the system, it always hung in the same place. Eventually I suspected a bad drive, so I disconnected every drive except one member of the mirrored pool, after which the system was able to start. With a little trial and error, I discovered that one of the Western Digital drives was the culprit - the system booted fine with all the other disks reattached.

The problem was that my 5 disk raidz2 pool was reported as faulted, although only a single disk was unavailable, and I could find no way to fix it. Eventually I gave up and recreated the pool, using another Seagate drive in place of the failed Western Digital. I wasn't happy trusting my data to a filesystem that could be taken out by a single drive failure, so I decided to do some experiments.
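
For what it's worth, had the pool merely been DEGRADED, replacing the bad disk should have been a single command along these lines (ad18 was the missing drive; the new device name is just an example) - but with the pool FAULTED nothing like this would work:

Code:
# zpool replace tank ad18 ad20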

I tried swapping disks between SATA ports, but this didn't seem to cause any problems. I disconnected 1 or 2 disks, but the pool remained available in a degraded state.

To reproduce what had happened to my earlier pool, I disconnected the entire pool and reconnected only some of the drives. I discovered that the pool sometimes became available with 1 or 2 missing disks, but other times a single missing disk was enough to put it in a faulted state. It seemed to depend on which disk was missing. Here is an example:

Code:
[pollochr@Warspite ~]$ zpool status tank
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-3C
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     1  corrupted data
          raidz2    DEGRADED     0     0     6
            ad6     ONLINE       0     0     1
            ad4     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad18    UNAVAIL      0     0     0  cannot open

I shut down the system, reattached the disk that was missing above, detached 2 different disks, and restarted. I now see this output:

Code:
[pollochr@Warspite ~]$ zpool status -v tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            ad6     UNAVAIL      0     0     0  cannot open
            ad4     REMOVED      0     0     0
            ad10    ONLINE       0     0     0
            ad16    ONLINE       0     0     0
            ad18    ONLINE       0     0     0

errors: No known data errors

My understanding is that raidz2 should be able to cope with the loss of any 2 drives without losing data. I can't understand what's going on. I've searched online and seen a few examples of other people having similar problems, but no clear explanation of the cause.
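
If anyone wants to try to reproduce this without pulling cables, I assume much the same situation can be simulated by offlining devices (although physically detaching disks also shuffles the device numbers, which may well be part of the problem):

Code:
# zpool offline tank ad6
# zpool offline tank ad4
# zpool status tank
# zpool online tank ad6
# zpool online tank ad4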

Any help would be much appreciated.
 
Hi

I am not sure if I am on the right track, but if you disconnect drives and then reconnect those drives, the pool will start resilvering the reconnected drives. If during this process you decide to remove another drive, I am not sure what will happen because I don't think there will be enough redundant information left to rebuild the pool.
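
If I remember correctly, zpool status shows resilver progress on the scrub line, so it might be worth checking that no resilver is still running before removing another drive:

Code:
# zpool status tank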

regards
Malan
 
Goose997 said:
Hi

I am not sure if I am on the right track, but if you disconnect drives and then reconnect those drives, the pool will start resilvering the reconnected drives. If during this process you decide to remove another drive, I am not sure what will happen because I don't think there will be enough redundant information left to rebuild the pool.

regards
Malan

Normally there should be, because a raidz2 can withstand 2 HDD failures, no matter how they fail. The situation above is quite strange and I will try to reproduce it as well.

To the OP: what platform are you using?
 
My system specifications are as follows:

AMD Athlon LE 1640 (single core, 2.6 GHz)
4GB DDR2 800 ECC
Asus motherboard with ATI chipset (can't remember the exact model) - 5 SATA II ports on the motherboard.
Silicon Image 3132 card with 2 SATA II ports

The power supply is a Seasonic of around 500 watts, which should be more than adequate.

The disks in my first RAIDZ2 pool were 2 x Seagate ST2000DL003, 2 x Western Digital WD20EARX, and 1 Samsung HD204UI. For my second, experimental pool I swapped 1 of the Western Digitals for another Seagate.

For what it's worth, I did shut down the system properly before attaching or detaching drives.

I discovered another strange thing - I decided to give the apparently defective Western Digital drive another test before returning it, and found that it was working fine. Either the drive has an intermittent fault, or the hanging was due to some other problem, such as the ZFS information on the disk being corrupt. Whatever the problem was with this drive, it can't have affected my later test results because I removed it before creating the second pool.

For now I've decided to give up on raidz2 and create 3 separate raidz mirrors with 2 x 2TB drives each.

I'd still love to hear an explanation for my raidz2 problems. It's extremely unsettling to learn that you can so easily lose your data with what is supposed to be an extremely robust filesystem.
 
The only thing that sounds wrong is when you said you disconnected ALL but one drive of the mirror: "so I disconnected every drive except one member of the mirrored pool". When you start up with only one drive attached, ZFS may end up marking the drives it can no longer see as failed. You might have been able to fix it, since you had raidz2, by putting the other 3 (non-failed) drives back in without the single one that had already been marked as failed...

IDK. I am just making stuff up...
 
clpollock said:
I discovered another strange thing - I decided to give the apparently defective Western Digital drive another test before returning it, and found that it was working fine. Either the drive has an intermittent fault, or the hanging was due to some other problem, such as the ZFS information on the disk being corrupt. Whatever the problem was with this drive, it can't have affected my later test results because I removed it before creating the second pool.

Might be related to this (linux)
 
@clpollock,

don't give up on raidz2 yet! Here is what you should do:

First, label your disks before attaching them to the pool. Create the raidz2 pool with the five disks (on the same controller), put some data in there, shut down your system and remove one disk. You should be able to use your raid in a degraded state. Shut down your system again and remove another disk. Again, the raid should still be functioning in a degraded state, now with no redundancy left!
Now, shut down the system, reattach the drives, relabel them, and use ZFS to properly replace them, roughly as sketched below. You should have a fully functional pool again.
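
In rough commands, and with placeholder device names, the procedure would be something like:

Code:
# glabel label disk1 /dev/ad4      (repeat for disk2 ... disk5 on the other drives)
# zpool create tank raidz2 label/disk1 label/disk2 label/disk3 label/disk4 label/disk5
# zpool status tank                (should report DEGRADED, not FAULTED, with one or two disks pulled)
# zpool replace tank label/disk1 label/disk6
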
If you encounter any problems during the procedure please post them here in detail.

George
 
clpollock said:
For now I've decided to give up on raidz2 and create 3 separate raidz mirrors with 2 x 2TB drives each.

Does not compute. There's no such thing as "raidz mirrors". You either create raidz vdevs, or you create mirror vdevs.
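
If what you mean is a single pool built from three 2-disk mirror vdevs, that would be created along these lines (device names are placeholders):

Code:
# zpool create tank mirror ad4 ad6 mirror ad10 ad16 mirror ad18 ad20

That gives you one pool with the three mirrors striped together, rather than three separate pools.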
 
Don't give up, but certainly don't make the same error again with mirrors.

I had a very similar problem when I set up a raidz2 without labels. After a reboot the disk numbers all changed, and instead of being DEGRADED the pool was UNAVAIL or something else that wouldn't come online as a file system (unlike your example). The disk numbers had moved in such a way that the disk I had assigned as a spare ended up being in the pool. Another time the disks moved around but all of them were still in the pool, and there was no problem.

So regardless of whether you want mirrors, raidz, etc., take gkontos' advice and use labels. Even with mirrors, if you don't use labels you will likely run into the same problem again.


For example, assuming your disks have lights per disk, rather than a single light for all disks:

First find out which disks have GPT tables, so you don't try to label them, or maybe to know early on if you need to clear them first.
# gpart show

Cause the light on a disk to turn on. Change "da0" here to the correct device to test. (Do not put anything other than /dev/null for of= or you may destroy data)
# dd if=/dev/da0 of=/dev/null

Watch the lights and decide on a label for it, then hit ctrl+c. Then (also replacing da0 with the correct disk):
# glabel label disk1 /dev/da0

Make sure you didn't screw up your labels (having duplicates, or missing one):
# ls /dev/label

Then when you create your zpool, use the label. eg:
# zpool create tank raidz2 label/disk1 label/disk2 label/disk3 label/disk4 label/disk5
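
And if a labelled disk dies later, you can replace it by label as well, e.g. after labelling the new disk as disk6 (names are just examples):
# zpool replace tank label/disk3 label/disk6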
 