Failed pool hangs system

I'm trying to set up a file server (9.1-RELEASE), but I'm having a semi-fatal problem dealing with pools:

I have a raidz2 made up of six SATA drives connected via my motherboard's Intel southbridge SATA ports. All of the BIOS RAID options are disabled and the drives are in straight AHCI mode (hot-swap enabled). The system (accounts, home dir, etc.) is installed on a separate seventh drive formatted as normal UFS, connected to a separate non-Intel motherboard port.
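For completeness, the pool was created along these lines (the pool name and device names below are placeholders, not necessarily the exact ones on this box):

[code]
# Six-disk raidz2; pool and device names are illustrative only.
zpool create tank raidz2 ada0 ada1 ada2 ada3 ada4 ada5
zpool status tank
[/code]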

As part of my initial stress testing, I'm simulating failures by popping the SATA cable on various drives in the six-drive pool. If I pop two drives, the pool goes into 'degraded' mode and everything works as expected. I can zero and replace the drives, etc., no problem. However, when I pop a third drive, the machine becomes VERY unstable. I can nose around the boot drive just fine, but anything involving I/O that so much as sneezes in the general direction of the pool hangs the machine. Once this happens I can log in via ssh, but that's pretty much it. I've reinstalled and tested this over a dozen times, and it's perfectly repeatable:

[cmd=]ls[/cmd] the dir where the pool is mounted? Hang.
I'm already in the dir, and try to [cmd=]cd[/cmd] back to my home dir? Hang.
[cmd=]zpool destroy[/cmd]? Hang.
[cmd=]zpool replace[/cmd]? Hang.
[cmd=]zpool history[/cmd]? Hang.
[cmd=]shutdown -r now[/cmd]? Gets halfway through, then hang.
[cmd=]reboot -q[/cmd]? Same as shutdown.

The machine never recovers (at least, not inside 35 minutes, which is the most I'm willing to wait). Reconnecting the drives has no effect. My only option is to hard-reset the machine with the front panel button. Googling for info suggested I try changing the pool's "failmode" setting from "wait" to "continue", but that doesn't appear to make any difference. For reference, this is a virgin 9.1-RELEASE installed off the DVD image with no ports or packages or any extra anything.
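In case it matters, the failmode change was done like this (the pool name here is illustrative):

[code]
# Check the current setting, then switch from the default "wait" to "continue".
zpool get failmode tank
zpool set failmode=continue tank
[/code]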

Can someone help me out here? Is this a bug or something? I don't think I'm doing anything wrong procedure-wise. I fully understand and accept that a raidz2 with three dead drives is toast, but I will NOT accept having it take down the rest of the machine with it. I can't even nuke the damn pool and start over without taking the whole machine offline.


Also, apologies if there's already a thread about this; forum search appears to be broken at the moment and I didn't see anything when I searched by hand.
 
It's possible that the problem you are having is about HDDs in the array claiming to be identical in size when they actually are not.

The size of new_device must be greater than or equal to the minimum size of all the devices in a mirror or raidz configuration. Thus, if your replacement HDD is just 1 sector smaller than the original, you cannot use it.

This problem exists even for identical-model HDDs (their real sizes can vary), so the idea is not to give ZFS the raw disk, but to partition each HDD to an exactly identical size.

Although there are several sources of information, this is the one I was able to find for the time being. You should be able to figure out the general concept from it:
http://www.freebsddiary.org/zfs-with-gpart.php
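Roughly, the approach in that article looks like this (the sizes, labels, and device names below are only an example; pick a partition size slightly smaller than your smallest disk):

[code]
# Put a GPT scheme on the disk, then add a fixed-size, labeled ZFS partition.
gpart create -s gpt ada1
gpart add -t freebsd-zfs -a 4k -l disk1 -s 1930G ada1
# ...repeat for the other five disks with labels disk2..disk6...

# Build the raidz2 from the labels instead of the raw devices.
zpool create tank raidz2 gpt/disk1 gpt/disk2 gpt/disk3 gpt/disk4 gpt/disk5 gpt/disk6
[/code]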
 
Beeblebrox said:
It's possible that the problem you are having is about HDDs in the array claiming to be identical in size when they actually are not.

No, although I'm aware of that issue, it has nothing whatsoever to do with what I'm experiencing.

Maybe I wasn't clear enough in my first post: if I remove too many drives from a pool (to the point where it's no longer solvent), trying to do much of anything causes the whole machine to hang. Replacing drives (with potentially smaller ones) is not the issue; I can't even get that far.
 
Quartz said:
If I remove too many drives from a pool (to the point where it's no longer solvent), trying to do much of anything causes the whole machine to hang. Replacing drives (with potentially smaller ones) is not the issue; I can't even get that far.
I did get all of that. My point was that all HDDs in the array are subject to this rule and must be prepared accordingly before the ZFS RAID is set up. If the HDDs are not prepared in advance as described, removal of a "faulty" HDD will cause a lockup of the host because it will be unable to resilver, as dictated by ZFS. Replacement of the HDD will never become an issue because the resilver will not be able to complete.

Just clarifying, even though you are aware of the problem.
 
Beeblebrox said:
I did get all of that.

Err... I'm not sure you did....

Beeblebrox said:
removal of a "faulty" HDD will cause a lockup of the host because it will be unable to resilver, as dictated by ZFS.

Resilvering doesn't have anything to do with this. It will be unable to resilver anyway because I pulled three drives from a raidz2; it doesn't matter at all how they were prepared or partitioned or what sizes they are/were, because the pool is already toast by that point. And besides, if I only pull two drives the pool hums along just fine in 'degraded' mode: there's no resilvering happening there either if I don't replace any of the drives.
 
What is acceptable?

This may well be an interesting test case and I am curious...

Quartz said:
I fully understand and accept that a raidz2 with three dead drives is toast, but I will NOT accept having it take down the rest of the machine with it. I can't even nuke the damn pool and start over without taking the whole machine offline.

Given your scenario: the ZFS dataset is toast because a critical chunk is missing following the third failure. You would not be able to recover all of the data even if you were able to start a resilver with a new drive inserted. If I understand correctly, you expect that the seventh OS-only, UFS-formatted drive should continue to maintain a stable operating environment nonetheless.

Would you next attempt to rebuild a new pool with the remaining drives to keep the system online?

Even if instability after losing three drives forced a restart, it seems to me that the only viable recovery plan begins with inserting one or three backup drives into the array. [Presumably this works, as you are adding back the disconnected drives and resuming test operations successfully.]

In other words: what value is uptime if your data set is thrashed?
 
The goal appears to be a graceful disconnect when the pool loses too many components to survive. That seems like the desirable way for a pool to fail, trying not to interfere with the rest of the system.
 
ab said:
the ZFS dataset is toast because a critical chunk is missing following the third failure. You would not be able to recover all of the data even if you were able to start a resilver with a new drive inserted.

Correct.

ab said:
If I understand correctly, you expect that the seventh OS-only, UFS-formatted drive should continue to maintain a stable operating environment nonetheless.

Correct.

ab said:
Would you next attempt to rebuild a new pool with the remaining drives to keep the system online?

Maybe. That depends on the specific failure I'd actually see in real-world usage and whether it looked like the drives were salvageable.

ab said:
In other words: what value is uptime if your data set is thrashed?

This machine will eventually have more than one pool in it and be running other services, so it would be nice if one dead pool didn't take down the entire system. If I lose a pool, well, it sucks to be me.... but I don't want to have to worry about it hosing other unrelated stuff too.
 
wblock@ said:
The goal appears to be a graceful disconnect when the pool loses too many components to survive.

Basically. A graceful something, at any rate. As it stands, I can't even reliably figure out what happened to the pool or what state it's in.
 
Quartz said:
This machine will eventually have more than one pool in it and be running other services, so it would be nice if one dead pool didn't take down the entire system.

Of course. This should have been obvious to me. Alas, my pools are solitary and I spoke from that perspective.

It seems you are getting some feedback via Questions@ and File-Systems...

Please consider posting here regarding diagnostic progress or any resolution. This issue is curious and somewhat unexpected. A satisfactory solution is likely to be informative and useful.
 
ab said:
It seems you are getting some feedback via Questions@ and File-Systems...

Well, the conversation was moved from -questions to -fs, but other than asking for a dmesg no one has had any ideas or answers yet. I'm pretty sure this problem isn't specific to my hardware anyway; I've seen a few comments about it from the Solaris people too, e.g.: http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem

I'm not sure how much overlap there is between this forum and the lists, so if anyone reading this has any ideas I'd be happy to hear them.

ab said:
Please consider posting here regarding diagnostic progress or any resolution.

I will if I ever get one. It seems to be a known issue, but it also seems that most people either don't notice or don't care, so I don't have high hopes, unfortunately.
 