HAST + ZFS: no action on drive failure

So, I'm doing some failure testing, and this is the first time I've thought of just pulling out a drive to see what happens with the HAST/ZFS configuration.

There's something quite unfortunate: when a drive goes offline (like by pulling it out), HAST doesn't do anything. The HAST device stays up, so the zpool doesn't know it's offline.

Anybody have experience with this? A solution, even? I'm capable of scripting, but I'm not sure what I should do to handle it.

And one more thing I just found out: the FreeBSD implementation of ZFS doesn't seem to include the agent responsible for reacting to failures. As a test, I changed one of the HAST devices to INIT so that the zpool could not see it. Sure enough, it listed the device as OFFLINE, but the hot spare that I had configured did not engage.

The whole idea of a hot spare is to automate the replacement of a failed drive. That isn't happening, and I waited a while too.

...btw: autoreplace=on is set in my zpool settings, which is a raidz2.
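For reference, this is roughly how I'd expect such a setup to look; the pool name tank and the spare resource name here are just placeholders, not my actual ones:

Code:
# placeholder pool/resource names
zpool set autoreplace=on tank
zpool add tank spare hast/spare0
zpool get autoreplace tank    # should report "on"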

thanks!
 
I'm thinking that perhaps some sort of devd handler might help with this. I noticed in devd.conf that there are handlers for some ZFS events, and of course, when I unplug/plug a drive, that generates an event too. But which system/subsystem/match values should I configure it for? I've looked around and can't find any extensive list of events for such things.

So, I found some older posts on similar subjects and figured out how to monitor devd events using: [CMD]cat /var/run/devd.pipe[/CMD]

Then I removed and replaced a drive and got this output:
Code:
# output from removing a drive
!system=DEVFS subsystem=CDEV type=DESTROY cdev=pass5
!system=DEVFS subsystem=CDEV type=DESTROY cdev=ada5

# output from inserting a drive
!system=DEVFS subsystem=CDEV type=CREATE cdev=pass5
!system=DEVFS subsystem=CDEV type=CREATE cdev=ada5

So, it looks like I can write a devd config to handle this, like I've done for CARP LINK_UP and LINK_DOWN events.
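If anyone wants to try the same approach, here's a rough sketch of the devd.conf entries I have in mind, based on the events above; the handler script names are made up, and the cdev pattern would need to match your own drives:

Code:
# /usr/local/etc/devd/hast-disk.conf (sketch, untested)
notify 10 {
    match "system"    "DEVFS";
    match "subsystem" "CDEV";
    match "type"      "DESTROY";
    match "cdev"      "ada[0-9]+";
    # hypothetical script that maps the removed disk to its HAST resource
    action "/usr/local/sbin/hast_disk_gone.sh $cdev";
};

notify 10 {
    match "system"    "DEVFS";
    match "subsystem" "CDEV";
    match "type"      "CREATE";
    match "cdev"      "ada[0-9]+";
    action "/usr/local/sbin/hast_disk_back.sh $cdev";
};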

I noted, however, that even after plugging the drive back in, HAST cannot start using it again until the resource for that drive has been set to INIT and then back to primary (or secondary, as the case may be).
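In other words, after re-inserting the disk I have to cycle the role by hand, roughly like this (resource name disk5 is only an example):

Code:
# on the node where the drive was re-inserted
hastctl role init disk5
hastctl role primary disk5    # or "secondary", depending on which node this is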

Now, can anyone comment on "other" events that I might look for in the case of a failed drive, or is this pretty much it?

I will of course be setting up smartd and also monitoring with Nagios. But other tips and tricks are welcome!
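For the monitoring side, something like this crude check script is what I have in mind; the exact hastctl status wording may differ between versions, so treat it as a sketch only:

Code:
#!/bin/sh
# crude monitoring sketch: warn if any HAST resource looks degraded
# or if the pools are not healthy (exact hastctl output may vary)
status=0
if hastctl status | grep -qi degraded; then
    echo "WARNING: a HAST resource appears degraded"
    status=1
fi
if ! zpool status -x | grep -q "all pools are healthy"; then
    echo "WARNING: zpool reports problems"
    status=1
fi
exit $status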
 
So, as it turns out, the reason that hastd does not fail the resource for the configured hard disk on the primary host is that the secondary hast resource on the remote host immediately begins servicing requests upon a disk failure on the primary.

I found that I needed to think of the local and remote disks as being a RAID1 mirror. When the local disk fails, hastd begins using the remote disk for all operations. Therefore, both the local and remote disks would have to fail for the resource to fail, at which point ZFS would notice and mark the drive as unavailable.

There is also a procedure recommended by Mikolaj Golub to handle a failed drive on the primary node which I am including below.

Of course, there still is no built-in daemon that will handle the "hot spare" attach for zfs in FreeBSD. But it seems this is not so necessary when the system is configured to use HAST.


How to Replace a Failed Drive on Primary

When you are reinserting the drive the resource should be in init state. Remember, some data was updated on secondary only, so the right sequence of operations could be:

1) Failover (switch primary [host] to init and secondary [host] to primary).

2) Fix the disk issue.

3) If this is a new drive, recreate HAST metadata on it with hastctl utility.

[e.g. hastctl create <resource_name>]

4) Switch the repaired resource to secondary and wait until the new primary
connects to it and updates metadata. After this synchronization is started.

5) You can switch to the previous primary before the synchronization is
complete -- it will continue in right direction, but then you should expect
performance degradation until the synchronization is complete -- the READ
requests will go to remote node. So it might be better to wait until the
synchronization is complete before switching back.

-- Mikolaj Golub, Freebsd-Stable Digest, Vol 416, Issue 1, #2
note: my minor additions to this procedure are in brackets.
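Put into commands, the procedure above looks roughly like this for me (resource name disk5 is only an example):

Code:
# step 1: failover -- on the node with the failed drive (old primary)
hastctl role init disk5
# ...and on the remote node (old secondary)
hastctl role primary disk5

# steps 2-3: physically replace the drive, then recreate HAST metadata on it
hastctl create disk5

# step 4: bring the repaired resource back as secondary; synchronization
# starts once the new primary connects to it
hastctl role secondary disk5

# step 5 (optional): swap the roles back once synchronization has completed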

SOLVED!
 
Digging up an old thread after a discussion on the mailing lists.

I would like to know how others are dealing with this problem or at least if anyone is using HAST with ZFS.
 
Nagios check for HAST and also for ZFS.

PS: I've been reading the mailing list thread where you discuss this for the past days.
 
da1 said:
Nagios check for HAST and also for ZFS.

PS: I've been reading the mailing list thread where you discuss this for the past days.

Ok! Could you also let me know what your biggest configuration is? Did you ever need to replace any disks in the setup?
 
Hi,

Just a small configuration: 2 PCs with 2 HDDs each and a crossover network cable between them. Each PC had a third and a fourth HDD with FreeBSD on it and ZFS on root. On top of the HAST resources, I have a zmirror. On one occasion, one HDD broke; I replaced it, HAST synced everything back up again, and I did a zpool export/import and ran a scrub. I cannot remember if ZFS detected the broken HAST resource or not.
 
What we had in mind was 4 striped vdevs of 7 disks each, with each vdev in a raidz3 configuration. That sums up to a total of 28 disks, plus SSDs for caching, but those would not be part of the HAST resources.
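Roughly, that layout would be created along these lines (the pool name and the hast/dN resource names are placeholders; the cache devices are plain SSDs outside HAST):

Code:
# 4 x raidz3 vdevs of 7 HAST providers each, plus SSD cache outside HAST
zpool create tank \
    raidz3 hast/d0  hast/d1  hast/d2  hast/d3  hast/d4  hast/d5  hast/d6 \
    raidz3 hast/d7  hast/d8  hast/d9  hast/d10 hast/d11 hast/d12 hast/d13 \
    raidz3 hast/d14 hast/d15 hast/d16 hast/d17 hast/d18 hast/d19 hast/d20 \
    raidz3 hast/d21 hast/d22 hast/d23 hast/d24 hast/d25 hast/d26 hast/d27 \
    cache ada28 ada29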
 