[HAST] Preventing split-brain on simultaneous node failures

Sfynx · Mar 27, 2013

I'm looking into creating a highly available SAN setup with HAST and ZFS, and I'm trying to find a solution for the following scenario:

1. The secondary HAST node dies, so the primary starts to accumulate dirty writes.
2. The primary HAST node dies as well before the secondary comes back up, so the SAN is highly unavailable.
3. The secondary HAST node returns.

The (now unavailable) primary node still needs to send its dirty data to the secondary, so until that happens we cannot promote the secondary to primary, or else a split-brain occurs.

How can I reliably determine whether I still need to wait for incoming writes before making myself primary, in case the other node is not available at that point? I cannot rely on CARP information in this scenario, because CARP will simply set the interface to MASTER even though, storage-wise, we cannot become master yet.
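The core difficulty can be shown with a small model. This is a sketch under stated assumptions, not how hastd works internally: HAST keeps its own on-disk metadata for this purpose, but the class, counters and rules below (`NodeMeta`, `local_gen`, `peer_gen_seen`, `last_role`) are hypothetical. Each node persists a generation counter that it bumps whenever it starts writing without its peer, plus the peer's counter as of the last time both were in sync. Comparing both nodes' metadata tells you the sync direction or a split-brain, but a lone node cannot read the peer's counter, so its only safe local rule is a conservative one.

```python
from dataclasses import dataclass


@dataclass
class NodeMeta:
    """Hypothetical replication metadata a node persists on its own disk."""
    name: str
    local_gen: int      # bumped whenever this node starts writing without its peer
    peer_gen_seen: int  # the peer's local_gen at the last moment both were in sync
    last_role: str      # "primary" or "secondary" when this node last ran


def compare(a: NodeMeta, b: NodeMeta) -> str:
    """Decide the sync direction once BOTH nodes can exchange metadata."""
    a_moved = a.local_gen != b.peer_gen_seen  # a wrote data b never saw
    b_moved = b.local_gen != a.peer_gen_seen  # b wrote data a never saw
    if a_moved and b_moved:
        return "split-brain"
    if a_moved:
        return f"sync {a.name} -> {b.name}"
    if b_moved:
        return f"sync {b.name} -> {a.name}"
    return "in sync"


def may_promote_alone(me: NodeMeta) -> bool:
    """Decide using only local metadata, i.e. while the peer is unreachable.

    compare() needs the peer's local_gen, which a lone node cannot read.
    The conservative answer: only the node that was last accepting writes
    may continue; a returning secondary must wait for the peer. This is
    only safe if every node follows the same rule, so that a secondary
    never promotes itself while the peer is away.
    """
    return me.last_role == "primary"


if __name__ == "__main__":
    # The scenario from the question: B died, A wrote alone, then A died.
    a = NodeMeta("A", local_gen=2, peer_gen_seen=1, last_role="primary")
    b = NodeMeta("B", local_gen=1, peer_gen_seen=1, last_role="secondary")
    print(may_promote_alone(b))  # False: B must wait for A
    print(compare(a, b))         # sync A -> B
```

Note that this local rule also blocks ordinary failover (secondary fully in sync when the primary vanishes), because a lone node cannot tell a dead peer from one that is merely unreachable and still writing. Telling those apart needs information from outside the two nodes.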

AndyUKG · Mar 28, 2013

Hi,

I think that if you know of, or have experience with, enterprise clustering solutions like Veritas Cluster, then by comparison HAST and CARP are never going to provide you with a solution of a similar level of robustness and functionality. I don't think there is any concept of quorum, and certainly not of I/O fencing.
So basically it may be good enough for some people in some circumstances, but it isn't really a full-featured cluster solution AFAIK.

cheers Andy.

OH · Mar 28, 2013

Never might be too strong a word. I don't know how far along the fullsync replication mode is, but with that and some kind of arbitrator function on a third machine, you'd have a pretty bulletproof setup.
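One way to read the "arbitrator on a third machine" idea is as a tiny tie-breaker that remembers which nodes hold every acknowledged write. The sketch below is hypothetical: HAST itself is not assumed to provide anything like it, and the `Arbiter` class and its methods are invented for illustration. It assumes the primary must get `report_degraded` accepted before it acknowledges any write the secondary has not received, and it leaves out liveness detection (heartbeats, leases) and making the arbiter itself redundant. The increasing `epoch` doubles as a simple fencing token: a node that has lost the primary role gets its reports refused.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Arbiter:
    """Hypothetical tie-breaker running on a third machine."""
    primary: str       # node currently allowed to accept writes
    in_sync: Set[str]  # nodes known to hold every acknowledged write
    epoch: int = 1     # bumped on every promotion; acts as a fencing token

    def report_degraded(self, reporter: str) -> bool:
        """Primary calls this BEFORE acking any write its peer has not seen."""
        if reporter != self.primary:
            return False  # a fenced-off (stale) primary may not change state
        self.in_sync = {reporter}
        return True

    def report_resynced(self, reporter: str, node: str) -> bool:
        """Primary calls this once `node` has fully caught up again."""
        if reporter != self.primary:
            return False
        self.in_sync.add(node)
        return True

    def request_promote(self, node: str) -> Optional[int]:
        """Grant the primary role only to a node that holds all the data."""
        if node not in self.in_sync:
            return None  # stale copy: wait for the up-to-date peer
        self.epoch += 1
        self.primary = node
        return self.epoch


if __name__ == "__main__":
    arb = Arbiter(primary="A", in_sync={"A", "B"})
    arb.report_degraded("A")         # B died; A reports before writing alone
    print(arb.request_promote("B"))  # None: B missed A's writes
    print(arb.request_promote("A"))  # 2: A gets a new epoch
```

In the scenario from the question, A reports itself degraded when B dies, so when B returns alone its promotion request is refused, and only A can take the primary role back. Because the nodes need the arbiter's agreement to act, this is effectively a two-out-of-three quorum.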
