ZFS & HAST with different disk configurations?

`Orum

Well-Known Member

Reaction score: 52
Messages: 297

I've got two servers, which for ease of communication we'll call primary and secondary. Currently they're both running ZFS on root and syncing often via zfs send/recv. However, I'd like to convert to HAST, for faster failover and to avoid rolling back however many minutes it's been since the last synchronization. Normally this would be fairly simple, as the handbook covers the basics, but my situation is a bit different because the servers have different disk configurations.

The primary's pool has an n-way mirror (for performance and fault tolerance), while the secondary is just a single disk. My naive initial thought on how to adapt this to HAST was:
  • Set up each disk with 3 GPT partitions: a boot partition, a root pool partition, and the "shared" pool (HAST) partition.
  • HAST is used to synchronize one disk's shared partition on the primary with the shared partition on the secondary.
  • Shared ZFS pool is created on the primary as an n-way mirror using the HAST partition and the other disk's partitions directly.
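The naive setup above might be sketched roughly as follows on the primary. This is only an illustration of the idea, not a tested recipe; the device names (ada0/ada1), labels, resource name "shared", and pool name "sharedpool" are all assumptions, not from the thread.

```shell
# Three GPT partitions per disk: boot, root pool, shared (HAST) pool.
gpart create -s gpt ada0
gpart add -t freebsd-boot -s 512k ada0
gpart add -t freebsd-zfs -s 100g -l root0 ada0
gpart add -t freebsd-zfs -l shared0 ada0
# ...repeat for ada1, with labels root1/shared1...

# /etc/hast.conf (fragment) exposes only one disk's shared partition:
#   resource shared {
#           on primary   { local /dev/gpt/shared0; remote secondary; }
#           on secondary { local /dev/gpt/shared0; remote primary;   }
#   }
hastctl create shared
service hastd onestart
hastctl role primary shared

# Mirror the HAST device with the other disk's partition directly:
zpool create sharedpool mirror /dev/hast/shared /dev/gpt/shared1
```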
But I can think of more than a few problems this method has. Namely:
  • Pool on the secondary would probably always show "degraded" because it couldn't find the other phantom partitions it expects.
  • If the primary goes down, changes are made to the secondary's HAST device. When the primary comes back online and receives these changes, only one partition in the mirror is updated (the one backing HAST), undoubtedly causing problems for ZFS, as its locally mirrored partitions are not updated.
While I could work around this by creating three shared partitions on the secondary's single disk to use for HAST, that doubles/triples/etc. the data stored on the secondary's disk for little benefit and a heavy write performance penalty (similar to e.g. copies=2 for a 2-way mirror on ZFS). So I've been trying to think of other ways to mitigate the issue, but I'm not sure any of them are ideal or even possible. The things I've thought of:
  • Use gmirror(8) to create the mirror on the primary and hand that to HAST. I think the main disadvantage of this method is that using gmirror instead of ZFS for mirroring is less than ideal performance-wise, but please correct me if I'm wrong.
  • Configure HAST to perform the mirror for the local partitions? I'm not sure this is even possible, and I believe it would have the same issues as gmirror.
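The first option above (gmirror under HAST) might look something like this. A minimal sketch, assuming GPT labels shared0/shared1 and a mirror named "localmirror"; none of these names come from the thread.

```shell
# gmirror provides the n-way mirror locally on the primary:
gmirror load
gmirror label -v localmirror /dev/gpt/shared0 /dev/gpt/shared1

# /etc/hast.conf on the primary would then point "local" at
# /dev/mirror/localmirror, and the zpool on top becomes a plain
# single-vdev pool:
#   zpool create sharedpool /dev/hast/shared
```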
I just may be barking up the wrong tree with HAST altogether, but FreeBSD doesn't offer any clustered filesystems. Is there some other way to achieve high availability without losing the features we all know and love in ZFS?
 

gnoma

Active Member

Reaction score: 22
Messages: 203

Usually the right way is to do it on top of a zpool and ZVOLs.
Create a ZVOL on one system and a same-size ZVOL on the other. Sync volume to volume and don't care what's below - RAID or a single disk.
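The ZVOL-to-ZVOL approach might be sketched like this. Pool name (tank), volume name, size, and hostname are all placeholder assumptions; the underlying vdev layout (mirror on the primary, single disk on the secondary) is invisible to the replication.

```shell
# On each host, a ZVOL of identical size:
zfs create -V 1T tank/shared

# Initial full sync from the primary (-F overwrites the empty
# target volume on the secondary):
zfs snapshot tank/shared@sync1
zfs send tank/shared@sync1 | ssh secondary zfs recv -F tank/shared

# Later syncs are incremental against the previous snapshot:
zfs snapshot tank/shared@sync2
zfs send -i @sync1 tank/shared@sync2 | ssh secondary zfs recv tank/shared
```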

I used HAST a lot until I discovered that it doesn't support online extend, it doesn't support offline extend - it doesn't support resizing a HAST device at all.
Eventually you will run out of space on the HAST device, and the only way to extend it is to back up your data, destroy it, create a new HAST device with more space, then restore the backed-up data onto it.

I moved to znapzend as a replication solution. It's not super cool or the ultimate solution to everything, and it has its disadvantages, but for now it kinda does what I need it to do, and it does it better than HAST.
 
OP
`Orum

Well-Known Member

Reaction score: 52
Messages: 297

Create ZVOL on the one system and the same size ZVOL on the other one. Sync volume to volume and don't care what's below - raid, or single disk.
This is what I have now, and failover/recovery is awkward at best. The main advantage I see with HAST is that I can avoid the snapshot dance that has to take place to sync now.
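For contrast, failover with HAST reduces to a role switch rather than a snapshot dance. A minimal sketch, assuming the resource is named "shared" and the pool "sharedpool" (both placeholder names):

```shell
# On the host being promoted after the other one fails:
hastctl role primary shared
zpool import sharedpool

# When the failed host returns, it is demoted and hastd resynchronizes:
#   hastctl role secondary shared   (run on the returning host)
```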

Eventually you will run out of space on the HAST device and the only way to extend it is to backup your data, destroy it and create new HAST with more space, then restore the backed up data on it.
Ever tried to add a disk to a raid-z vdev? ;) Anyway, that doesn't bother me in this situation as for this application we're only talking about a few TB of data at most.

I moved to znapzend as a replication solution. It's not super cool or the ultimate solution to everything, and it has its disadvantages, but for now it kinda does what I need it to do, and it does it better than HAST.
I've not used this, but it looks similar to what I do now, which is a custom script I wrote to handle all the synchronization. In any case, it's the exact sort of thing I'm trying to move away from.

I think I'm going to set up a test environment and compare performance between raw mirrors (a zpool created directly on the disk partitions, which is how it's set up now) and a zpool on top of HAST on top of gmirror. If it gets close enough (>= 80% or so) to the raw performance, that seems like the most straightforward way to do this.
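A crude way to get a first number for that comparison is a sequential write test on each stack; paths and sizes below are assumptions, and a real evaluation would use fio or the actual workload rather than dd.

```shell
# Run once on the raw-mirror pool and once on the
# zpool-on-HAST-on-gmirror stack, then compare throughput:
dd if=/dev/zero of=/sharedpool/testfile bs=1m count=4096
# dd reports bytes/sec at the end; the threshold here is
# >= 80% of the raw-mirror figure.
rm /sharedpool/testfile
```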
 
OP
`Orum

Well-Known Member

Reaction score: 52
Messages: 297

Trying to test this configuration today, I've run into a problem. Putting a gmirror on top of GPT partitions costs you a few bytes, as one would expect; in my case, a single 512 B sector. However, when creating partitions with gpart, partition sizes seem to be aligned to much larger boundaries (128K in my case). This makes it impossible to create a GPT partition that's the same size as the virtual block device provided by gmirror, at least without some other magic in between.
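The arithmetic behind the discrepancy: gmirror stores its metadata in the last sector of each component, so the provider it exports is one sector smaller than the partition, and that size is no longer a multiple of the alignment boundary. A small sh sketch with illustrative numbers (the 128K alignment and 1 GiB partition are examples, not measured values):

```shell
#!/bin/sh
sector=512
align=131072                        # 128 KiB gpart alignment (example)
part_bytes=$((1024 * 1024 * 1024))  # a 1 GiB partition, already aligned

# gmirror's exported provider is one sector smaller than the partition:
mirror_bytes=$((part_bytes - sector))

# The result is not a multiple of the alignment, so no GPT partition
# created with the same alignment can match it exactly:
remainder=$((mirror_bytes % align))
echo "mirror=${mirror_bytes} remainder=${remainder}"
```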

So my questions are:
  1. Does this mediasize discrepancy even matter? Everything I've read on HAST assumes identical disks, but I can't find anything saying it won't simply use the smaller of the two devices.
  2. Assuming it does matter and that the disks must be the same size, what's the best way to work around the problem? gnop(8)? A gmirror with only one disk (if that's even permitted)? The former seems like the ideal way to lop off one sector.
Edit: Using gnop to trim off a sector appears to work just fine, though I didn't test with different sizes. So I think ultimately things will have to look like this:
  • primary: raw disks -> gpt -> gmirror -> hast -> zfs
  • secondary: raw disk -> gpt -> gnop -> hast -> zfs
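The gnop step on the secondary might look like this; a sketch only, with an assumed GPT label (shared0) and the assumption that the size difference is exactly one 512-byte sector, as in my case.

```shell
# Mediasize in bytes is the third field of diskinfo's output:
size=$(diskinfo /dev/gpt/shared0 | awk '{print $3}')

# Trim one sector so the device matches the gmirror-backed
# HAST provider on the primary:
gnop create -s $((size - 512)) /dev/gpt/shared0

# hast.conf on the secondary then points at /dev/gpt/shared0.nop
```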
 