Question about HAST syncing

Wasn't sure if I should start a new thread for this, but I've run into some confusion as to how HAST synchronizes.

In the wiki (http://wiki.freebsd.org/HAST) it seems to indicate that HAST will simply figure out the best way to synchronize: that might be primary->secondary or secondary->primary, and in either direction a regular or a full sync will be selected. These choices are based on the localcnt and remotecnt of dirty extents (i.e. modified data).

However, the man page for hastd says:
The connection between two hastd daemons is always initiated from the one
running as primary to the one running as secondary. When primary hastd
is unable to connect or connection fails, it will try to re-establish
connection every few seconds. Once connection is established, primary
hastd will synchronize every extent that was modified during connection
outage to the secondary hastd.

The last sentence is the key and seems to indicate that sync only happens primary->secondary.

It seems intuitive that the "freshest" data should always reside on the Primary. So, my primary goes down, the secondary switches to primary, and it will have new data. I fire up the old primary and it should be set as secondary, then receive the changes from the current primary. I can then switch roles back if I so desire.

But if the wiki is right, and a sync from the secondary is possible, then the roles don't seem to matter: so long as one is primary and the other secondary, hastd will figure it out.

Is this wishful thinking?

Also, is there a way to tell when synchronization is complete and, if applicable, which direction it is going? Update: yes, hastctl status shows whether the status is "complete" or not. Direction though? Does it matter?
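For the "is it complete?" check, here is a minimal sketch of pulling the status and dirty fields out of hastctl status output. The sample text is an assumption about the output layout (a resource name followed by indented "key: value" lines); the resource name, device path and address are made up, so adjust the parsing if your version prints something different.

```python
def parse_hast_status(text):
    """Parse resource blocks of indented 'key: value' lines into nested dicts."""
    resources = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        # An unindented line ending in ':' starts a new resource block.
        if not line[0].isspace() and line.rstrip().endswith(":"):
            current = line.strip()[:-1]
            resources[current] = {}
        elif current is not None and ":" in line:
            # Split on the first ':' only, so values such as addresses survive.
            key, _, value = line.strip().partition(":")
            resources[current][key.strip()] = value.strip()
    return resources


def is_synced(fields):
    """True when status is 'complete' and the dirty counter starts with 0."""
    dirty = fields.get("dirty", "").split()
    return fields.get("status") == "complete" and dirty[:1] == ["0"]


# Hypothetical sample output; every value here is made up for illustration.
SAMPLE = """\
shared:
  role: primary
  provname: shared
  localpath: /dev/da1
  remoteaddr: 10.0.0.2
  replication: fullsync
  status: complete
  dirty: 0 bytes
"""

status = parse_hast_status(SAMPLE)
print(status["shared"]["role"], is_synced(status["shared"]))
```

On a real node the text would come from running hastctl status, for example through subprocess.run(["hastctl", "status"], capture_output=True, text=True).stdout.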
 
HAST always syncs from the primary node to the secondary node.

HostA starts as primary. HostB starts as secondary. Bunch of data is copied to the system and flows from primary to secondary.

HostA goes down. HostB now becomes primary. Bunch of data is copied to the system.

HostA comes online as secondary. HostB syncs the changed data to HostA (data flows from primary to secondary). You should not switch roles until the sync is complete.

Once the sync is complete, you can decide to switch roles, or leave HostB as primary.
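The sequence above can be walked through with a toy model. This is only an illustration of the data flow as described here, not how hastd is implemented: the Node class, the dirty set and the extent numbers are all made up for the example.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.role = "init"
        self.extents = {}  # extent number -> data stored in that extent


def write(primary, secondary, dirty, extent, data):
    """Write to the primary; mirror to a connected secondary, else mark the extent dirty."""
    assert primary.role == "primary"
    primary.extents[extent] = data
    if secondary is not None and secondary.role == "secondary":
        secondary.extents[extent] = data
    else:
        dirty.add(extent)


def resync(primary, secondary, dirty):
    """Copy every dirty extent from the primary to the secondary, then clear the set."""
    for extent in sorted(dirty):
        secondary.extents[extent] = primary.extents[extent]
    dirty.clear()


host_a, host_b = Node("HostA"), Node("HostB")
host_a.role, host_b.role = "primary", "secondary"
dirty = set()
write(host_a, host_b, dirty, 0, "v1")   # replicated immediately

# HostA goes down; HostB is promoted and keeps taking writes on its own.
host_a.role, host_b.role = "init", "primary"
write(host_b, None, dirty, 0, "v2")
write(host_b, None, dirty, 1, "new")

# HostA comes back as secondary; the changed extents flow primary -> secondary.
host_a.role = "secondary"
resync(host_b, host_a, dirty)
print(host_a.extents == host_b.extents, len(dirty))
```

Only after the dirty set is empty do both nodes hold the same data, which is why the roles should not be switched before the sync completes.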

This is one of the reasons CARP is so important. That way, it really doesn't matter which system is "primary", so long as only 1 system is ever "primary" at a time.
 
Great, that makes much more sense, but it also introduces a problem.

From what I've read, the carp master sends out a heartbeat and the slave/peers listen. If they stop hearing it, then one will become the master. Simple enough.

But let's say both nodes are down. One was a master, the other was a slave.

When I start them up again, the first one up will become the master b/c it won't hear any other master broadcasting. The second one up will become the slave because it will hear an existing master.

This means I really have to start them in the right order. And after a power failure or something like that, it's just not guaranteed, especially if I don't know which was master and which was slave before they went down. Not an unlikely scenario.

And so if they are started in the wrong order, say the old-slave first, which becomes the new master, and the old-master second, which becomes the new slave, and if hast only syncs from primary->secondary, then I'm in trouble.

I guess I could touch an "im-the-slave" file somewhere and check for that before assigning roles. In the HAST wiki though, it talks about localcnt and remotecnt dirty block counts to determine synchronization.
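The marker-file idea could look roughly like the sketch below. It is an assumption-laden illustration: the file location, the function names and the choice to fall back to "secondary" when no marker exists are all hypothetical, and nothing here talks to hastd or CARP.

```python
import os
import tempfile


def record_role(path, role):
    """Persist the role this node last held, replacing the file atomically."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(role + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the marker survives a power failure
    os.replace(tmp, path)


def last_role(path, default="secondary"):
    """Return the recorded role, or the default if the marker is missing or invalid."""
    try:
        with open(path) as f:
            role = f.read().strip()
    except FileNotFoundError:
        return default
    return role if role in ("primary", "secondary") else default


# Demonstration in a scratch directory; a real marker would live somewhere persistent.
marker = os.path.join(tempfile.mkdtemp(), "hast-last-role")
print(last_role(marker))        # no marker yet, falls back to the default
record_role(marker, "primary")
print(last_role(marker))
```

One limit of this approach: if the old primary died without updating its marker and the other node was promoted afterwards, both markers will say "primary", so the file alone cannot tell which node holds the newer data.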


Then is there any way to check this directly from hast? To see which has the newest data so that I can determine which "should" become primary on startup?

thanks! -p
 
sync broken

Well, I just updated to the latest source in 8-STABLE as of 06/08 and HAST replication is broken, more or less.

Here's the setup:
1. from a "working" hast primary/secondary system
2. stop hastd on the secondary (change role to init)
3. let primary crank along and accumulate dirty non-sync'd data
4. start hastd on the secondary, ensuring role is secondary
5. what happens? NADA! not for a while, then SLOWLY it starts syncing, like 2MB/s
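To put a number on the sync speed in step 5, one option is to sample the dirty byte count twice and work out the rate. A small sketch of that arithmetic follows; the function names and the sample figures are made up, and the dirty values would have to be read by hand or by a script from hastctl status on the primary.

```python
def sync_rate_mb_per_s(dirty_before, dirty_after, seconds):
    """Average sync rate in MB/s (1 MB = 1,000,000 bytes) between two dirty samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (dirty_before - dirty_after) / seconds / 1_000_000


def eta_seconds(dirty_now, rate_mb_per_s):
    """Seconds left at the given rate, or None if the backlog is not shrinking."""
    if rate_mb_per_s <= 0:
        return None
    return dirty_now / (rate_mb_per_s * 1_000_000)


# Made-up samples: the dirty count drops by 120,000,000 bytes in 60 seconds.
rate = sync_rate_mb_per_s(10_000_000_000, 9_880_000_000, 60)
print(rate, eta_seconds(9_880_000_000, rate))
```

If the primary is still taking writes while the sync runs, the dirty count can grow between samples, so a zero or negative rate does not by itself mean the sync has stalled.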

Now, just to say, this used to work fine. I had it up on 8.2-RELEASE, though HAST was unstable there, which was the reason for tracking STABLE.

I can confirm excellent throughput just transferring a file normally across the links that hastd is using. This is also a direct link, so I get up to 100MB/s otherwise.

There doesn't seem to be anything meaningful in the messages log.

One thing I noticed is that the default replication method has changed from "memsync" to "fullsync". I know in the manual it says memsync was not implemented, but up until recently memsync was actually the default and fullsync wasn't supported.

If anyone has any suggestions, please let me know.

thanks!

OK, I found a kernel patch posted by Mikolaj Golub, applied it and FIXED! I've attached the patch, found it in the FreeBSD STABLE digest vol 412 issue 4. Golub says, "The patch was committed to current (r222454)".

Anyway, hope that helps others =)
 
