Other HAST+CARP Failover Cluster Storage

joebell · Feb 9, 2015

Hi,

My goal is it to have two nodes working in a cluster and providing HA iSCSI targets for virtual machines. Success has been achieved on two FreeBSD 9.3 nodes to configure ZFS+HAST+CARP+iSCSI, but the failover is behaving strange to my opinion. Here is what it looks like:

Initial configuration:
Node 1- HAST role: primary, Status: complete, CARP: MASTER
Node 2- HAST role: secondary, Status: complete, CARP: BACKUP

Now I want to check the failover functionality:
1) Restart BACKUP node – here is the state after it comes up

a. Node 1 – HAST Role: primary, Status: degraded, CARP: MASTER
b. Node 2 – HAST Role: INIT, Status: - , CARP: BACKUP
c. I need to put the Node 2 into HAST secondary role manually
d. Initial configuration achieved

2) Restart MASTER node – here is the state after it comes up

a. Node 1 – HAST Role: INIT, Status: - , CARP: BACKUP
b. Node 2 – HAST Role: secondary, Status: -, CARP: MASTER
c. I need to put the Node 1 into HAST primary role manually
d. Initial configuration achieved

As I was expecting that after the reboot of either node the failover storage cluster would automatically achieve the initial state, I think there must be something wrong either with my configuration, or is there a bug in the hast-carp-switch script I had got from this link?

I would be glad for any comment and/or help.

joebell · Feb 11, 2015

Hm, no one had created storage failover cluster like I did?

User23 · Feb 11, 2015

Isn't port sysutils/heartbeat doing that job?

gkontos · Feb 11, 2015

joebell said:
As I was expecting that after the reboot of either node the failover storage cluster would automatically achieve the initial state, I think there must be something wrong either with my configuration, or is there a bug in the hast-carp-switch script I had got from this link?

I would be glad for any comment and/or help.

I have not used HAST for a while and I would discourage you to use it with ZFS. Regarding the switch to the initial state, I am not sure if this is something that you really want. Imagine for example that Node1 (master) has a problem and it reboots for some reason. You would not want this node to become primary until you fix the problem....

joebell · Feb 11, 2015

User 23

Nice to have a reaction. I can give Heartbeat a try. Many thanks for your advice.

joebell · Feb 11, 2015

qkontos

That's right and many thanks for your comment. Maybe my question was not formulated correctly. English is not my native language. What I was expecting was that in case the Node1 being a master reboots, the Node 2 (slave) will have the resources in degraded status. But it has a null status exactly the same way as it has it in my scenario #1 when it rebooted. Why does the cluster behave other way in the other case? I would be satisfied, if the states of the nodes are the same in both scenarios except the roles are switched, i.e. when master reboots, the backup node has its resources in degraded state and vice-versa. Is this idea far off?

joebell · Feb 11, 2015

OK guys. I quit.

This is no good solution for me to configure the HAST+CARP+ZFS+iSCSI Target Failover Storage Cluster in FreeBSD. Besides those strange statuses after a cluster node is rebooted, that could manually be managed however, the most important is the fact that the iSCSI Target service does not fail over at all. How can I achieve a HA storage for a Hyper-V node, if the iSCSI service is not highly available? No idea by now, except of buying expensive commercial solutions I wated to avoid.

If someone among you still knows the answer, it would be great if it could be shared.

Anyhow, many thanks to all of you who paid attention to my post and were trying to help.

gkontos · Feb 11, 2015

joebell said:
qkontos

That's right and many thanks for your comment. Maybe my question was not formulated correctly. English is not my native language. What I was expecting was that in case the Node1 being a master reboots, the Node 2 (slave) will have the resources in degraded status. But it has a null status exactly the same way as it has it in my scenario #1 when it rebooted. Why does the cluster behave other way in the other case? I would be satisfied, if the states of the nodes are the same in both scenarios except the roles are switched, i.e. when master reboots, the backup node has its resources in degraded state and vice-versa. Is this idea far off?

Because as far as I know, you need to use a script, triggered by a DEVD event, that will actually make the resources available to the other node. See here for an example.

The problem however, with HAST and ZFS is this. HAST is only a network mirror of 2 resources (disks/controllers) on 2 different nodes. So a pool is actually created by those mirrored resources. Now, imagine that you have a RAIDz1 pool consisted by 3 resources and in your active Node one disks goes bad. You would expect that a zpool status reflect that problem by showing the pool to be in a degraded state. It will not! Why? Because the network resource is degraded but ZFS has no idea of knowing that.

usdmatt · Feb 11, 2015

There may be a few people who have got it working properly but they are few-and-far between. It's not particularly well supported on the forums as it's not that easy a thing to get right. You have to write your own scripts to handle all the switching of hast roles, as well as bringing services back online.

Whenever I've looked at it, the deeper I've got, the more worried I've become at all the possible edge cases, and I usually just decide to have a backup and switch manually. Writing scripts to HA switch storage and services in the event of failures is not something that you can get right without a lot of work.

Having said that, how are you managing the hast and zpool/iscsi switching? I assume you're aware that you need a script to do the switchover, similar to the one at https://www.freebsd.org/doc/handbook/disks-hast.html.
That sample script there just switches the hast role, putting whichever machine is the carp master into the hast master. You will also need that script to start iscsi, as the iscsi service will probably not work correctly until the ZFS pool arrives. So basically the script needs to do the following when carp goes master:

Go into master mode for hast
Forcefully import the zpool (need to check errors and give up/send help! email if import fails)
The zvols used for iscsi should hopefully appear at this point
Start the iscsi service. You'll probably need to disable iscsi on boot and use the onestart option as you don't really want iscsi running unless the machine is the master.

This all relies on the iscsi service actually supporting the ability to switch from one node to another. I know HA won't work with NFS on FreeBSD as there is no way to force both nodes to use the same NFS file system ID numbers, meaning you have to unmount and re-mount the clients anyway. I'm sure it *should* be possible with iscsi though, but as mentioned you do have to do some work to make sure the service runs correctly, and only runs when the ZFS pool is available.

Apologies if you've already covered all this, but you don't mention anything about your scripts or configuration.

joebell · Feb 12, 2015

gkontos said:
Because as far as I know, you need to use a script, triggered by a DEVD event, that will actually make the resources available to the other node. See here for an example.

I realized that in the meantime after the carp-hast-switch script had been inspected. Thanks for directing me to that site I knew before my experiments begun.

joebell · Feb 12, 2015

usdmatt

Many thanks for this post. I will investigate my scripts, of course. I do really appreciate the support and help of all of you, guys.

joebell · Feb 12, 2015

The problem however, with HAST and ZFS is this. HAST is only a network mirror of 2 resources (disks/controllers) on 2 different nodes. So a pool is actually created by those mirrored resources. Now, imagine that you have a RAIDz1 pool consisted by 3 resources and in your active Node one disks goes bad. You would expect that a zpool status reflect that problem by showing the pool to be in a degraded state. It will not! Why? Because the network resource is degraded but ZFS has no idea of knowing that.

Many thanks for this explanation. I see I have to dig into the HAST and ZFS more. Your comment is much appreciated. Well, I thought it would be much easier to achieve my goal. I wanted to stop configuring HAST, CARP, ZFS and iSCIS, but the post from usdmatt encouraged me, so I will try to crack the nut.

joebell · Feb 12, 2015

Having said that, how are you managing the hast and zpool/iscsi switching? I assume you're aware that you need a script to do the switchover, similar to the one at https://www.freebsd.org/doc/handbook/disks-hast.html.

I have been investigating my configuration in a more detailed fashion and realized I used that script posted as you mention. You are completely right. Of course, I forgot that the script had no iSCSI section to make it exactly the way as you are describing. Your post encouraged me a lot to stick with it and continue. Hopefully, I will achieve my goal, though.

joebell · Feb 15, 2015

gkontos said:
I have not used HAST for a while and I would discourage you to use it with ZFS. Regarding the switch to the initial state, I am not sure if this is something that you really want. Imagine for example that Node1 (master) has a problem and it reboots for some reason. You would not want this node to become primary until you fix the problem....

May I draw your attention to this site - NAS4Free? I created a testing lab. Both nodes behave the way as expected in my first post. Here is how they operate with ZFS volumes and iSCSI targets created upon it:

Initial state
- Node 1 - HAST role primary-status complete, iSCSI target service running-status on
- Node 2 - HAST role backup-status complete, iSCSI target service running-status off
- my Microsoft Failover Hyper-V Server 2012 virtual machines are running on the Node 1 right now
Node 1 up, Node 2 down
- Node 1 - HAST role primary-status degraded, iSCSI target service running-status on
- my Microsoft Failover Hyper-V Server 2012 virtual machines are running on the Node 1 right now
Node 1 down, Node 2 up
- Node 2 - HAST role primary-status degraded, iSCSI target service running-status on
- my Microsoft Failover Hyper-V Server 2012 virtual machines are running on the Node 2 right now
Node 1 up, Node 2 up (Initial state with switched roles is recreated after Node 1 rebooted)
- Node 1 - HAST role backup-status complete (needs to be done manually with hastctl to achieve status "complete"), iSCSI target service running-status off
- Node 2 - HAST role primary-status complete, iSCSI target service running-status on
- my Microsoft Failover Hyper-V Server 2012 virtual machines are running on the Node 2 right now
The roles can easily be switched in the web GUI. The iSCSI target failes over as well, e.g. to achieve the state in the point 1., it is now just necessary to bring the Node 2 into BACKUP role, and the Node 1 becomes MASTER again automatically. The HAST resources will become primary-complete on Node 1, and the iSCSI target service will be transferred from Node 2 to Node 1 automatically, as well. The resources on Node 2 are showing status backup-complete.

Well, that is the correct behavior I was expecting when I tried to configure the HA storage cluster in FreeBSD v9.3 manually. By the way, The NAS4Free project is based on FreeBSD, hence my belief there is a way to be successful with FreeBSD, if you do not require the web GUI to administer both nodes (which I don't require, either). The NAS4Free can serve as inspiration and guide (for me and all others). Now I am just going to tweak and improve the cluster storage to operate in a best possible way. After that I will decide, whether NAS4Free would be implemented in a production environment.

Thanks for following this thread, and again my best thanks to all who were helping me find a good solution for my project.

gkontos · Feb 15, 2015

joebell said:
May I draw your attention to this site - NAS4Free? I created a testing lab. Both nodes behave the way as expected in my first post. Here is how they operate with ZFS volumes and iSCSI targets created upon it

My objection to the set up is purely based on the use of ZFS with HAST. I tried to explain the reasons so repeating them is just a waste.

As far as NAS4Free is concerned, if you dig a bit, you will find out that there are many scripts executed upon DEVD events. You can explore them and use them in your own vanilla installation.