HAST and ZFS with CARP failover

@tuaris,

What FreeBSD version are you running?

My HAST tests were conducted with FreeBSD 9.0-RELEASE.

I recently deployed an HA solution for a client who is using FreeBSD 8.2-STABLE (mid January). The backup server is using FreeBSD 9.0-RELEASE. Both listen on a CARP IP address and they share a MySQL server in replication mode, plus Apache and DNS in jails.

In that case I also got the same results during failover tests.

Regards,
George
 
zennybsd said:
@gkontos: Thanks!

From what you said about CARP, it seems that HAST+CARP is good for storage scalability rather than redundancy, right?

No. HAST does not provide you any extra storage space, nor does it provide any extra storage speed. All HAST does is provide 2 copies of the data on two separate servers with the ability to switch which server is accessed for storage. Meaning, if one server fails, all access switches automatically to the other server and clients carry on like nothing happened.
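
For reference, the replication pair itself is defined per resource in /etc/hast.conf on both nodes. A minimal sketch (hostnames, addresses and the disk device here are made up):

Code:
# /etc/hast.conf -- identical on both nodes
resource tank {
        on nodeA {
                local /dev/da1
                remote 192.168.0.2
        }
        on nodeB {
                local /dev/da1
                remote 192.168.0.1
        }
}

The resource then appears as /dev/hast/tank on whichever node is primary, and that device is what you put the filesystem or zpool on.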

Generally, enterprise-grade operations are run in at least two datacenters, keeping in mind that if something happens to one datacenter (fire, earthquake, flood, etc.), IT operations will switch over to the other one in a different geographical location.

Which is exactly what HAST + CARP (plus other things as needed to fix routing, load-balance services, etc) provide. You get 2 servers (regardless of whether they are 2 inches or 2 km apart) "sharing" a virtual IP and replicating data between the two, such that if one fails, nobody notices as the other carries on.

HAST + CARP can be considered the FreeBSD way of doing similar things to DRBD + FreeVRRPd (or VServer) on Linux. They provide storage replication, storage failover, and shared virtual IP.

Heartbeat is completely separate, and acts at a much higher layer in the applications/services stack. It can be used on either FreeBSD or Linux.

HAST + CARP works at the storage layer to provide HA storage. Heartbeat works at the application layer to provide HA services. Two very different things.
 
tuaris said:
Interesting, when I reboot either server, regardless of its current role, it always assumes the MASTER role in CARP.

Check your CARP sysctls to make sure preempt is disabled.
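
For example, something along these lines should show it (using the CARP sysctls available in 9.x):

Code:
# sysctl net.inet.carp.preempt
net.inet.carp.preempt: 0

A value of 0 means preemption is off; you can pin it in /etc/sysctl.conf with net.inet.carp.preempt=0.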
 
phoenix said:
Which is exactly what HAST + CARP (plus other things as needed to fix routing, load-balance services, etc) provide. You get 2 servers (regardless of whether they are 2 inches or 2 km apart) "sharing" a virtual IP and replicating data between the two, such that if one fails, nobody notices as the other carries on.

I would really appreciate it if you could share your experience with HAST replication between servers over a long distance.
Like a DR scenario with a 10 Mbit line. (Any example you have, regardless of the speed, is very welcome and much anticipated.)

George
 
I've only done tests with gigabit links, mostly LAN but one test was across a WAN link, approximately 5 km. But, that was still a gigabit fibre link.
 
phoenix said:
I've only done tests with gigabit links, mostly LAN but one test was across a WAN link, approximately 5 km. But, that was still a gigabit fibre link.

I would like to know how you achieved this. Any pointers or a tutorial would be appreciated!
 
We have fibre runs between admin sites and secondary schools within the city. We set up one pair of fibres to extend our DMZ between the server rooms in two buildings, and I did some HAST+CARP testing using VMs in the two buildings. The setup was exactly the same as the testing using VMs in the same building, since the two buildings were (essentially) on the same LAN.
 
phoenix said:
We have fibre runs between admin sites and secondary schools within the city. We set up one pair of fibres to extend our DMZ between the server rooms in two buildings, and I did some HAST+CARP testing using VMs in the two buildings. The setup was exactly the same as the testing using VMs in the same building, since the two buildings were (essentially) on the same LAN.

Meaning HAST is possible only within the same LAN, not between different LANs in different geographical locations.

Your setup is already covered by gkontos' howto in the first post of this thread. But ..

It is important to have failover storage or network services in different geographical locations in case of natural disasters. Yes, zfs send and receive can back up data, but I am not sure about the failover part.
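
For example, something along these lines handles the backup half (the pool, snapshot names, host name and daily schedule are all made up here), but it says nothing about failing clients over:

Code:
#!/bin/sh
# Crude daily incremental replication sketch -- names are hypothetical.
PREV=`date -v-1d +%Y%m%d`   # yesterday's snapshot
CURR=`date +%Y%m%d`
zfs snapshot -r tank@${CURR}
zfs send -R -i tank@${PREV} tank@${CURR} | ssh backuphost zfs receive -Fdu tank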

Thanks!
 
zennybsd said:
Meaning HAST is possible only within the same LAN, not between different LANs in different geographical locations.

That is not entirely correct. CARP advertises a virtual IP; you don't need CARP for HAST to work.

And even if you decide to use CARP, you can use complex routing scenarios so that those three IP addresses are routable between different LANs.

zennybsd said:
Your setup is already covered by gkontos' howto in the first post of this thread. But ..

It is important to have failover storage or network services in different geographical locations in case of natural disasters. Yes, zfs send and receive can back up data, but I am not sure about the failover part.

Thanks!

My point was that, since HAST currently supports only fully synchronous replication, it is difficult to implement DR scenarios over long distances.

DRBD mirroring setups for DR run in asynchronous mode.
 
zennybsd said:
Meaning HAST is possible only within the same LAN, not between different LANs in different geographical locations.

HAST works between any two systems that are accessible via TCP/IP, whether that be on the local network, through a router, over the Internet, wherever. So long as the two systems are accessible via TCP/IP, you can use HAST to replicate the data between them.

How you do the failover will, of course, depend on the setup. To fail over on a LAN, you can use CARP to share a single local IP. To fail over on a WAN, you'd use something at or above the routing layer. Maybe you tunnel things. Maybe you update routing tables. Maybe you fail over across IPs. Maybe you put a load-balancer in front to make it transparent. Maybe you use something else.

All HAST does is replicate data between two systems. You stack other stuff on top to make it fit your needs.

You won't find a single piece of software that provides everything. But you will find software that can be stacked together to provide the features you need. :)
 
... and NFS?

Is anyone exporting a ZFS filesystem through NFS?

I'm getting
Code:
Stale NFS File Handle
when a failover happens (which means the NFS fsids are different, I think).
 
I'm also getting stale NFS handles.
As far as I understand this, in 8.3-RELEASE the sysctl vfs.typenumhash was introduced to address this issue: http://www.freebsd.org/releases/8.3R/relnotes-detailed.html#FS.
I tried the setup on two 9.0-RELEASE machines with a third one acting as the NFS client; no matter the value of vfs.typenumhash, when doing a failover the client quits the cp command with
Code:
Stale NFS file handle
Does anybody have more insight into this matter? Any help would be greatly appreciated.

Greetings glocke
 
Can you post your NFS configuration?

Also, do you run the NFS service on the standby node as well, or do you start it only when it becomes active?
 
Hi gkontos,

I copied the NFS switch-over from http://www.erik.eu/carp-hast-nfs/:
Code:
On both servers add to rc.conf:
 nfs_server_enable="YES"
 nfs_server_flags="-u -t -n 4 -h 192.168.0.63"
 rpcbind_enable="YES"
 mountd_flags="-r"
 rpcbind_flags="-h 192.168.0.63"
From the failover script: it just starts nfsd in master mode, but it does not stop it when in slave mode, so yes, most of the time nfsd runs on both master and slave. Spurred by your comment, I modified the script to stop nfsd and rpcbind when in slave mode and also disabled both in rc.conf.
Now I don't get any Stale NFS file handle when copying from NFS to local disk on the client. Note that the NFS export is mounted without any parameters (nfs_client is enabled in rc.conf).
I will have a deeper look into all of that; for now it just works. (I guess the -h parameter for nfsd and rpcbind just clashed with each other due to the same (shared) IP address.) Thanks again for the hint :)

glocke
 
glocke said:
Spurred by your comment, I modified the script to stop nfsd and rpcbind when in slave mode and also disabled both in rc.conf.
Now I don't get any Stale NFS file handle when copying from NFS to local disk on the client. Note that the NFS export is mounted without any parameters (nfs_client is enabled in rc.conf).

That is something I will probably have to modify in my guide. There are some services that should not run on the standby node.
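
For example, the part of the failover script that runs when a node becomes BACKUP could look roughly like this (only a sketch; the service list and the pool/resource name tank are placeholders):

Code:
# this node just became BACKUP: stop NFS, then release the storage
service nfsd onestop
service mountd onestop
service rpcbind onestop
zpool export tank
hastctl role secondary tank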
 
Corruption issue with HAST?

I was trying an experiment. I had two VMs under ESXi, each with two 32 GB vmdks. The idea was to set up a HAST resource on each VM using da2, and then mirror da1 with the HAST device (using ZFS). I got both machines up and running, without all the scripts (i.e. just testing manually). My switchover technique was:

1. Export pool on primary.
2. Set primary to secondary (hastctl role secondary tank)
3. Set secondary to primary (hastctl role primary tank)
4. Import pool on newly-promoted primary.
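
As shell commands (pool and resource both named tank, matching the hastctl commands above):

Code:
# on the current primary
zpool export tank
hastctl role secondary tank

# on the other node
hastctl role primary tank
zpool import tank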

This works just fine (i.e. no errors logged). However, if I then check the pool, it complains about a bunch of checksum errors on the local disk of the pool (i.e. the ZFS local disk, NOT the local disk HAST is using). I scrub the pool and do a 'zpool clear tank', and all *seems* well, until I flip control back to the other host (using steps 1-4 again). Again, corruption on the local disk of the newly promoted host. Nothing useful is being logged anywhere I can see. Any ideas where to look? Thanks!
 
I have tested a similar setup under VirtualBox, also with ZFS, and everything was OK; maybe it's ESXi getting involved somewhere in between?
 
Hmmm

Well, both disks (the local one and the other local one used by HAST) are just vmdks, and this does not happen when not using HAST :( I am trying a different approach: four virtual disks, each one part of a HAST resource. I will then set up the four HAST resources in a 2x2 RAID 10. Maybe it does not like having a local device explicitly mirrored with the HAST device? I will post my findings...
 
zennybsd said:
.. two datacenters, keeping in mind that if something happens to one datacenter (fire, earthquake, flood, etc.), IT operations will switch over to the other one in a different geographical location...

Hi! Let me ask... Is there a solution using FreeBSD?

Two FreeBSD machines, one in each datacenter, geographically distant?
 
FMiralha said:
Hi! Let me ask... Is there a solution using FreeBSD?

Two FreeBSD machines, one in each datacenter, geographically distant?

From what I know, HAST does not need a single VLAN/layer 2 network spread between the two different datacenters. CARP may need that, but there is also UCARP, which should allow this, so IMHO it should be possible using one network in the first datacenter and another network in the second one. These networks of course need to 'see' each other.
 
vermaden said:
From what I know, HAST does not need a single VLAN/layer 2 network spread between the two different datacenters. CARP may need that, but there is also UCARP, which should allow this, so IMHO it should be possible using one network in the first datacenter and another network in the second one. These networks of course need to 'see' each other.

Correct, this would not be an issue. However, HAST does not support an async mode for synchronization. This means that a decent Internet connection must exist between the two datacenters. Of course, this also depends on the amount of data that changes on the primary node.
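
For reference, hast.conf(5) lists a per-resource replication keyword with fullsync, memsync and async modes, but only fullsync is actually implemented at this point, so there is nothing useful to change here (a fragment; resource name made up):

Code:
resource tank {
        replication fullsync   # memsync/async are documented but not implemented yet
        ...
}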
 
balboah said:
Code:
Mar 29 10:01:01 storage1 hastd[6690]: [disk2] (primary) Remote request failed (Operation not supported by device): FLUSH.
Mar 29 10:01:02 storage1 hastd[6690]: [disk2] (primary) Unable to flush disk cache on activemap update: Operation not supported by device.

Has anyone else had this and found a solution for it?
 
Another script for the devd action

Hello. I wrote another script that manages services and disk status depending on the CARP interface state. Maybe it solves the problems encountered when switching nodes and with the order in which services are started at boot time.

Code:
#!/bin/sh

# Copyright (c) 2013 Pavel I Volkov
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# 1. Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#

# file location: /usr/local/libexec/carpcontrol.sh
# version 1.5

# file example: /usr/local/etc/devd/carp.conf
# notify 0 {
# 	match "system"          "IFNET";
# 	match "subsystem"       "carp*";
# 	match "type"            "LINK_UP";
# 	action "/usr/local/libexec/carpcontrol.sh $type $subsystem";
# };
#
# notify 0 {
# 	match "system"          "IFNET";
# 	match "subsystem"       "carp*";
# 	match "type"            "LINK_DOWN";
# 	action "/usr/local/libexec/carpcontrol.sh $type $subsystem";
# };

# file example: /etc/fstab
# ...
# /dev/hast/volume /hast/volume ufs rw,noauto 0 0
# ...

# file example: /etc/rc.conf.local
# ...
# hastd_enable="YES"
# carpcontrol_services="samba apache22"
# if [ -d "/dev/hast" ]; then # master mode
# samba_enable="YES"
# apache22_enable="YES"
# else # backup mode
# samba_enable="NO"
# apache22_enable="NO"
# fi
# ...

MY=`basename $0 .sh`
PID=$$
EVENT=$1 # Event type
IF="$2"	 # The network interface
PIDf="/var/run/${MY}.pid" # PID file for background process

. /etc/rc.subr
load_rc_config ${MY}

carp_type() { ifconfig ${IF} | sed -E '1,3d;s/^.*(INIT|MASTER|BACKUP).*$/\1/'; }

get_fstab() { awk -v p1=$1 '/\/dev\/hast\//{print $p1}' /etc/fstab; }

hast_role() { local _ret
	hastctl role "${1}" all; _ret=$?
	[ $_ret -ne 0 ] \
		&& logger -p daemon.err -t "${MY}[${PID}]" "hastd unable to switch role to ${1} (${_ret})" \
		|| logger -p daemon.notice -t "${MY}[${PID}]" "hastd switched to ${1} (${_ret})"
	return $_ret
}

serv_ctl() { local _i _st
	for _i in ${carpcontrol_services}; do
		logger -p daemon.notice -t "${MY}[${PID}]" "attempt to change the status of a service ${_i} to ${1}"
		service $_i onestatus > /dev/null 2>&1; _st=$?
		case ${1} in
			*start) [ $_st -ne 0 ] && service $_i ${1} ;;
			 *stop) [ $_st -eq 0 ] && service $_i ${1} ;;
		esac
	done
}

hast_init() { local _i
	serv_ctl stop # services stop
	# umount all hast volumes
	sync
	[ -d /dev/hast ] && for _i in `ls -1 /dev/hast/*`; do umount -f "${_i}"; done
	[ -e /var/run/hastctl ] && hast_role init # HAST(init)
}

wait_hast_bg() { local _i _j _k PID
	sleep 0.25; [ -e "${PIDf}" ] && PID=`cat ${PIDf}` || PID=$$
	logger -p daemon.notice -t "${MY}[${PID}]" "bg start from [$$] process"
	until hastctl status all > /dev/null 2>&1 # wait hast daemon
	do
	       sleep 0.25
	       logger -p daemon.notice -t "${MY}[${PID}]" "wait hastd"
	done
	for _i in `jot 12`; do # wait until 3 seconds
		case `carp_type` in
			BACKUP) break ;;
			*) sleep 0.25 ;;
		esac
	done
	for _i in `jot 12`; do # wait until 3 second
		case `carp_type` in
			BACKUP) # backup mode
				hast_init
				# HAST(secondary)
				hast_role secondary
				break
				;;
			MASTER) # master mode
				hast_init
				# HAST(primary)
				hast_role primary
				[ $? -ne 0 ] && break
				# mount all hast volumes from /etc/fstab
				for _j in `get_fstab 2`; do [ -d "$_j" ] || mkdir -p "$_j"; done
				for _j in `get_fstab 1`; do
					for _k in `jot 40`; do sleep 0.25; [ -e $_j ] && break; done # wait until 10 second
					fsck -p -y -t ufs $_j
					logger -p daemon.notice -t "${MY}[${PID}]" "trying mount ${_j}"
					mount $_j
					[ $? -ne 0 ] && break 2
					logger -p daemon.notice -t "${MY}[${PID}]" "${_j} mounted"
				done
				serv_ctl start # services start
				break
				;;
			*) # other mode
				sleep 0.25
				;;
		esac
	done
	rm -f "${PIDf}" # remove PID of the background process
	logger -p daemon.notice -t "${MY}[${PID}]" "bg stop from $$ process"
}

case "${EVENT}" in
	"LINK_UP"|"LINK_DOWN") # Carrier status changed to UP or DOWN
		logger -p daemon.notice -t "${MY}[${PID}]" "${IF} ${EVENT}"
		if [ -e "${PIDf}" ]; then # PID file exists
			if ! pgrep -qF "${PIDf}"; then # process does not exist (stale PID file)
				rm -f "${PIDf}"
			else # process exists
				logger -p daemon.err -t "${MY}[${PID}]" "background process already running"
				exit 1
			fi
		fi
		# because it is necessary to wait hast daemon at startup
		wait_hast_bg > /dev/null 2>&1 &
		echo $! > "${PIDf}" # create PID for background process
		;;
	*)	logger -p daemon.err -t "${MY}[${PID}]" "unknown event ${EVENT} for interface ${IF}" ;;
esac
exit 0
 

Just tried ZFS with hastd; it is absolutely not suitable for production, as a reboot will cause the system to halt at:

Code:
Syncing disks, vnodes remaining ... 0 0 0 done
All buffers synced

and then the whole system halts, and you will need to go to the datacenter (or use remote hands) to power-cycle the machine by pushing the power button!
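
A possible workaround, purely a sketch and untested here, is to tear the stack down explicitly before rebooting, so the kernel never has to sync a pool that lives on a HAST device (pool and resource name hypothetical):

Code:
# on the primary, before rebooting
zpool export tank
hastctl role init tank
service hastd stop
shutdown -r now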
 