| Howtos & FAQs (Moderated) Would you like to share some of your solutions for certain problems? Tips or tricks? Post here. All new topics are automatically moderated. |
#1
HAST (Highly Available Storage) is a new concept for FreeBSD and is under constant development. HAST transparently stores data on two physically separated machines connected over a TCP/IP network. It operates at the block level, which makes it transparent to file systems, and provides disk-like devices in the /dev/hast directory.
In this article we will create two identical HAST nodes, hast1 and hast2. Each node will use one NIC, connected to a VLAN, for data synchronization; another NIC will be configured via CARP so that both nodes share the same IP address on the network. The first node will be called storage1.hast.test, the second storage2.hast.test, and both will listen on a common IP address which we will bind to storage.hast.test.
HAST binds its resource names according to the machine's hostname. Therefore, we will use hast1.freebsd.loc and hast2.freebsd.loc as the machines' hostnames so that HAST can operate without complaining.
For starters, let's set up two identical nodes. For this example I have installed FreeBSD 9.0-RELEASE on two different instances under Linux KVM. Both nodes have 512MB of RAM, one SATA drive containing the OS, and three SATA drives which will be used to create our shared raidz1 pool.
For CARP to work we don't have to compile a new kernel. We can just load it as a module by adding the following to /boot/loader.conf: Code:
if_carp_load="YES"
The /etc/rc.conf on hast1: Code:
zfs_enable="YES"
###Primary Interface##
ifconfig_re0="inet 10.10.10.181 netmask 255.255.255.0"
###Secondary Interface for HAST###
ifconfig_re1="inet 192.168.100.100 netmask 255.255.255.0"
defaultrouter="10.10.10.1"
sshd_enable="YES"
hostname="hast1.freebsd.loc"
##CARP INTERFACE SETUP##
cloned_interfaces="carp0"
ifconfig_carp0="inet 10.10.10.180 netmask 255.255.255.0 vhid 1 pass mypassword advskew 0"
hastd_enable="YES"
The /etc/rc.conf on hast2: Code:
zfs_enable="YES"
###Primary Interface##
ifconfig_re0="inet 10.10.10.182 netmask 255.255.255.0"
###Secondary Interface for HAST###
ifconfig_re1="inet 192.168.100.101 netmask 255.255.255.0"
defaultrouter="10.10.10.1"
sshd_enable="YES"
hostname="hast2.freebsd.loc"
##CARP INTERFACE SETUP##
cloned_interfaces="carp0"
ifconfig_carp0="inet 10.10.10.180 netmask 255.255.255.0 vhid 1 pass mypassword advskew 0"
hastd_enable="YES"
As a result, re1 is used for HAST synchronization in a VLAN, while carp0, which is cloned from re0, is in the same VLAN as the rest of our clients.
For HAST to function correctly, every node has to resolve the correct IPs. We don't want to rely on DNS for this because DNS can fail. Instead we will use the same /etc/hosts on every node: Code:
::1 localhost localhost.freebsd.loc
127.0.0.1 localhost localhost.freebsd.loc
192.168.100.100 hast1.freebsd.loc hast1
192.168.100.101 hast2.freebsd.loc hast2
10.10.10.181 storage1.hast.test storage1
10.10.10.182 storage2.hast.test storage2
10.10.10.180 storage.hast.test storage
The HAST configuration, /etc/hast.conf, is identical on both nodes: Code:
resource disk1 {
    on hast1 {
        local /dev/ad1
        remote hast2
    }
    on hast2 {
        local /dev/ad1
        remote hast1
    }
}
resource disk2 {
    on hast1 {
        local /dev/ad2
        remote hast2
    }
    on hast2 {
        local /dev/ad2
        remote hast1
    }
}
resource disk3 {
    on hast1 {
        local /dev/ad3
        remote hast2
    }
    on hast2 {
        local /dev/ad3
        remote hast1
    }
}
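Since the three resource stanzas differ only in the resource name and the local device, they can be generated rather than typed by hand. The following is a minimal sketch, not part of the original article: gen_hast_conf is a hypothetical helper, and the NODE_A/NODE_B defaults simply mirror the hostnames used above.

```shell
#!/bin/sh
# Sketch: print hast.conf stanzas for a list of resource:device pairs.
# gen_hast_conf is a hypothetical helper; the node names default to the
# hostnames used in this article and can be overridden via the environment.
NODE_A="${NODE_A:-hast1}"
NODE_B="${NODE_B:-hast2}"

gen_hast_conf() {
    for pair in "$@"; do
        res="${pair%%:*}"    # text before the first colon, e.g. disk1
        dev="${pair#*:}"     # text after the first colon, e.g. /dev/ad1
        printf 'resource %s {\n' "$res"
        printf '    on %s {\n        local %s\n        remote %s\n    }\n' "$NODE_A" "$dev" "$NODE_B"
        printf '    on %s {\n        local %s\n        remote %s\n    }\n' "$NODE_B" "$dev" "$NODE_A"
        printf '}\n'
    done
}
```

For example, gen_hast_conf disk1:/dev/ad1 disk2:/dev/ad2 disk3:/dev/ad3 prints a configuration equivalent to the one above; review the output before copying it to both nodes.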
Let's start hastd on both nodes first: Code:
hast1# /etc/rc.d/hastd start
Code:
hast2# /etc/rc.d/hastd start
On hast1, initialize and create the resources, then set the role to primary: Code:
hast1# hastctl role init disk1
hast1# hastctl role init disk2
hast1# hastctl role init disk3
hast1# hastctl create disk1
hast1# hastctl create disk2
hast1# hastctl create disk3
hast1# hastctl role primary disk1
hast1# hastctl role primary disk2
hast1# hastctl role primary disk3
On hast2, do the same but set the role to secondary: Code:
hast2# hastctl role init disk1
hast2# hastctl role init disk2
hast2# hastctl role init disk3
hast2# hastctl create disk1
hast2# hastctl create disk2
hast2# hastctl create disk3
hast2# hastctl role secondary disk1
hast2# hastctl role secondary disk2
hast2# hastctl role secondary disk3
Now check the status on both nodes: Code:
hast1# hastctl status
disk1:
  role: primary
  provname: disk1
  localpath: /dev/ada1
  ...
  remoteaddr: hast2
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
disk2:
  role: primary
  provname: disk2
  localpath: /dev/ada2
  ...
  remoteaddr: hast2
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
disk3:
  role: primary
  provname: disk3
  localpath: /dev/ada3
  ...
  remoteaddr: hast2
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
Code:
hast2# hastctl status
disk1:
  role: secondary
  provname: disk1
  localpath: /dev/ada1
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
disk2:
  role: secondary
  provname: disk2
  localpath: /dev/ada2
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
disk3:
  role: secondary
  provname: disk3
  localpath: /dev/ada3
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  dirty: 0 (0B)
  ...
Both nodes should report status: complete. If you get a degraded status you can always repeat the procedure.
Now it is time to create our ZFS pool. The primary node should have a /dev/hast directory containing our resources. This directory appears only on the active node. Code:
hast1# zpool create zhast raidz1 /dev/hast/disk1 /dev/hast/disk2 /dev/hast/disk3
hast1# zpool status zhast
  pool: zhast
 state: ONLINE
  scan: none requested
config:
        NAME            STATE     READ WRITE CKSUM
        zhast           ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            hast/disk1  ONLINE       0     0     0
            hast/disk2  ONLINE       0     0     0
            hast/disk3  ONLINE       0     0     0
replication: fullsync
At this point both of our nodes should be available for failover. We have storage1 running as primary and sharing a pool called zhast. Our storage2 is currently in standby mode. If DNS is set up properly we can ssh to storage.hast.test, or we can use its CARP IP, 10.10.10.180.
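Checking status: complete by eye for every resource gets tedious, so the check can be scripted. This is only a sketch under the assumption that hastctl status prints the multi-line format shown in this post (a resource name followed by indented key: value lines); other releases may format the output differently. hast_all_complete is a hypothetical helper that reads that text on stdin.

```shell
#!/bin/sh
# Sketch: succeed only if every resource in `hastctl status` output (read from
# stdin, in the multi-line format shown in this post) has "status: complete".
# hast_all_complete is a hypothetical helper, not part of HAST.
hast_all_complete() {
    awk '
        # A resource header is a single word ending in a colon, e.g. "disk1:".
        NF == 1 && $1 ~ /:$/ { res = $1; sub(/:$/, "", res); seen[res] = 1 }
        # Remember which resources report a complete status.
        $1 == "status:" && $2 == "complete" { ok[res] = 1 }
        END {
            bad = 0; n = 0
            for (r in seen) {
                n++
                if (!(r in ok)) { print "not complete: " r; bad = 1 }
            }
            # Fail if any resource is not complete or no resource was found.
            exit (bad || n == 0)
        }
    '
}
```

On a node this would be used as hastctl status | hast_all_complete, for instance before creating the pool or before a planned failover.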
__________________
Powered by BareBSD Last edited by gkontos; February 8th, 2012 at 15:36. Reason: typo thanks Nukama |
#2
To perform a failover we first have to export the pool on the first node and change the role of each resource to secondary. Then we change the role of each resource to primary on the standby node and import the pool. We will do this manually to test whether failover really works; for a real HA solution we will eventually create a script that takes care of it.
First let's export our pool and change our resources' role: Code:
hast1# zpool export zhast
hast1# hastctl role secondary disk1
hast1# hastctl role secondary disk2
hast1# hastctl role secondary disk3
Then make hast2 primary and import the pool: Code:
hast2# hastctl role primary disk1
hast2# hastctl role primary disk2
hast2# hastctl role primary disk3
hast2# zpool import zhast
Check the pool: Code:
hast2# zpool status zhast
  pool: zhast
 state: ONLINE
  scan: none requested
config:
        NAME            STATE     READ WRITE CKSUM
        zhast           ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            hast/disk1  ONLINE       0     0     0
            hast/disk2  ONLINE       0     0     0
            hast/disk3  ONLINE       0     0     0
errors: No known data errors
And the HAST status: Code:
hast2# hastctl status
disk1:
  role: primary
  provname: disk1
  localpath: /dev/ad1
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  ...
disk2:
  role: primary
  provname: disk2
  localpath: /dev/ad2
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  ...
disk3:
  role: primary
  provname: disk3
  localpath: /dev/ad3
  ...
  remoteaddr: hast1
  replication: fullsync
  status: complete
  ...
One reason to fail over would be a primary node that is not responding on the external network and is thus unable to serve its clients. Using a devd event we can catch a carp interface going up or down and act on the state change. Add the following lines to /etc/devd.conf on both nodes: Code:
notify 30 {
    match "system" "IFNET";
    match "subsystem" "carp0";
    match "type" "LINK_UP";
    action "/usr/local/bin/failover master";
};
notify 30 {
    match "system" "IFNET";
    match "subsystem" "carp0";
    match "type" "LINK_DOWN";
    action "/usr/local/bin/failover slave";
};
The devd actions call /usr/local/bin/failover, which is the following script: Code:
#!/bin/sh
# Original script by Freddie Cash <fjwcash@gmail.com>
# Modified by Michael W. Lucas <mwlucas@BlackHelicopters.org>
# and Viktor Petersson <vpetersson@wireload.net>
# Modified by George Kontostanos <gkontos.mail@gmail.com>
# The names of the HAST resources, as listed in /etc/hast.conf
resources="disk1 disk2 disk3"
# delay in mounting HAST resource after becoming master
# make your best guess
delay=3
# logging
log="local0.debug"
name="failover"
pool="zhast"
# end of user configurable stuff
case "$1" in
    master)
        logger -p $log -t $name "Switching to primary provider for ${resources}."
        sleep ${delay}
        # Wait for any "hastd secondary" processes to stop
        for disk in ${resources}; do
            while $( pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1 ); do
                sleep 1
            done
            # Switch role for each disk
            hastctl role primary ${disk}
            if [ $? -ne 0 ]; then
                logger -p $log -t $name "Unable to change role to primary for resource ${disk}."
                exit 1
            fi
        done
        # Wait for the /dev/hast/* devices to appear
        for disk in ${resources}; do
            for I in $( jot 60 ); do
                [ -c "/dev/hast/${disk}" ] && break
                sleep 0.5
            done
            if [ ! -c "/dev/hast/${disk}" ]; then
                logger -p $log -t $name "GEOM provider /dev/hast/${disk} did not appear."
                exit 1
            fi
        done
        logger -p $log -t $name "Role for HAST resources ${resources} switched to primary."
        logger -p $log -t $name "Importing Pool"
        # Import ZFS pool. Do it forcibly as it remembers hostid of
        # the other cluster node.
        out=`zpool import -f "${pool}" 2>&1`
        if [ $? -ne 0 ]; then
            logger -p local0.error -t hast "ZFS pool import for resource ${resource} failed: ${out}."
            exit 1
        fi
        logger -p local0.debug -t hast "ZFS pool for resource ${resource} imported."
        ;;
    slave)
        logger -p $log -t $name "Switching to secondary provider for ${resources}."
        # Switch roles for the HAST resources
        zpool list | egrep -q "^${pool} "
        if [ $? -eq 0 ]; then
            # Forcibly export file pool.
            out=`zpool export -f "${pool}" 2>&1`
            if [ $? -ne 0 ]; then
                logger -p local0.error -t hast "Unable to export pool for resource ${resource}: ${out}."
                exit 1
            fi
            logger -p local0.debug -t hast "ZFS pool for resource ${resource} exported."
        fi
        for disk in ${resources}; do
            sleep $delay
            hastctl role secondary ${disk} 2>&1
            if [ $? -ne 0 ]; then
                logger -p $log -t $name "Unable to switch role to secondary for resource ${disk}."
                exit 1
            fi
            logger -p $log -t $name "Role switched to secondary for resource ${disk}."
        done
        ;;
esac
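Both the manual procedure and the script depend on ordering: when demoting, the pool is exported before the resources change role; when promoting, the resources become primary before the pool is imported. The sketch below only prints the commands in that order so the sequence can be reviewed before anything is run. failover_plan is a hypothetical helper and is not part of the original script.

```shell
#!/bin/sh
# Sketch: print, in order, the commands for a manual role change.
# failover_plan is a hypothetical helper; it only echoes, it executes nothing.
failover_plan() {
    if [ $# -lt 3 ]; then
        echo "usage: failover_plan primary|secondary pool disk..." >&2
        return 1
    fi
    role="$1"; plan_pool="$2"; shift 2
    case "$role" in
    secondary)
        # Demote: release the pool first, then give up each resource.
        echo "zpool export ${plan_pool}"
        for disk in "$@"; do echo "hastctl role secondary ${disk}"; done
        ;;
    primary)
        # Promote: take over each resource first, then import the pool.
        for disk in "$@"; do echo "hastctl role primary ${disk}"; done
        echo "zpool import ${plan_pool}"
        ;;
    *)
        echo "usage: failover_plan primary|secondary pool disk..." >&2
        return 1
        ;;
    esac
}
```

Running failover_plan secondary zhast disk1 disk2 disk3 on the old primary and failover_plan primary zhast disk1 disk2 disk3 on the standby lists the same commands used in the manual test.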
To test the automatic failover, take the interface down on hast1: Code:
hast1# ifconfig re0 down
On hast1 the log shows: Code:
hast1# tail -f /var/log/debug.log
Feb 6 15:01:41 hast1 failover: Switching to secondary provider for disk1 disk2 disk3.
Feb 6 15:01:49 hast1 hast: ZFS pool for resource exported.
Feb 6 15:01:52 hast1 failover: Role switched to secondary for resource disk1.
Feb 6 15:01:55 hast1 failover: Role switched to secondary for resource disk2.
Feb 6 15:01:58 hast1 failover: Role switched to secondary for resource disk3.
And on hast2: Code:
hast2# tail -f /var/log/debug.log
Feb 6 15:02:15 hast2 failover: Switching to primary provider for disk1 disk2 disk3.
Feb 6 15:02:19 hast2 failover: Role for HAST resources disk1 disk2 disk3 switched to primary.
Feb 6 15:02:19 hast2 failover: Importing Pool
Feb 6 15:02:52 hast2 hast: ZFS pool for resource imported.
Further considerations:
What we did today is a basic setup of two nodes sharing a raidz1 pool, with automatic role failover in case of a failure that results in the loss of a carp interface. Obviously, a similar devd event would be generated in case we lose a HAST replication interface. This needs to be addressed in a similar way, since losing that interface leaves us with no synchronization at all. Going further, we would have to add scripts that bring services up and down during a failover.
Original article: http://www.aisecure.net/2012/02/07/h...carp-failover/
Resources: Michael W. Lucas, The FreeBSD Handbook
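As a starting point for the service scripts mentioned under further considerations, the sketch below prints the commands a failover could run for each role: services start in the listed order on the new master and stop in reverse order on the node becoming slave. Everything here is an assumption for illustration. service_hook_cmds is a hypothetical helper, the service names are examples only, and nothing is executed.

```shell
#!/bin/sh
# Sketch: print the service commands a failover could run for a given role.
# service_hook_cmds and the service names passed to it are hypothetical;
# the function only echoes commands so they can be reviewed first.
service_hook_cmds() {
    [ $# -ge 1 ] || return 1
    role="$1"; shift
    case "$role" in
    master)
        # Start services in the listed order, after the pool is imported.
        for svc in "$@"; do echo "service ${svc} onestart"; done
        ;;
    slave)
        # Stop services in reverse order, before the pool is exported.
        rev=""
        for svc in "$@"; do rev="${svc} ${rev}"; done
        for svc in $rev; do echo "service ${svc} onestop"; done
        ;;
    *)
        return 1
        ;;
    esac
}
```

For an NFS server one might try service_hook_cmds master rpcbind mountd nfsd; the right list and order depend on what is served from the pool.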
Last edited by gkontos; February 8th, 2012 at 14:06. Reason: formatting
#3
Did you also test a sudden reboot of the master? If I do this, I get into all kinds of trouble.
Mainly because the CARP interface starts in master mode after a reboot and hence executes the master script, even if the node is not master. Then the trouble starts and you get a split-brain scenario.
Regards, Johan
Last edited by DutchDaemon; March 2nd, 2012 at 18:59.
#4
Quote:
The connection was established via the CARP IP. During the reboot of the master there was an obvious delay until the pool became available on the secondary machine, but that was solved by a client reset. After the node came back up, CARP did not assign it the master role, so I always had to perform a manual failback.
Which FreeBSD version are you using? Do you by any chance have net.inet.carp.preempt=1 in your /etc/sysctl.conf?
#5
Quote:
#6
A quick update. There is a commit in 9-STABLE which allows the user to set the state of the carp cluster.
Link: http://svnweb.freebsd.org/base?view=...evision=232486
#7
Great work!
There is a little typo, but it doesn't affect the script. Look for ${resource}, which should be ${resources}.
Last edited by DutchDaemon; March 23rd, 2012 at 00:30.
#8
I've been using a version of this guide to set up my own replication testing in two Xen guests. With disk1 and disk2 set up in HAST, I've created a pool with mirrored devices. This works most of the time, and all of the time if everything is shut down cleanly.
But for testing I've also tried resetting the HAST master in the middle of writing a new file, which can get me into trouble. Once the ZFS metadata got corrupted, which meant it rolled back a couple of minutes after forcing the import with zpool import -F. Another time the import locked up completely in state tx->tx, rendering all ZFS tools unusable since they all block and wait for this import. The same thing happens on both machines, even after reboots.
So I'm currently wondering whether this method really is reliable enough, or if I should go the snapshot sync route without HAST.
Last edited by DutchDaemon; March 28th, 2012 at 15:17.
#9
If you forcibly export a pool during heavy I/O operations then you will eventually end up with corrupted metadata.
This means that you should never initiate a manual failover during I/O operations. But what happens if the primary node crashes? The secondary node will try to import the pool, and most probably it will succeed unless heavy corruption has occurred. In that case you can use different import techniques to heal the pool.
#10
In my recent tests I've simply done ifconfig down or a hard reset while a client is copying files to it via NFS.
More than once I've gotten metadata corruption and errors; when I then try zpool import -F, it tells me to restore the pool from a backup and refuses to import it. That seems a bit sketchy to me, as the whole point of doing this in my case is to have a reliable backup machine in case the primary burns up, and to stall NFS clients until the secondary comes up, which works as long as ZFS doesn't get corrupted. But I have only tried this in a virtual environment using this setup:
Code:
Mar 29 10:01:01 storage1 hastd[6690]: [disk2] (primary) Remote request failed (Operation not supported by device): FLUSH.
Mar 29 10:01:02 storage1 hastd[6690]: [disk2] (primary) Unable to flush disk cache on activemap update: Operation not supported by device.
Last edited by DutchDaemon; March 30th, 2012 at 20:51. Reason: Formatting & Style: http://forums.freebsd.org/showthread.php?t=8816 / http://forums.freebsd.org/showthread.php?t=18043
#11
Hello,
I am interested in this setup too. I have tried it before with Linux/Pacemaker, and I have these questions:
1) When I import/export a ZFS pool from master to slave, is the NFS/CIFS setup imported/exported as well?
2) Latency of the slave server kills write performance (at least with Linux/DRBD). I plan to put a battery-backed RAM disk in the slave server. Can I tell ZFS to use it as ZIL/log and always write to it, then later copy to the ZFS volume (the slave HDDs may, for example, be on standby)?
3) Is HAST stable?
Thanks, Mario
#12
Quote:
Last edited by DutchDaemon; April 1st, 2012 at 00:49. |
#13
@mgiammarco,
1) During a failover the resources change roles. This means that your storage becomes unavailable on machine #1 and available on machine #2. Please note that the resources can be available on one machine only, the primary. Some services that depend on that data might complain, so you might need to start/stop those services as well.
2) I don't understand.
3) This is very difficult to answer. Why? Because until a technology has been used enough, there is not much user feedback and error reporting.
#14
Quote:
Narrowing down my issue: In my virtual Xen environment I get some kind of deadlock with state "tx->tx" and 99.8% idle if I do the steps below. All zfs commands also stop working, and I'm unable to import the pool again even after a reset of the guest machine:
dd if=/dev/urandom of=./foo bs=100M count=10 & zpool export -f storage
This only occurs when using HAST in between, not if I create the pool directly on the virtual drives. However, it doesn't seem to occur on the real machine that I'm testing with now. Perhaps it's just a bug from using the virtual environment.
Last edited by DutchDaemon; April 4th, 2012 at 11:04. Reason: Proper formatting: http://forums.freebsd.org/showthread.php?t=8816
#15
Quote:
#16
Quote:
However, when split-brain occurs, hastctl reports 1.8TB as "dirty" instead of the 1-2G that was actually written in total. Is there a way around this? systat -io says tps: 500+ and about 60MB/s on all three drives, while network activity is going at 500KB/s and the dirty counter in hastctl isn't shrinking that fast either. What's causing all the disk activity?
Last edited by DutchDaemon; April 5th, 2012 at 00:05. Reason: Proper formatting: http://forums.freebsd.org/showthread.php?t=8816
#17
When I re-create the secondary that is.
Last edited by DutchDaemon; April 5th, 2012 at 00:05. |
#18
@gkontos: Great stuff you shared.
In Linux, DRBD failover is possible with a single NIC, but
I know that a single NIC is not a failover option, but for the system which lacks expansion slots for NICs, one has to opt for a single NIC. Last edited by phoenix; April 12th, 2012 at 22:02. Reason: Please format your posts! |
#19
Quote:
Don't forget that both servers should bind to the same IP address in CARP. This means that you would have to perform some sort of complex ROUTING.
#20
@gkontos: Thanks!
From what you said about CARP, it seems that HAST+CARP is good for storage scalability rather than redundancy, right? Generally, enterprise-grade operations run in at least two datacenters, so that if something happens to one of them (fire, earthquake, flood, etc.), IT operations switch over to the other one in a different geographical location. Is there a solution of that kind with HAST+CARP, or is it only a local solution? In GNU/Linux, DRBD with Heartbeat/Corosync is able to do what I described. Is that a possibility with HAST? Just curious!
#21
Quote:
Quote:
Quote:
Also, CARP is not mandatory for HAST. If HAST could support async replication then that would work as a solution for DR replication.
#22
DRBD + Heartbeat/Pacemaker or Corosync in GNU/Linux supports synchronous replication too. Maybe such a setup requires a fencing device for a more effective implementation. Proxmox is a Debian-based distro which uses such an approach (up to 1.9 without any fencing device, only DRBD+Heartbeat; from 2.0 Proxmox uses DRBD with Corosync). A pretty robust enterprise-grade solution. Just for information.
#23
My issues thus far with the ZFS + HAST + CARP + DEVD setup are during system startup and shutdown (related forum post is here: http://forums.freebsd.org/showthread.php?t=29996)
hast1 and hast2 are up, running, and properly replicating. hast1's role is primary, hast2's role is secondary.
Issue #1
The cause of this issue is fully explained in the related forum post. Basically it has to do with the fact that hastd isn't running yet. I can easily work around this issue by modifying the failover script (start hastd if it's not running), but that generates errors/warnings during boot and is not as elegant as I want it to be.
Issue #2
Not sure exactly what causes this issue, but it only happens when the role is primary. Some sources online point to a problem with ZFS and HAST. I have been unable to find a workaround or fix for this.
Any assistance would be appreciated, and by the way: net.inet.carp.preempt=0 on both hosts.
#24
@tuaris
Issue #1
When my primary server comes back online it does not automatically assume a MASTER role in CARP. I have to manually issue on both nodes:
# ifconfig carp0 down && ifconfig carp0 up
Only then do they switch roles. This way I avoid split-brain issues.
Issue #2
Very strange!
#25
Quote:
For example I have HostA and HostB... HostA: Code:
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 1.2.3.4 netmask 0xffffff00
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
carp: MASTER vhid 1 advbase 1 advskew 0
HostB: Code:
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 1.2.3.4 netmask 0xffffff00
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
carp: BACKUP vhid 1 advbase 1 advskew 0
HostA: Code:
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 1.2.3.4 netmask 0xffffff00
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
carp: BACKUP vhid 1 advbase 1 advskew 0
HostB: Code:
carp0: flags=49<UP,LOOPBACK,RUNNING> metric 0 mtu 1500
inet 1.2.3.4 netmask 0xffffff00
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
carp: MASTER vhid 1 advbase 1 advskew 0
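Given the split-brain concern raised earlier in the thread, one option is to have the failover script confirm the actual CARP state before promoting a node. The sketch below assumes the "carp: STATE vhid N ..." line shown in the ifconfig output above; carp_state and is_carp_master are hypothetical helpers that read that output on stdin.

```shell
#!/bin/sh
# Sketch: read `ifconfig carp0` output on stdin and report the CARP state,
# assuming the "carp: STATE vhid N ..." line shown in the output above.
# carp_state and is_carp_master are hypothetical helpers.
carp_state() {
    # Print the word after "carp:"; fail if no such line exists.
    awk '$1 == "carp:" { print $2; found = 1; exit } END { if (!found) exit 1 }'
}

is_carp_master() {
    # Succeeds only when the reported state is exactly MASTER.
    [ "$(carp_state)" = "MASTER" ]
}
```

In the master branch of the failover script this could be used as ifconfig carp0 | is_carp_master || exit 0, so that a node which is really BACKUP never takes the HAST resources.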