Problems with HASTD

lessie · Feb 14, 2011

Hi Guys,

I seem to have some trouble with hastd, I hope you can help me in this.
I set up hastd successfully on VMWARE for testing but as I tried to do that on a production server it really doesn't work for me.
Everything seems to work fine, I set up hast.conf, do the role part, newfs and mount. But as soon as I try to copy data to the hast mount, the whole system freezes. No error msg in log files found.

I have the following configuration:
-FreeBSD 8.1 RELEASE p2
-INTEL S5500BCR (Bluff Creek) 8DIMM mboard
-2xIntel Quad-Core Xeon E5506 (4MB Cache, 2.13 GHz, 4.80 GT/s Intel QPI)
-8GB KINGSTON DDR3, 1333MHz ECC CL9 DIMM (Kit of 4) with Thermal Sensor
-Intel SRCSASLS4I, PCI Express x8, 4 port SAS/SATA (RAID level: 0,1,5,6,10,50,60)

I tried it with a similar raid card and different hdds (seagate 160gb sata, seagate 1.5TB sata, seagate sas 75gb/147gb, wd green sata) with no luck. Same thing with the motherboard sata.

On the other hand, I set up the same raid card and hdds on an older system (2xXEON 3306) without any problem.

The strange thing on both systems: when I do a reboot or a shutdown -r/h now, most of the times a got the following error:

Code:

hast/[devicename] went missing, expect data loss

Many times this message comes with a panic and a 15 sec automatic reboot.

I realized that if I umount the hast devices before reboot or shutdown this error disappears. I could live with this but I guess this should not work this way.

If I copy data to non-hast devices, everything works fine.

Anybody can help me?

Thanks,
G.

lessie · Feb 16, 2011

Just a quick update on this issue:

I tried it with an identical server and it does the same. I'm sure this is not a motherboard failure after this but some bug in the hastd probably related to this motherboard type.

While I was testing it, gstat showed hast/[devicename] %busy 99.9 and the system became unreachable withing seconds. The interesting part is that only the hast device runs at this percent and the related hdd (like /dev/ad6) is at 0%.

I can still ping the server but nothing else.

lessie · Mar 20, 2011

Hi Guys,

I just did some further tests on this issue with a totally different machine, it's a DELL PowerEdge T110. Same thing happens.
I started HAST in the foreground and in debug mode and when this issue happens, I see the following repeating on and on:

Code:

[DEBUG][2] [web] (primary) remote_send: Taking request.
[DEBUG][2] [web] (primary) remote_guard: Checking connections.
[DEBUG][2] [web] (primary) remote_send: Taking request.
[DEBUG][2] [web] (primary) remote_guard: Checking connections.
[DEBUG][2] [web] (primary) remote_send: Taking request.
[DEBUG][2] [web] (primary) remote_guard: Checking connections.

Seems like a geom_gate timeout. Yesterday I did some test and for some reason I left the server there in this hanging, a couple of hours later when I checked upon it I saw that it has recovered from the hanging and finished copying files.

Any idea?

hblandford · Mar 21, 2011

Lessie,

have you tried an upgrade to 8.2? There was a lot of work done between 8.1 and 8.2. I had a play with hastd in early 8.1 days and you had to wait a long time for the whole new disk creation to be copied acrosssynced to the other system when first created.

Hugh

DutchDaemon · Mar 21, 2011

lessie, re-read "Posting and Editing in the FreeBSD Forums" on how to use [/b][/FILE] tags: [url]http://forums.freebsd.org/showthread.php?t=8816[/url]. And don't post moderated posts twice. Thanks.

lessie · Mar 21, 2011

Hi Hugh,

Thanks for your reply.

Yes the latest tests were made on 8.2 amd64. I know that you have to wait until the replication is done. The problem is not with the time I have to wait, but the system hanging.

This issue really similar (I guess) to the gmirror+ggate thing when the secondary server goes down and the primary hanging until the secondary comes back online.

My problem is that if I start copying about 300-400GB data from one hdd to the hast device, it starts hanging randomly. Sometimes, when the copying is done, I do a [cmd=]rm -r *[/cmd] on the whole mount as a test and the hanging happens again.

These things do not happen on non-hast mounts on these machines.
I don't know if I had really bad luck with my servers and some part of them are not compatible with something, or I do something wrong, or really there is an issue with hastd.

I usually start with a test as followings:
I create a .sh script with an infinite csh loop with 4 commands: [cmd=]rm -r /home/hast/*; rsync -aP /usr/ports /home/hast/; rsync -aP /usr/src /home/hast/; sleep 120[/cmd] Within 5-10 loops the system starts hanging which is not a good thing if you put this system in production and let's say someone deletes a big directory on the web server.

On the other hand I'm a bit sceptic and there is a chance that I do something wrong since setting up hastd is really not difficult and nobody else got these problems but me.

L.

hblandford · Mar 22, 2011

Do you have carp or any other automatic scripts involved at this stage?

What is the network configuration between your two hast nodes?

Hugh

lessie · Mar 22, 2011

Both systems have the basic FreeBSD install with GENERIC kernel. No carp, no scripts involved. I have a gbit crosslink between the 2 nodes and replication is working fine.

I tested with both 1 node and 2 nodes. With the 1 node I set up hast only on the primary, network is set but cables are unplugged and I start copying data. Randomly hangs, sometimes after 100-200GBytes copying.

With the 2 nodes, first I set up replication and then copying data. My observation is that it really does not count which type of setup I make, both ends with hanging. If copying is done for some reason, rm -r * does the job.

This is the network card on both nodes:

Code:

bce0: <Broadcom NetXtreme II BCM5709 1000Base-T (C0)> mem 0xda000000-0xdbffffff irq 16 at device 0.0 on 
pci3

I tried it with a different type of card:

Code:

em0: <Intel(R) PRO/1000 Network Connection 7.0.5> port 0x2000-0x201f mem 0xb1b00000-
0xb1b1ffff,0xb1a00000-0xb1afffff,0xb1b20000-0xb1b23fff irq 28 at device 0.0 on pci1

When system hanging (only the primary), it looks like the ggate+gmirror thing when you unplug the backup node. In that case gstat shows 99.9 %busy just like in my case where gstat shows the following:

Code:

dT: 1.003s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| acd0
   14      0      0      0    0.0      0      0    0.0    0.0| da0
    0      0      0      0    0.0      0      0    0.0    0.0| da1
   19      0      0      0    0.0      0      0    0.0    0.0| da0s1
    0      0      0      0    0.0      0      0    0.0    0.0| da1s1
    0      0      0      0    0.0      0      0    0.0    0.0| da2
    0      0      0      0    0.0      0      0    0.0    0.0| da3
    9      0      0      0    0.0      0      0    0.0    0.0| da0s1a
    0      0      0      0    0.0      0      0    0.0    0.0| da0s1b
    8      0      0      0    0.0      0      0    0.0    0.0| da0s1d
    0      0      0      0    0.0      0      0    0.0    0.0| da0s1e
    3      0      0      0    0.0      0      0    0.0    0.0| da0s1f
    0      0      0      0    0.0      0      0    0.0    0.0| da2s1
    0      0      0      0    0.0      0      0    0.0    0.0| da3s1
    0      0      0      0    0.0      0      0    0.0    0.0| da1s1d
    0      0      0      0    0.0      0      0    0.0    0.0| da1s1e
 4872     10      0      0    0.0     10      0  77676   99.9| hast/web

Everything drops down to 0 but the hast device. ms/w slowly increasing.

L.

Pfarthing6 · Apr 3, 2011

Yup, I'm having the same/very-similar issue randomlay as well. Running 8.2-RELEASE, real system not vmware or other virtual.

I can reproduce it a few times, then it doesn't reproduce for another few times. It isn't specific to either master or slave either. Very odd.

Basically what appears to be happening is that whenever either the Master or the Slave change hast roles to "init", the worker processes on the init-role system exit and the worker processes on the other (primary or secondary) attempt to restart themselves, and sometimes fail.

For instance. On my Master, I start with this:

Code:

nas1# ps -p `pgrep hastd`
PID  TT  STAT      TIME COMMAND
40343  ??  Ss     0:00.02 /sbin/hastd
40563  ??  I      0:00.01 hastd: ada0 (primary) (hastd)
40564  ??  I      0:00.01 hastd: ada1 (primary) (hastd)
40565  ??  I      0:00.01 hastd: ada2 (primary) (hastd)
40566  ??  I      0:00.01 hastd: ada3 (primary) (hastd)
40567  ??  I      0:00.01 hastd: ada4 (primary) (hastd)
40568  ??  I      0:00.01 hastd: ada5 (primary) (hastd)
40569  ??  I      0:00.01 hastd: ada6 (primary) (hastd)
40570  ??  I      0:00.01 hastd: ada7 (primary) (hastd)
40571  ??  I      0:00.01 hastd: ada8 (primary) (hastd)
40572  ??  I      0:00.01 hastd: ada9 (primary) (hastd)
40573  ??  I      0:00.01 hastd: ada10 (primary) (hastd)
40574  ??  I      0:00.01 hastd: ada11 (primary) (hastd)
40575  ??  I      0:00.01 hastd: ada12 (primary) (hastd)

I then issue [CMD="nas2#"] hastctl role init[/CMD] on the slave. Checking the Master again I see that the worker PID's have changed for most of them:

Code:

nas1# ps -p `pgrep hastd`
PID  TT  STAT      TIME COMMAND
40343  ??  Ss     0:00.04 /sbin/hastd
40572  ??  I      0:00.01 hastd: ada9 (primary) (hastd)
41432  ??  I      0:00.00 hastd: ada3 (primary) (hastd)
41435  ??  I      0:00.00 hastd: ada12 (primary) (hastd)
41444  ??  I      0:00.00 hastd: ada11 (primary) (hastd)
41447  ??  I      0:00.00 hastd: ada10 (primary) (hastd)
41456  ??  I      0:00.00 hastd: ada8 (primary) (hastd)
41459  ??  I      0:00.00 hastd: ada7 (primary) (hastd)
41468  ??  I      0:00.00 hastd: ada6 (primary) (hastd)
41471  ??  I      0:00.00 hastd: ada5 (primary) (hastd)
41480  ??  I      0:00.00 hastd: ada4 (primary) (hastd)
41483  ??  I      0:00.00 hastd: ada2 (primary) (hastd)
41492  ??  S      0:00.00 hastd: ada1 (primary) (hastd)
41495  ??  S      0:00.00 hastd: ada0 (primary) (hastd)

This is fine. As when I start the Slave back up as a hast secondary, everything comes back to life. On the other hand, if I'm not lucky, I get this sometimes.

Code:

nas1# ps -p `pgrep hastd`
 PID  TT  STAT      TIME COMMAND
 6967  ??  Is     0:00.18 /sbin/hastd
 9436  ??  I      0:00.00 hastd: ada12 (primary) (hastd)
 9437  ??  Z      0:00.00 <defunct>
 9447  ??  Z      0:00.00 <defunct>
 9448  ??  Z      0:00.00 <defunct>
 9449  ??  Z      0:00.00 <defunct>
 9450  ??  Z      0:00.00 <defunct>
 9460  ??  Z      0:00.01 <defunct>
 9461  ??  I      0:00.00 hastd: ada11 (primary) (hastd)
 9471  ??  I      0:00.00 hastd: ada10 (primary) (hastd)
 9472  ??  Z      0:00.00 <defunct>
 9483  ??  Z      0:00.00 <defunct>
 9484  ??  Z      0:00.00 <defunct>

(this is from a different session, so PID's are not relevent here)

Now, when this happens and the worker processes go zombie, the only way to fix it, is to do [CMD=""]kill -9 `pgrep hastd`[/CMD] and then restart the hastd service again. Tried waiting to see if they'd clean themselves up, but they just hang around. In this case, hastd is unresponsive to service restarts and hastctl commands just hang.

If this happens on the slave, it's no big to kill them all and start it up again. But when it happens on the master, I loose my storage for the time it takes to restart hastd and it's procs. My zpool will be very unhappy in that case!

All this testing is purely manual, but I noticed most prolifically when using really simply failover scripts with CARP/DevD.

Will try to post more if I figure anything new out.

Update: a couple things I'm noticing which DO NOT seem to ever result in hast worker procs turning to zombies and thus breaking replication: First, if I do a [CMD="host#"]kill -9 `pgrep hastd`[/CMD] on the slave, the hastd worker procs on the master restart themselves successfully each time. If I do it on the master, the hastd worker procs on the slave exit gracefully, no zombies. This effectively simulates a real failure of some kind. Second, when I want to gracefully switch roles, if I change the master hast resources to "secondary" first [CMD="nas1#"]hastctl role secondary all[/CMD], so I have two secondaries, then change the slave hast resources to primary [CMD="nas2#"]hastctl role primary all[/CMD], role transition is smooth, nothing breaks. I can't reproduce the above errors. I'm still doing all this manually, so haven't hammered at it. What I'm thinking then is that there is something up with the "init" role (I know, why was he doing that!), and by not using it, the issues have apparently abated.

Update2: So, that didn't solve all of it. I setup a simple script for CARP/DevD and the Slave crashed hard with a page fault. I've been getting this regularly. Here's what I get on the screen:

Code:

processor eflags = interrupt enabled, resume, IOPL
current = process39498 (hastd)
trap number = 12
panic: page fault
cpuid = 3
KBD: stack backtrace:
#0 0xffffffff805f4e0e at kbd_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff808ac600 at trap_fatal+-x290
#3 0xffffffff808ac9df at trap_pfault+0x28f
#4 0xffffffff808acebf at trap+0x3df
#5 0xffffffff80894fb4 at calltrap+0x8
#6 0xffffffff8054cebd at devfs_ioctl_f+0x7b
#7 0xffffffff806043c2 at karn_ioctl+0x102
#8 0xffffffff806045fd at ioctl+0xfd
#9 0xffffffff80600dd5 at syscallenter+0x1e5
#10 0xffffffff808aca5b at syscall+0x4b
#11 0xffffffff80895292 at Xfast_syscall+0xe2
uptime: 20hrs22m35s
Cannot dump. Device not defined or unavailable
Automatic reboot in 15 seconds - press a key on the console to abort
panic: bufwrite: buffer is not busy???
cpuid = 3

Unfortunately, the system does not restart. It's completely hosed until a hard reset is performed.

Here's my DevD script:

Code:

nas1# cat /etc/devd/carp.conf
notify 10 {
        match "system"          "IFNET";
        match "subsystem"       "carp0";
        match "type"            "LINK_UP";
        action                  "/usr/local/bin/role-switch.sh master";
};

notify 10 {
        match "system"          "IFNET";
        match "subsystem"       "carp0";
        match "type"            "LINK_DOWN";
        action                  "/usr/local/bin/role-switch.sh slave";
};

Here's my role switching script:

Code:

#!/usr/local/bin/bash

hast_role_change()
{
	# log what we're doing
	logger -p local0.debug -t hast "Attempting role change to $1."

	# allow worker procs on old primary to exit gracefully before changing roles
	if [ $1 = "primary" ]; then
		sleep 30;
	fi
	
	# change role
	hastctl role $1 all
	
	# Check exit status for attempted role change
	if [ $? -ne 0 ]; then
		logger -p local0.debug -t hast "Unable to change HAST role to $1. Aborting cluster role change."
		exit 1;
	else
		logger -p local0.debug -t hast "HAST role change to $1 completed successfully."
	fi
}

# log cluster role change request
logger -p local0.debug -t cluster "Role change request: $1"

case "$1" in
	master)

		# change role from slave to master
		logger -p local0.debug -t cluster "Attempting role change to $1."

		# Change role to primary for all
		hast_role_change primary

	;;

	slave)
		# switch role from master to slave
		
		# change role from slave to master
		logger -p local0.debug -t cluster "Attempting role change to $1."

		# Switch roles for the HAST resources
		hast_role_change secondary		
		
	;;
esac

# log cluster role change success
logger -p local0.debug -t cluster "Role change to $1 completed successfully."

I've seen other scripts where there are checks for worker procs and I intend on adding thouse, but for this testing in my simple setup, a manual transition takes no time at all for worker processes to exit. So, sleeping for 30 seconds (meaning there are two secondaries for a short time) should be adequate before a secondary is promoted to master.

Now, these page fault errors are pretty consistent. I can sometimes get one or two graceful role transitions, but by the 3rd or 4th (and again, I'm waiting for datatransfers to complete), the master or slave will inevitably crash.

One more thing, I'm testing failover just by bringing down the carp iface [CMD="host#"]ifconfig carp0 down[/CMD]

Another Update:
So, it seems that this issue is related to CARP/DEVD. I tested the role-switch.sh script above manually by having a terminal open on both systems, then executing the script simultaneously, supplying "master" for one, "slave" for the other. I could without fail change roles over and over with no errors. Then I tried again using CARP and DevD triggers. The Master node crashed with a page fault on the first role transition.

On other test, I used just a simple logging script, to say "changing role to" but not actually doing anything with the hast roles. For those tests, there were no issues. I can bring a carp interface down, and the peer will come up, and both will log the events.

Sorry if this post is a bit long. Not sure if it's better to post smaller one or just update the ongoing before I get replies.

Another Update: Yeah, even the script randomly causes page faults. I'm at my wits end on this. It seems so simple ...sigh. I really have no clue what's going on.

I would greatly appreciate any suggestions.

Thanks!

Pfarthing6 · Apr 5, 2011

Well, I decided to broaden my searching and just focus on "page fault" and "trap number 12".

I randomly came across a suggestion to include this in the loder.conf:

Code:

hw.mca.enabled="0"

Can anyone explain what this does?

So far I have not experienced the number of page faults as I was. I still got a couple, but those too seem to have abated after adding real checks for secondary worker procs on the host to be promoted from slave->master, and waiting for them to exit.

Still testing. Will post more results as things progress.

Update: Well, even though testing started to be smoother, I was only testing with a few hast resource.

After updating hast.conf to the 13 resources, one for each drive: CRASH! Page Fault, yada.

My thoughts are that either the MVS driver is really buggy, or HAST is, maybe both, who knows.

One way or ther other, at least on my hardware, a FreeBSD HA Cluster is not in the cards.

I'm starting on a Linux/Heartbeat/DRBD box tomorrow ...<sigh>. Maybe by 9-RELEASE, things will get better.

phoenix · Apr 5, 2011

There's a race condition in ggate/hastd if using more than 2 local resources. You need to upgrade to a recent 8-STABLE to get the fix. The symptoms of this issue is: changing roles on more than 1 hast resource simultaneous results in a kernel panic.

There's also a "looks like a deadlock" issue with hast systems with lots of I/O happening, due to the queue lengths used in GEOM. Too many pending I/Os to hast devices causes all GEOM I/O to slow to a crawl (10 IOps). pjd@ committed a fix for this to -CURRENT, with an MFC to 8-STABLE targeted for next week.

With the first patch, my 24-drive hast+zfs box has been stable for a week now, with 3.3 TB of data transferred to it (dedup ratio of 1.97 and compress ration of 1.54). Can create hast resources, delete hast resources, switch roles of all 24 resources simultaneously, without crashing the box.

Pfarthing6 · Apr 5, 2011

w00t! Will do, and will post back later!

phoenix said:
There's a race condition in ggate/hastd if using more than 2 local resources. You need to upgrade to a recent 8-STABLE to get the fix. The symptoms of this issue is: changing roles on more than 1 hast resource simultaneous results in a kernel panic.

There's also a "looks like a deadlock" issue with hast systems with lots of I/O happening, due to the queue lengths used in GEOM. Too many pending I/Os to hast devices causes all GEOM I/O to slow to a crawl (10 IOps). pjd@ committed a fix for this to -CURRENT, with an MFC to 8-STABLE targeted for next week.

With the first patch, my 24-drive hast+zfs box has been stable for a week now, with 3.3 TB of data transferred to it (dedup ratio of 1.97 and compress ration of 1.54). Can create hast resources, delete hast resources, switch roles of all 24 resources simultaneously, without crashing the box.

Pfarthing6 · Apr 7, 2011

Well, I got 8-STABLE running and no more problems with hast, woohoo!

I'm really surprised more people haven't encountered this problem, but I guess folks experimenting with it are probably tracking STABLE anyway.

I have yet to really hammer on it, but not crashing is a good start :e

Anyway, thanks everyone. If it helps anyone else, here's the entirety of my setup and working scripts so far...

carp-init.sh: I use this because I don't have my CARP iface coming up automatically. Right now I don't feel comfortable with a system of unknown state coming up and joining the cluster without some admin intervention.

Code:

#!/usr/local/bin/bash
# this must be run on each node that will participate in the cluster

# configs
vip=10.100.100.1
vip_cidr=25
vhid=1
pass=FREEBSD_is_GREAT

# setup carp - assuming one interface for now
# label it zero by default, let carp grab carpdev based on ip
if [ -n  $vhid -a -n $pass -a -n $vip -a -n $vip_cidr ]; then
   ifconfig carp0 create;
   ifconfig carp0 vhid $vhid pass $pass $vip/$vip_cidr;
else
   echo config: missing parameters;
   exit 1;
fi

/etc/devd/carp.conf: This kicks the role changing script and supplies the role that we want to switch to. LINK_DOWN is registered when the carp0 iface is in BACKUP mode and LINK_UP is registered when carp0 is in MASTER mode. This is true, even though ifconfig shows the state of carp0 as UP on both cluster nodes.

Code:

notify 10 {
   match "system"          "IFNET";
   match "subsystem"       "carp0";
   match "type"            "LINK_UP";
   action                  "/usr/local/bin/role-switch.sh master &";
};

notify 10 {
   match "system"          "IFNET";
   match "subsystem"       "carp0";
   match "type"            "LINK_DOWN";
   action                  "/usr/local/bin/role-switch.sh slave &";
};

/usr/local/bin/role-switch.sh: This script actually switches the roles. On a carp/devd enabled cluster, it is launched more or less at exactly the same time on both members. What's missing from this check is starting and stoping storage services like ISCSI and NFS, as well as checking to ensure that data isn't replicating. In a failure, this kind of check won't be needed, but it might be good for graceful failover. It also might be noted that when the CARP interface comes up for the first time, it generates a LINK_DOWN event. So, the script will be called to switch to Slave first, then switch to Master if there is no other CARP iface with the role of MASTER.

Code:

#!/usr/local/bin/bash

# zpool info
zpool_name=rz1
zpool_import=1

# hast info
hast_res_dir=/dev/hast
hast_res_prefix=ada
hast_res_first=0
hast_res_last=12
hast_res_wait=5
hast_res_chks=5


if [ "$1" = "master" ]; then

    logger -p local0.debug -t cluster "role change to master requested"
    logger -p local0.debug -t cluster "hast: checking for secondary worker processess"
    # wait for secondary worker procs to exit
    for i in $(jot 20); do
        workers=$(ps auxf | grep hast | grep -c secondary)
        if [ "$workers" != "0" ]; then
            logger -p local0.debug -t cluster "hast: secondary worker process found, waiting 5s"
            sleep 5;
        fi
    done;
    # final check
    workers=$(ps auxf | grep hast | grep -c secondary)
    if [ "$workers" != "0" ]; then
        logger -p local0.debug -t cluster "hast: secondary worker process not exiting, aborting role change"
        exit 1;
    fi

    # wait a bit for old primary worker procs to fully exit
    logger -p local0.debug -t cluster "hast: grace period 30s before switching to primary"
    sleep 30

    # note: if secondary wokers have exited, then primary workers should also have existed, but
    # this doesn't seem to be exactly the case, so we wait a bit, and keep our fingers crossed
    # a better check might be to ssh to the other host and ensure the system is not acting as
    # primary before switching this host to primary

    # change role
    logger -p local0.debug -t cluster "hast: changing role to primary"
    hastctl role primary all;
    if [ "$?" != "0" ]; then
        logger -p local0.debug -t cluster "hast: role change to primary failed"
    else
        logger -p local0.debug -t cluster "hast: role change to primary succeeded"
    fi

    # wait a bit for hast resources to come online
    #logger -p local0.debug -t cluster "grace period 30s before importing zpool"
    #sleep 30

    # Zpool Import

    # check that all required hast resources are available bef
    for (( i=$hast_res_first; i <= $hast_res_last; i++)); do 
        # build resource path and name
        rsc="$hast_res_dir/$hast_res_prefix$i" 
    
        # check a few times, waiting for a max of some seconds for resource to come online
        for j in $(jot $hast_res_chks); do
            # does resource exist?
            if [ ! -c "$rsc" ]; then
                # resource not found
                logger -p local0.debug -t cluster "hast: $rsc not yet available, sleeping $hast_res_wait"
                sleep $hast_res_wait
            else
                break;
            fi
        done;

        # check again, if it doesn't exist we have a problem
        if [ ! -c "$rsc" ]; then
            # log resource not found
            logger -p local0.debug -t cluster "hast: $rsc not available, aborting zpool import"
            
            # set flag to abort zpool import
            zpool_import=0
            
            # don't bother checking for others
            break;
        fi
    done;

    # import zpool if all resources online
    if [ "$zpool_import" = "1" ]; then
        logger -p local0.debug -t cluster "importing zpool"
        zpool import -f $zpool_name
        if [ "$?" != "0" ]; then
            logger -p local0.debug -t cluster "import of zpool $zpool_name failed"
        else
            logger -p local0.debug -t cluster "import of zpool $zpool_name succeeded"
        fi
    fi
    
    logger -p local0.debug -t cluster "role change to master complete"

elif [ "$1" = "slave" ]; then

    logger -p local0.debug -t cluster "role change to slave requested"

    # check for zpool
    zpool status $zpool_name
    if [ "$?" = "0" ]; then
    
        # export zpool
        logger -p local0.debug -t cluster "exporting zpool"
        zpool export -f $zpool_name
        if [ "$?" != "0" ]; then
            logger -p local0.debug -t cluster "export of zpool failed"
        else
            logger -p local0.debug -t cluster "export of zpool succeeded"
        fi

        # wait a bit for zpool export to finish
        sleep 10

    else
        logger -p local0.debug -t cluster "zpool not found, aborting export"
    fi

    # change role
    logger -p local0.debug -t cluster "hast: changing role to secondary"
    hastctl role secondary all;
    if [ "$?" != "0" ]; then
        logger -p local0.debug -t cluster "hast: role change to secondary failed"
    else
        logger -p local0.debug -t cluster "hast: role change to secondary succeeded"
    fi
    logger -p local0.debug -t cluster "role change to slave complete"
fi

/etc/hast.conf: Just a snippet here, but there is resource config for each of my 13 drives. Also, I'm using a direct crossover cable connection between the nodes on 1gbe, no switch in between. So, that get's me 99% efficiency on bandwidth. Very fast even for non-bonded eth (though that's next).

Code:

resource ada0 {
   on nas1 {
         local /dev/ada0
         remote tcp4://10.1.100.2
    }
    on nas2 {
         local /dev/ada0
       remote tcp4://10.1.100.1
    }
}
resource ada1 {
   on nas1 {
         local /dev/ada1
         remote tcp4://10.1.100.2
    }
    on nas2 {
         local /dev/ada1
       remote tcp4://10.1.100.1
    }
}
# ...

/etc/sysctl.conf: I found these entries necessary for getting things going on Vmware (as weill as enabling permiscous mode in the Vmware virtual switch settings).

Code:

net.inet.carp.preempt=1
net.inet.carp.allow=1
net.inet.carp.log=1
net.inet.carp.drop_echoed=1

Problems with HASTD

Administrator