CARP: devd LINK_UP action not executed on system boot

I have set up CARP and HAST with ZFS according to the following guides:

http://forums.freebsd.org/showthread.php?t=29639
http://www.freebsd.org/doc/handbook/disks-hast.html

Two Servers: A & B.
I start out with server A having the HAST role set to primary.

  • The failover works as expected when I test it by bringing the CARP interface down and up on either server.
  • If I reboot server A: server B switches the HAST role to primary and imports the ZFS pools.
  • However, when server A comes back up: server B exports the ZFS pools and switches the HAST role to secondary, but server A never sets its HAST role to primary.

The problem is that on boot, devd doesn't execute the action for the CARP LINK_UP event. I'm not sure if this is by design, if it's a bug, or if something is set up incorrectly (in the guides above). Anyway, the preferred mode of operation should be:

When server A comes back: server B remains as primary, and server A becomes a secondary.

I have tried with enabling and disabling preemption (net.inet.carp.preempt). My setup is exactly what is shown in the guides above.
 
Actually, after giving some further thought to this, I would prefer for server A to be the 'default' primary/master. However, like I explained above, when server A comes back up after a reboot, it's never put back into 'primary' mode even though the CARP interface 'takes back control'.
 
I found out what is happening. The devd event is triggered before hastd is started, resulting in the following error message during system startup:

# unable to connect to hastd via /var/run/hastctl: No such file or directory

I have the following in /usr/local/etc/devd/mysql.conf

Code:
# Automatically Start/Stop MySQL and HAST and ZFS
notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_UP";
	action "/usr/local/automysql/enable";
};
notify 30 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_DOWN";
	action "/usr/local/automysql/disable";
};

Any suggestions on what I should do? Should I instead write an rc.d startup and shutdown script?
 
tuaris said:
I found out what is happening. The devd event is triggered before hastd is started, resulting in the following error message during system startup:

# unable to connect to hastd via /var/run/hastctl: No such file or directory

hastd should always be running on both primary and standby hosts. What commands is your script calling?
 
gkontos said:
hastd should always be running on both primary and standby hosts. What commands is your script calling?

The issue is during bootup:

Turn on server
(booting...)
The system brings up the CARP interface.
The devd event is triggered.
But hastd start up has not happened yet
hastctl errors out
The role remains in "init" mode
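
One way to tolerate the ordering above from inside the action script is to wait for hastd's control socket before calling hastctl. This is only a minimal sketch: the helper name wait_until and the 30-try limit are assumptions, the socket path is the one from the error message, and a socket existing does not by itself prove hastd is ready. Whether a devd action can safely block this long during boot is untested here.

```shell
#!/bin/sh
# wait_until TRIES COMMAND [ARGS...]
# Run COMMAND up to TRIES times, one second apart.
# Returns 0 as soon as COMMAND succeeds, 1 if every try fails.
# (Hypothetical helper, not part of HAST or devd.)
wait_until() {
	tries=$1
	shift
	while [ "$tries" -gt 0 ]; do
		"$@" && return 0
		tries=$((tries - 1))
		# Only sleep if another try is still coming
		[ "$tries" -gt 0 ] && sleep 1
	done
	return 1
}

# Possible use at the top of the LINK_UP action (assumed limit of 30 tries):
#   wait_until 30 test -S /var/run/hastctl || exit 1
#   /sbin/hastctl role primary storage
```

The same helper could also put an upper bound on the pgrep and /dev/hast wait loops, which otherwise spin until the condition changes.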

Here is the script that is running when the event LINK_UP is triggered:

Code:
#!/bin/sh
delay=3
 
# logging
log="local0.debug"
name="failover"
disk="storage"

#Enables The "storage" device, mounts the "storage", starts MySQL

/usr/bin/logger -p $log -t $name "Switching to primary provider for MySQL."
sleep 1

# Wait for any "hastd secondary" processes to stop
while /usr/bin/pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1; do
	sleep 1
done

# Switch role
/sbin/hastctl role primary ${disk}

if [ $? -ne 0 ]; then
	/usr/bin/logger -p $log -t $name "Unable to change role to primary for resource storage."
	exit 1
fi

# Wait for the /dev/hast/* devices to appear
for I in $( jot 60 ); do
	[ -c "/dev/hast/${disk}" ] && break
	sleep 0.5
done

if [ ! -c "/dev/hast/${disk}" ]; then
	/usr/bin/logger -p $log -t $name "GEOM provider /dev/hast/${disk} did not appear."
	exit 1
fi

/usr/bin/logger -p $log -t $name "Role for MySQL switched to primary."

/usr/bin/logger -p $log -t $name "Importing Pool"

# Import ZFS pool. Do it forcibly as it remembers hostid of
# the other cluster node.
out=`/sbin/zpool import -f "${disk}" 2>&1`
if [ $? -ne 0 ]; then
	/usr/bin/logger -p local0.error -t hast "ZFS pool import for resource ${disk} failed: ${out}."
	exit 1
fi
/usr/bin/logger -p local0.debug -t hast "ZFS pool for resource ${disk} imported."

#Start MySQL
/usr/bin/logger -p local0.debug -t failover "Starting MySQL Server"
/usr/sbin/service mysql-server onestart
/usr/bin/logger -p local0.debug -t failover "MySQL Server Started"
 
I resolved my startup issues by writing some new devd scripts. Any feedback is welcome. Keep in mind the scripts below are for a single HAST resource.

/usr/local/etc/devd/mysql.conf
Code:
# Automatically Start/Stop MySQL and HAST and ZFS
notify 0 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_UP";
	action "/usr/local/automysql/enable mysql storage carp0";
};
notify 0 {
	match "system" "IFNET";
	match "subsystem" "carp0";
	match "type" "LINK_DOWN";
	action "/usr/local/automysql/disable mysql storage carp0";
};

/usr/local/automysql/enable
Code:
#!/bin/sh
#Enables The "storage" device, mounts the "storage", starts MySQL
 
# logging
log="local0.debug"
name=$1
disk=$2
interface=$3

# Check to make sure HAST is running
if 
	! /usr/sbin/service hastd status | grep -q pid
then
	logger -p local0.error -t $name "Hast is not running."
	exit 1
fi

# Check if the ${interface} is set to MASTER
if 
	! ifconfig $interface | egrep -q "MASTER "
then
	logger -p local0.error -t $name "${interface} is not set to MASTER"
	exit 1
fi


# Check if the role isn't already primary
if
	hastctl status ${disk} | grep -q "role: primary"
then
	logger -p $log -t $name "Already primary provider for ${disk}"
	exit 0
fi

logger -p $log -t $name "Switching to primary provider for ${disk}."
# Wait for any "hastd secondary" processes to stop
while pgrep -lf "hastd: ${disk} \(secondary\)" > /dev/null 2>&1; do
	sleep 1
done

# Switch role
hastctl role primary ${disk}
if [ $? -ne 0 ]; then
	logger -p local0.error -t $name "Unable to change role to primary for resource ${disk}."
	exit 1
fi

# Wait for the /dev/hast/${disk} device to appear
for I in $( jot 240 ); do
	[ -c "/dev/hast/${disk}" ] && break
	sleep 0.25
done
if [ ! -c "/dev/hast/${disk}" ]; then
	logger -p local0.error -t $name "GEOM provider /dev/hast/${disk} did not appear."
	exit 1
fi
logger -p $log -t $name "Role for ${disk} switched to primary."

# Import ZFS pool. Do it forcibly as it remembers hostid of the other cluster node.
out=`zpool import -f "${disk}" 2>&1`
if [ $? -ne 0 ]; then
	logger -p local0.error -t $name "ZFS pool import for resource ${disk} failed: ${out}."
	exit 1
fi
logger -p $log -t $name "ZFS pool for resource ${disk} imported."

#Start MySQL
service ${name}-server onestart
logger -p $log -t $name "MySQL Server Started"
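
The guards at the top of this script come down to parsing two pieces of command output. As a rough sketch, they could be factored into helpers that read the text on stdin, which makes them easy to try without a live cluster. The function names are hypothetical, and the output formats are assumed from the grep patterns used above ("role: primary" and "MASTER "); other hastctl or ifconfig versions may print something different.

```shell
#!/bin/sh
# hast_role: print the role found in `hastctl status <resource>` text on stdin.
# Assumes a "role: <name>" line, as the grep in the script does.
hast_role() {
	sed -n 's/^[[:space:]]*role:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# carp_is_master: succeed if ifconfig text on stdin reports the MASTER state.
# Uses the same "MASTER " pattern as the script.
carp_is_master() {
	grep -q 'MASTER '
}

# Possible use (resource and interface names are the ones from this thread):
#   ifconfig carp0 | carp_is_master || exit 1
#   [ "$(hastctl status storage | hast_role)" = "primary" ] && exit 0
```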

/usr/local/automysql/disable
Code:
#!/bin/sh
#Stops MySQL, unmounts the "storage", Disables The "storage" device

log="local0.debug"
name=$1
disk=$2
interface=$3

# Check to make sure HAST is running
if 
	! /usr/sbin/service hastd status | grep -q pid
then
	logger -p local0.error -t $name "Hast is not running."
	exit 1
fi

# Check if the role is primary, continue only if it is
if
	! hastctl status ${disk} | grep -q "role: primary"
then
	logger -p $log -t $name "Not primary provider for ${disk}"
	exit 0
fi

logger -p $log -t $name "Switching to secondary provider for ${disk}."

#Stop MySQL
service ${name}-server onestop
# Wait until MySQL has actually stopped
while
	/usr/sbin/service ${name}-server onestatus | grep -q pid
do
	logger -p $log -t $name "Waiting for MySQL Server to stop."
	sleep 1
done
logger -p $log -t $name "MySQL Server Stopped"

# Export ZFS Pool
zpool list | egrep -q "^${disk} "
if [ $? -eq 0 ]; then
	# Forcibly export the pool.
	out=`zpool export -f "${disk}" 2>&1`
	if [ $? -ne 0 ]; then
		logger -p local0.error -t $name "Unable to export pool for resource ${disk}: ${out}."
		exit 1
	fi

	# Wait for the ZFS pool to finish exporting
	while
		zpool status ${disk} >/dev/null 2>&1
	do
		logger -p $log -t $name "Waiting for ZFS pool (${disk}) to export."
		sleep 1
	done
	logger -p $log -t $name "ZFS pool for resource ${disk} exported."
fi

# Switch roles for the HAST resources
hastctl role secondary ${disk} 2>&1
if [ $? -ne 0 ]; then
	logger -p local0.error -t $name "Unable to switch role to secondary for resource ${disk}."
	exit 1
fi
logger -p $log -t $name "Role switched to secondary for resource ${disk}."
 
Follow-up question

Looking at this thread, I guess I am confused (or am I missing something obvious?). It seems to me that the OP's complaint is a rather obvious one: how is one supposed to run hastctl from a devd LINK_UP action when hastd is not yet running? I am puzzled because Google is not turning up much of anything to answer this, and the FreeBSD wiki articles and the like blithely mention HAST and CARP without even raising the issue. I would appreciate any clarification.
 
You might also be able to solve the problem by adding "hastd" to the "# REQUIRE:" line of the devd startup script (/etc/rc.d/devd), so that hastd is started before devd.
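
Before and after touching any dependency lines, it may help to check the order the rc scripts actually run in. rcorder(8) prints the script paths in startup order, so a small filter can say whether one script comes before another. This is only a sketch: the helper name starts_before is made up, and the script directories in the usage comment are the usual defaults, assumed rather than taken from this thread.

```shell
#!/bin/sh
# starts_before FIRST SECOND
# Read an rcorder-style list of script paths on stdin (one per line) and
# succeed only if a script named FIRST is listed before one named SECOND.
# (Hypothetical helper for illustration.)
starts_before() {
	awk -v a="$1" -v b="$2" '
		{ n = split($0, parts, "/"); name = parts[n] }
		name == a && !pa { pa = NR }
		name == b && !pb { pb = NR }
		END { exit !(pa && pb && pa < pb) }
	'
}

# Possible use on a FreeBSD host:
#   rcorder /etc/rc.d/* /usr/local/etc/rc.d/* 2>/dev/null |
#       starts_before hastd devd && echo "hastd starts before devd"
```

If rcorder reports a circular dependency after a change, the change should be reverted.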
 