I always missed ‘proper’ cluster software for FreeBSD systems. Recently I got to run several Pacemaker/Corosync based clusters on Linux systems. I thought about how to build similar high availability solutions on FreeBSD and was really surprised when I found out that both Pacemaker and Corosync are available in the FreeBSD Ports and packages as net/pacemaker2 and net/corosync2 respectively.
In this article I will check how well a Pacemaker and Corosync cluster works on FreeBSD.
There are many definitions of a cluster. The one I like the most is that a cluster is a system that is still redundant after losing one of its nodes (it is still a cluster). By that definition 3 nodes is the minimum for a cluster. Two-node clusters are quite problematic because of their much greater exposure to the split brain problem. That is why two-node clusters often get additional devices or systems to make sure that split brain does not happen. For example one can add a third node without any resources or services, serving only a ‘witness’ role. Another way is to add a shared disk resource that serves the same purpose, often a raw volume used with the SCSI-3 Persistent Reservations mechanism.
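As a side note, Corosync also has a dedicated knob for the two node case – the votequorum two_node option, usually combined with wait_for_all. Our lab uses three nodes so we will not need it, but a minimal sketch of such a quorum section could look like this:
quorum {
provider: corosync_votequorum
two_node: 1
wait_for_all: 1
}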
Lab Setup
As usual the lab will be entirely VirtualBox based and it will consist of 3 hosts. To avoid creating 3 identical FreeBSD installations from scratch I used the 12.1-RELEASE virtual machine image available directly from the FreeBSD Project:
There are several formats available – qcow2/raw/vhd/vmdk – but as I will be using VirtualBox I used the VMDK one.
Here is the list of the machines for the Pacemaker/Corosync cluster:
- 10.0.10.111 node1
- 10.0.10.112 node2
- 10.0.10.113 node3
Each VirtualBox virtual machine for FreeBSD is the default one (as suggested in the VirtualBox wizard) with 512 MB RAM and NAT Network as shown in the image below.
Here is the configuration of the NAT Network on VirtualBox.
Before we try to connect to our FreeBSD machines we need to do a minimal network configuration inside each VM. Each FreeBSD machine will have a minimal /etc/rc.conf file like the one shown below for the node1 host.
root@node1:~ # cat /etc/rc.conf
hostname=node1
ifconfig_em0="inet 10.0.10.111/24 up"
defaultrouter=10.0.10.1
sshd_enable=YES
For the setup purposes we will need to allow root login on these FreeBSD machines with the PermitRootLogin yes option in the /etc/ssh/sshd_config file. You will also need to restart the sshd(8) service after the change.
root@node1:~ # grep PermitRootLogin /etc/ssh/sshd_config
PermitRootLogin yes
root@node1:~ # service sshd restart
By using NAT Network with Port Forwarding the FreeBSD machines will be accessible on localhost ports. For example the node1 machine will be available on port 2211, the node2 machine on port 2212 and so on. This is shown in the sockstat(1) utility output below.
To connect to such a machine from the VirtualBox host system you will need this command:
vboxhost % ssh -l root localhost -p 2211
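As a side note, the same NAT Network and its port forwarding rules can also be created from the command line with VBoxManage(1) instead of the GUI. A rough sketch – the natnet1 name is arbitrary and the rules simply mirror the ports used above:
vboxhost % VBoxManage natnetwork add --netname natnet1 --network "10.0.10.0/24" --enable --dhcp off
vboxhost % VBoxManage natnetwork modify --netname natnet1 --port-forward-4 "ssh-node1:tcp:[]:2211:[10.0.10.111]:22"
vboxhost % VBoxManage natnetwork modify --netname natnet1 --port-forward-4 "ssh-node2:tcp:[]:2212:[10.0.10.112]:22"
vboxhost % VBoxManage natnetwork modify --netname natnet1 --port-forward-4 "ssh-node3:tcp:[]:2213:[10.0.10.113]:22"
vboxhost % VBoxManage modifyvm node1 --nic1 natnetwork --nat-network1 natnet1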
Packages
As we now have ssh(1) connectivity we need to add the needed packages. To make our VMs resolve DNS queries we need to add one last thing. We will also switch from the ‘quarterly’ branch to the ‘latest’ branch of the pkg(8) packages.
root@node1:~ # echo 'nameserver 1.1.1.1' > /etc/resolv.conf
root@node1:~ # sed -i '' s/quarterly/latest/g /etc/pkg/FreeBSD.conf
Remember to repeat these two commands above on the node2 and node3 systems.
Now we will add Pacemaker and Corosync packages.
root@node1:~ # pkg install pacemaker2 corosync2 crmsh
root@node2:~ # pkg install pacemaker2 corosync2 crmsh
root@node3:~ # pkg install pacemaker2 corosync2 crmsh
These are the messages from both pacemaker2 and corosync2 that we need to address.
Message from pacemaker2-2.0.4:
--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :
# sysctl kern.ipc.maxsockbuf=18874368
To preserve this setting across reboots, append the following
to /etc/sysctl.conf :
kern.ipc.maxsockbuf=18874368
======================================================================
Message from corosync2-2.4.5_1:
--
For correct operation, maximum socket buffer size must be tuned
by performing the following command as root :
# sysctl kern.ipc.maxsockbuf=18874368
To preserve this setting across reboots, append the following
to /etc/sysctl.conf :
kern.ipc.maxsockbuf=18874368
We need to change the kern.ipc.maxsockbuf parameter. Let's do it then.
root@node1:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node1:~ # service sysctl restart
root@node2:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node2:~ # service sysctl restart
root@node3:~ # echo 'kern.ipc.maxsockbuf=18874368' >> /etc/sysctl.conf
root@node3:~ # service sysctl restart
Let's check which binaries come with these packages.
root@node1:~ # pkg info -l pacemaker2 | grep bin
/usr/local/sbin/attrd_updater
/usr/local/sbin/cibadmin
/usr/local/sbin/crm_attribute
/usr/local/sbin/crm_diff
/usr/local/sbin/crm_error
/usr/local/sbin/crm_failcount
/usr/local/sbin/crm_master
/usr/local/sbin/crm_mon
/usr/local/sbin/crm_node
/usr/local/sbin/crm_report
/usr/local/sbin/crm_resource
/usr/local/sbin/crm_rule
/usr/local/sbin/crm_shadow
/usr/local/sbin/crm_simulate
/usr/local/sbin/crm_standby
/usr/local/sbin/crm_ticket
/usr/local/sbin/crm_verify
/usr/local/sbin/crmadmin
/usr/local/sbin/fence_legacy
/usr/local/sbin/iso8601
/usr/local/sbin/pacemaker-remoted
/usr/local/sbin/pacemaker_remoted
/usr/local/sbin/pacemakerd
/usr/local/sbin/stonith_admin
root@node1:~ # pkg info -l corosync2 | grep bin
/usr/local/bin/corosync-blackbox
/usr/local/sbin/corosync
/usr/local/sbin/corosync-cfgtool
/usr/local/sbin/corosync-cmapctl
/usr/local/sbin/corosync-cpgtool
/usr/local/sbin/corosync-keygen
/usr/local/sbin/corosync-notifyd
/usr/local/sbin/corosync-quorumtool
root@node1:~ # pkg info -l crmsh | grep bin
/usr/local/bin/crm
Cluster Initialization
Now we will initialize our FreeBSD cluster.
First we need to make sure that the node names are resolvable – here via the /etc/hosts file on each node.
root@node1:~ # tail -3 /etc/hosts
10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3
root@node2:~ # tail -3 /etc/hosts
10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3
root@node3:~ # tail -3 /etc/hosts
10.0.10.111 node1
10.0.10.112 node2
10.0.10.113 node3
Now we will generate the Corosync key.
root@node1:~ # corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /usr/local/etc/corosync/authkey.
root@node1:~ # echo $?
0
root@node1:~ # ls -l /usr/local/etc/corosync/authkey
-r-------- 1 root wheel 128 Sep 2 20:37 /usr/local/etc/corosync/authkey
Now the Corosync configuration file. Fortunately some examples are provided with the package.
root@node1:~ # pkg info -l corosync2 | grep example
/usr/local/etc/corosync/corosync.conf.example
/usr/local/etc/corosync/corosync.conf.example.udpu
We will take the second one as a base for our config.
root@node1:~ # cp /usr/local/etc/corosync/corosync.conf.example.udpu /usr/local/etc/corosync/corosync.conf
root@node1:~ # vi /usr/local/etc/corosync/corosync.conf
/* LOTS OF EDITS HERE */
root@node1:~ # cat /usr/local/etc/corosync/corosync.conf
totem {
version: 2
crypto_cipher: aes256
crypto_hash: sha256
transport: udpu
interface {
ringnumber: 0
bindnetaddr: 10.0.10.0
mcastport: 5405
ttl: 1
}
}
logging {
fileline: off
to_logfile: yes
to_syslog: no
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: QUORUM
debug: off
}
}
nodelist {
node {
ring0_addr: 10.0.10.111
nodeid: 1
}
node {
ring0_addr: 10.0.10.112
nodeid: 2
}
node {
ring0_addr: 10.0.10.113
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
expected_votes: 2
}
Now we need to propagate both Corosync key and config across the nodes in the cluster.
We could use a dedicated tool like the net/csync2 cluster synchronization tool, but plain old net/rsync will serve just as well.
root@node1:~ # pkg install -y rsync
root@node1:~ # rsync -av /usr/local/etc/corosync/ node2:/usr/local/etc/corosync/
The authenticity of host 'node2 (10.0.10.112)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node2' (ECDSA) to the list of known hosts.
Password for root@node2:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/
sent 1,100 bytes received 69 bytes 259.78 bytes/sec
total size is 4,398 speedup is 3.76
root@node1:~ # rsync -av /usr/local/etc/corosync/ node3:/usr/local/etc/corosync/
The authenticity of host 'node2 (10.0.10.112)' can't be established.
ECDSA key fingerprint is SHA256:/ZDmln7GKi6n0kbad73TIrajPjGfQqJJX+ReSf3NMvc.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3' (ECDSA) to the list of known hosts.
Password for root@node3:
sending incremental file list
./
authkey
corosync.conf
service.d/
uidgid.d/
sent 1,100 bytes received 69 bytes 259.78 bytes/sec
total size is 4,398 speedup is 3.76
Now let's check that they are the same.
root@node1:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf
root@node2:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf
root@node3:~ # cksum /usr/local/etc/corosync/{authkey,corosync.conf}
2277171666 128 /usr/local/etc/corosync/authkey
1728717329 622 /usr/local/etc/corosync/corosync.conf
Same.
We can now add corosync_enable=YES and pacemaker_enable=YES to the /etc/rc.conf file.
root@node1:~ # sysrc corosync_enable=YES
corosync_enable: -> YES
root@node1:~ # sysrc pacemaker_enable=YES
pacemaker_enable: -> YES
root@node2:~ # sysrc corosync_enable=YES
corosync_enable: -> YES
root@node2:~ # sysrc pacemaker_enable=YES
pacemaker_enable: -> YES
root@node3:~ # sysrc corosync_enable=YES
corosync_enable: -> YES
root@node3:~ # sysrc pacemaker_enable=YES
pacemaker_enable: -> YES
Let's start these services then.
root@node1:~ # service corosync start
Starting corosync.
Sep 02 20:55:35 notice [MAIN ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 info [MAIN ] Corosync built-in features:
Sep 02 20:55:35 warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 warning [MAIN ] Please migrate config file to nodelist.
root@node1:~ # ps aux | grep corosync
root 1695 0.0 7.9 38340 38516 - S 20:55 0:00.40 /usr/local/sbin/corosync
root 1699 0.0 0.1 524 336 0 R+ 20:57 0:00.00 grep corosync
Do the same on the node2 and node3 systems.
Pacemaker is not running yet, so the crm status command will fail.
root@node1:~ # crm status
Could not connect to the CIB: Socket is not connected
crm_mon: Error: cluster is not available on this node
ERROR: status: crm_mon (rc=102):
We will start it now.
root@node1:~ # service pacemaker start
Starting pacemaker.
root@node2:~ # service pacemaker start
Starting pacemaker.
root@node3:~ # service pacemaker start
Starting pacemaker.
You need to give it a little time to start, because if you execute the crm status command right away you will get a '0 nodes configured' message as shown below.
root@node1:~ # crm status
Cluster Summary:
* Stack: unknown
* Current DC: NONE
* Last updated: Wed Sep 2 20:58:51 2020
* Last change:
* 0 nodes configured
* 0 resource instances configured
Full List of Resources:
* No resources
… but after a while everything is detected and works as desired.
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 21:02:49 2020
* Last change: Wed Sep 2 20:59:00 2020 by hacluster via crmd on node2
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* No resources
Pacemaker runs properly.
root@node1:~ # ps aux | grep pacemaker
root 1716 0.0 0.5 10844 2396 - Is 20:58 0:00.00 daemon: /usr/local/sbin/pacemakerd[1717] (daemon)
root 1717 0.0 5.2 49264 25284 - S 20:58 0:00.27 /usr/local/sbin/pacemakerd
hacluster 1718 0.0 6.1 48736 29708 - Ss 20:58 0:00.75 /usr/local/libexec/pacemaker/pacemaker-based
root 1719 0.0 4.5 40628 21984 - Ss 20:58 0:00.28 /usr/local/libexec/pacemaker/pacemaker-fenced
root 1720 0.0 2.8 25204 13688 - Ss 20:58 0:00.20 /usr/local/libexec/pacemaker/pacemaker-execd
hacluster 1721 0.0 3.9 38148 19100 - Ss 20:58 0:00.25 /usr/local/libexec/pacemaker/pacemaker-attrd
hacluster 1722 0.0 2.9 25460 13864 - Ss 20:58 0:00.17 /usr/local/libexec/pacemaker/pacemaker-schedulerd
hacluster 1723 0.0 5.4 49304 26300 - Ss 20:58 0:00.41 /usr/local/libexec/pacemaker/pacemaker-controld
root 1889 0.0 0.6 11348 2728 0 S+ 21:56 0:00.00 grep pacemaker
We can check how Corosync sees its members.
root@node1:~ # corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.0.10.111)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.0.10.112)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.0.10.113)
runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
… or the quorum information.
root@node1:~ # corosync-quorumtool
Quorum information
------------------
Date: Wed Sep 2 21:00:38 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 1
Ring ID: 1/12
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
1 1 10.0.10.111 (local)
2 1 10.0.10.112
3 1 10.0.10.113
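The health of the totem ring can also be checked with corosync-cfgtool(8), one of the binaries listed earlier, for example:
root@node1:~ # corosync-cfgtool -s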
The Corosync log file is filled with the following information.
root@node1:~ # cat /var/log/cluster/corosync.log
Sep 02 20:55:35 [1694] node1 corosync notice [MAIN ] Corosync Cluster Engine ('2.4.5'): started and ready to provide service.
Sep 02 20:55:35 [1694] node1 corosync info [MAIN ] Corosync built-in features:
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Sep 02 20:55:35 [1694] node1 corosync warning [MAIN ] Please migrate config file to nodelist.
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] Initializing transport (UDP/IP Unicast).
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha256
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] The network interface [10.0.10.111] is now up.
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync configuration map access [0]
Sep 02 20:55:35 [1694] node1 corosync info [QB ] server name: cmap
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync configuration service [1]
Sep 02 20:55:35 [1694] node1 corosync info [QB ] server name: cfg
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Sep 02 20:55:35 [1694] node1 corosync info [QB ] server name: cpg
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync profile loading service [4]
Sep 02 20:55:35 [1694] node1 corosync notice [QUORUM] Using quorum provider corosync_votequorum
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Sep 02 20:55:35 [1694] node1 corosync info [QB ] server name: votequorum
Sep 02 20:55:35 [1694] node1 corosync notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Sep 02 20:55:35 [1694] node1 corosync info [QB ] server name: quorum
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] adding new UDPU member {10.0.10.111}
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] adding new UDPU member {10.0.10.112}
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] adding new UDPU member {10.0.10.113}
Sep 02 20:55:35 [1694] node1 corosync notice [TOTEM ] A new membership (10.0.10.111:4) was formed. Members joined: 1
Sep 02 20:55:35 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:55:35 [1694] node1 corosync notice [QUORUM] Members[1]: 1
Sep 02 20:55:35 [1694] node1 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Sep 02 20:58:14 [1694] node1 corosync notice [TOTEM ] A new membership (10.0.10.111:8) was formed. Members joined: 2
Sep 02 20:58:14 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:58:14 [1694] node1 corosync notice [QUORUM] This node is within the primary component and will provide service.
Sep 02 20:58:14 [1694] node1 corosync notice [QUORUM] Members[2]: 1 2
Sep 02 20:58:14 [1694] node1 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Sep 02 20:58:19 [1694] node1 corosync notice [TOTEM ] A new membership (10.0.10.111:12) was formed. Members joined: 3
Sep 02 20:58:19 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync warning [CPG ] downlist left_list: 0 received
Sep 02 20:58:19 [1694] node1 corosync notice [QUORUM] Members[3]: 1 2 3
Sep 02 20:58:19 [1694] node1 corosync notice [MAIN ] Completed service synchronization, ready to provide service.
Here is the configuration.
root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=2.0.4-2deceaa3ae \
cluster-infrastructure=corosync
As we will not be configuring the STONITH mechanism we will disable it.
root@node1:~ # crm configure property stonith-enabled=false
New configuration with STONITH disabled.
root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=2.0.4-2deceaa3ae \
cluster-infrastructure=corosync \
stonith-enabled=false
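We can also ask Pacemaker to validate the live configuration with crm_verify(8) – with STONITH disabled it should no longer report errors about missing fencing resources:
root@node1:~ # crm_verify -L -V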
The STONITH configuration is out of scope of this article, but to give a rough idea of what a configured STONITH device can look like, see the sketch below.
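This is only a hypothetical illustration in crm syntax – an IPMI based fence device for node1. The agent name, its parameters and the management address are made up for this example, parameter names vary between fence agent versions, and such fence agents may not even be packaged for FreeBSD:
primitive fence-node1 stonith:fence_ipmilan \
params pcmk_host_list=node1 ip=10.0.10.201 username=admin password=secret \
op monitor interval=60s
location l-fence-node1 fence-node1 -inf: node1
The location constraint keeps the fence device for node1 away from node1 itself.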
First Service
We will now configure our first highly available service – a classic – a floating IP address.
root@node1:~ # crm configure primitive IP ocf:heartbeat:IPaddr2 params ip=10.0.10.200 cidr_netmask="24" op monitor interval="30s"
Let's check how it behaves.
root@node1:~ # crm configure show
node 1: node1
node 2: node2
node 3: node3
primitive IP IPaddr2 \
params ip=10.0.10.200 cidr_netmask=24 \
op monitor interval=30s
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=2.0.4-2deceaa3ae \
cluster-infrastructure=corosync \
stonith-enabled=false
Looks good – let's check the cluster status.
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:03:35 2020
* Last change: Wed Sep 2 22:02:53 2020 by root via cibadmin on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* IP (ocf::heartbeat:IPaddr2): Stopped
Failed Resource Actions:
* IP_monitor_0 on node3 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=132ms
* IP_monitor_0 on node2 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:54Z', queued=0ms, exec=120ms
* IP_monitor_0 on node1 'not installed' (5): call=5, status='complete', exitreason='Setup problem: couldn't find command: ip', last-rc-change='2020-09-02 22:02:53Z', queued=0ms, exec=110ms
Crap. A Linuxism. The ip(8) command is expected to be present in the system. This is FreeBSD which, like any other UNIX system, comes with the ifconfig(8) command instead.
We will have to figure out something else. For now we will delete our useless IP service.
root@node1:~ # crm configure delete IP
Status after deletion.
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:04:34 2020
* Last change: Wed Sep 2 22:04:31 2020 by root via cibadmin on node1
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* No resources
Custom Resource
Let's check which resource agents come with the stock Pacemaker installation.
root@node1:~ # ls -l /usr/local/lib/ocf/resource.d/pacemaker
total 144
-r-xr-xr-x 1 root wheel 7484 Aug 29 01:22 ClusterMon
-r-xr-xr-x 1 root wheel 9432 Aug 29 01:22 Dummy
-r-xr-xr-x 1 root wheel 5256 Aug 29 01:22 HealthCPU
-r-xr-xr-x 1 root wheel 5342 Aug 29 01:22 HealthIOWait
-r-xr-xr-x 1 root wheel 9450 Aug 29 01:22 HealthSMART
-r-xr-xr-x 1 root wheel 6186 Aug 29 01:22 Stateful
-r-xr-xr-x 1 root wheel 11370 Aug 29 01:22 SysInfo
-r-xr-xr-x 1 root wheel 5856 Aug 29 01:22 SystemHealth
-r-xr-xr-x 1 root wheel 7382 Aug 29 01:22 attribute
-r-xr-xr-x 1 root wheel 7854 Aug 29 01:22 controld
-r-xr-xr-x 1 root wheel 16134 Aug 29 01:22 ifspeed
-r-xr-xr-x 1 root wheel 11040 Aug 29 01:22 o2cb
-r-xr-xr-x 1 root wheel 11696 Aug 29 01:22 ping
-r-xr-xr-x 1 root wheel 6356 Aug 29 01:22 pingd
-r-xr-xr-x 1 root wheel 3702 Aug 29 01:22 remote
Not many … we will try to modify the Dummy agent into an IP changer for FreeBSD.
root@node1:~ # cp /usr/local/lib/ocf/resource.d/pacemaker/Dummy /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
/* LOTS OF TYPING */
Because of the WordPress blogging system limitations I am forced to post this ifconfig resource as an image … but fear not – the text version is also available here – ifconfig.odt – for download.
Also the first version did not go that well …
root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* rc=3: Your agent has too restrictive permissions: should be 755
-:1: parser error : Start tag expected, '<' not found
usage: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig {start|stop|monitor}
^
* rc=1: Your agent produces meta-data which does not conform to ra-api-1.dtd
* rc=3: Your agent does not support the meta-data action
* rc=3: Your agent does not support the validate-all action
* rc=0: Monitoring a stopped resource should return 7
* rc=0: The initial probe for a stopped resource should return 7 or 5 even if all binaries are missing
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* Your agent does not support the reload action (optional)
Tests failed: /usr/local/lib/ocf/resource.d/pacemaker/ifconfig failed 9 tests
But after setting its mode to 755 and making several (hundred) changes it became usable.
root@node1:~ # vi /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
/* LOTS OF NERVOUS TYPING */
root@node1:~ # chmod 755 /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
root@node1:~ # setenv OCF_ROOT /usr/local/lib/ocf
root@node1:~ # ocf-tester -n resourcename /usr/local/lib/ocf/resource.d/pacemaker/ifconfig
Beginning tests for /usr/local/lib/ocf/resource.d/pacemaker/ifconfig...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* Your agent does not support the reload action (optional)
/usr/local/lib/ocf/resource.d/pacemaker/ifconfig passed all tests
Looks usable.
Here is the ifconfig resource. It is pretty limited and has a hardcoded IP address for now.
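Since WordPress forces the real agent to be published as an image (and as the ifconfig.odt download mentioned above), below is only a rough sketch of what such a minimal ifconfig(8) based agent could look like. The em0 interface, the 10.0.10.200 address and the overall structure are assumptions made for this illustration – the agent actually used in this article may differ:
#!/bin/sh
# A minimal sketch of an ifconfig(8) based OCF resource agent for FreeBSD.
# The interface and address below are hardcoded for illustration only.
IF="em0"
IP="10.0.10.200"
# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7
meta_data() {
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="ifconfig" version="0.1">
<version>1.0</version>
<longdesc lang="en">Floating IP alias managed with ifconfig(8) on FreeBSD.</longdesc>
<shortdesc lang="en">FreeBSD IP alias</shortdesc>
<parameters/>
<actions>
<action name="start" timeout="20s"/>
<action name="stop" timeout="20s"/>
<action name="monitor" timeout="20s" interval="30s"/>
<action name="validate-all" timeout="20s"/>
<action name="meta-data" timeout="5s"/>
</actions>
</resource-agent>
END
}
# Returns 0 when the alias is already configured on the interface.
ip_present() {
ifconfig ${IF} | grep -q "inet ${IP} "
}
case "$1" in
meta-data)    meta_data; exit ${OCF_SUCCESS} ;;
start)        ip_present || ifconfig ${IF} inet ${IP}/24 alias; exit ${OCF_SUCCESS} ;;
stop)         ip_present && ifconfig ${IF} inet ${IP} -alias; exit ${OCF_SUCCESS} ;;
monitor)      ip_present && exit ${OCF_SUCCESS}; exit ${OCF_NOT_RUNNING} ;;
validate-all) exit ${OCF_SUCCESS} ;;
*)            exit ${OCF_ERR_UNIMPLEMENTED} ;;
esac
A real agent would of course take the interface and address as OCF parameters instead of hardcoding them.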
Let's try to add the new IP resource to our FreeBSD cluster.
Tests
root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"
Added.
Let's see what the status command shows now.
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:44:52 2020
* Last change: Wed Sep 2 22:44:44 2020 by root via cibadmin on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* IP (ocf::pacemaker:ifconfig): Started node1
Failed Resource Actions:
* IP_monitor_0 on node3 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:52Z', queued=0ms, exec=5ms
* IP_monitor_0 on node2 'not installed' (5): call=24, status='Not installed', exitreason='', last-rc-change='2020-09-02 22:42:53Z', queued=0ms, exec=2ms
Crap. I forgot to copy the new ifconfig resource agent to the other nodes. Let's fix that now.
root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node2:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node2:
sending incremental file list
./
ifconfig
sent 3,798 bytes received 38 bytes 1,534.40 bytes/sec
total size is 128,003 speedup is 33.37
root@node1:~ # rsync -av /usr/local/lib/ocf/resource.d/pacemaker/ node3:/usr/local/lib/ocf/resource.d/pacemaker/
Password for root@node3:
sending incremental file list
./
ifconfig
sent 3,798 bytes received 38 bytes 1,534.40 bytes/sec
total size is 128,003 speedup is 33.37
Let's stop, delete and re-add our precious resource now.
root@node1:~ # crm resource stop IP
root@node1:~ # crm configure delete IP
root@node1:~ # crm configure primitive IP ocf:pacemaker:ifconfig op monitor interval="30"
Fingers crossed.
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:45:46 2020
* Last change: Wed Sep 2 22:45:43 2020 by root via cibadmin on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* IP (ocf::pacemaker:ifconfig): Started node1
Looks like it is running properly.
Let's verify that it is really up where it should be.
root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
options=81009b
ether 08:00:27:2a:78:60
inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=29
root@node2:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
options=81009b
ether 08:00:27:80:50:05
inet 10.0.10.112 netmask 0xffffff00 broadcast 10.0.10.255
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=29
root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
options=81009b
ether 08:00:27:74:5e:b9
inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=29
Seems to be working.
Now let's try to move it to another node in the cluster.
root@node1:~ # crm resource move IP node3
INFO: Move constraint created for IP to node3
root@node1:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:47:31 2020
* Last change: Wed Sep 2 22:47:28 2020 by root via crm_resource on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* IP (ocf::pacemaker:ifconfig): Started node3
It switched properly to the node3 system.
root@node3:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
options=81009b
ether 08:00:27:74:5e:b9
inet 10.0.10.113 netmask 0xffffff00 broadcast 10.0.10.255
inet 10.0.10.200 netmask 0xffffff00 broadcast 10.0.10.255
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=29
root@node1:~ # ifconfig em0
em0: flags=8843 metric 0 mtu 1500
options=81009b
ether 08:00:27:2a:78:60
inet 10.0.10.111 netmask 0xffffff00 broadcast 10.0.10.255
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=29
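A side note: crm resource move works by adding a location constraint (see the INFO message above), so the resource will keep preferring node3 until that constraint is removed, for example with something like:
root@node1:~ # crm resource unmove IP
We leave the constraint in place for now.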
Now we will power off the node3 system to check that the IP is really highly available.
root@node2:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:49:57 2020
* Last change: Wed Sep 2 22:47:29 2020 by root via crm_resource on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 node3 ]
Full List of Resources:
* IP (ocf::pacemaker:ifconfig): Started node3
root@node3:~ # poweroff
root@node2:~ # crm status
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.4-2deceaa3ae) - partition with quorum
* Last updated: Wed Sep 2 22:50:16 2020
* Last change: Wed Sep 2 22:47:29 2020 by root via crm_resource on node1
* 3 nodes configured
* 1 resource instance configured
Node List:
* Online: [ node1 node2 ]
* OFFLINE: [ node3 ]
Full List of Resources:
* IP (ocf::pacemaker:ifconfig): Started node1
Seems that failover went well.
The crm command also colors various sections of its output.
Good to know that a Pacemaker and Corosync cluster runs well on FreeBSD.
Some work is needed to write the required resource agents, but with some time and determination one can surely turn FreeBSD into a very capable highly available cluster.
EOF