Solved CARP help needed please!

Hello everyone, this is my first post here, but I have been on the FreeNAS forums for a bit now. I am looking at moving to pure FreeBSD, as FreeNAS has some limits and I don't like the direction the project is going. I need a rock-solid storage box without frills like an AD controller or bhyve.

I am extremely green to fault-tolerant storage and I am trying to cobble something together based on shared storage. This would be served over iSCSI and/or Fibre Channel. At this point I'm just trying to get CARP working. It seems like it should be quite simple, but both hosts, ctlr-a and ctlr-b, are stuck in "Backup". I could use a second set of eyes to look over my config. Note: I am not using preempt, as I want to implement hold-down timers and flap limits to prevent an endless loop of flip-flopping.

ctlr-a:
Code:
root@ctlr-a:~ # ifconfig vmx0
vmx0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:50:56:88:89:ac
        hwaddr 00:50:56:88:89:ac
        inet 192.168.1.243 netmask 0xffffff00 broadcast 192.168.1.255
        inet 192.168.1.245 netmask 0xffffffff broadcast 192.168.1.245 vhid 1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 1
root@ctlr-a:~ # cat /etc/rc.conf
hostname="ctlr-a"
defaultrouter="192.168.1.1"
ifconfig_vmx0="inet 192.168.1.243/24"
ifconfig_vmx0_alias0="inet vhid 1 pass pa$$w0rd advskew 1 alias
192.168.1.245/24"

sshd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="NO"
zfs_enable="YES"
ctld_enable="yes"

ctlr-b:
Code:
root@ctlr-b:~ # ifconfig vmx0
vmx0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=60039b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,TSO6,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:50:56:88:63:2d
        hwaddr 00:50:56:88:63:2d
        inet 192.168.1.244 netmask 0xffffff00 broadcast 192.168.1.255
        inet 192.168.1.245 netmask 0xffffffff broadcast 192.168.1.245 vhid 1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        carp: BACKUP vhid 1 advbase 1 advskew 2
root@ctlr-b:~ # cat /etc/rc.conf
hostname="ctlr-b"
defaultrouter="192.168.1.1"
ifconfig_vmx0="inet 192.168.1.244/24"
ifconfig_vmx0_alias0="inet vhid 1 pass pa$$w0rd advskew 2 alias
192.168.1.245/24"

sshd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="NO"
zfs_enable="YES"
ctld_enable="yes"
Disabling either host's network via cable pull or ifconfig vmx0 down seems to have no effect, and running tcpdump -T carp | grep CARP shows the CARPv2 advertisements. Both hosts can ping each other and the gateway. As this is running on ESXi 6.7, I have also set the following policies on the port group.
Code:
Security
Promiscuous mode Accept
MAC address changes Accept
Forged transmits Accept
Any insights would be appreciated!
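A side note on the rc.conf lines above, separate from the eventual fix: rc.conf is read by sh(1), and inside double quotes sh expands `$$` to the shell's process ID instead of keeping it literally. Whether that matters here depends on how and when the interface is configured, so treat this only as something to double-check. A minimal sketch of the difference, using the password string from the post purely as an illustration (the variable names `dq` and `sq` are made up for this example):

```shell
#!/bin/sh
# Double quotes: $$ is replaced by this shell's PID, so the value
# can differ from one shell (or one host) to the next.
dq="pa$$w0rd"

# Single quotes: nothing is expanded, the string is kept literally.
sq='pa$$w0rd'

echo "double-quoted: $dq"
echo "single-quoted: $sq"
```

A passphrase without `$` or other shell metacharacters avoids the question entirely.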
 
Turn on 'promiscuous mode' on the virtual adapters on the VMware side; that's usually the problem.
 
Thank you for taking the time to respond.
I have enabled promiscuous mode on the port group that both VMs are connected to, and I see the CARP packets from both hosts on both hosts, as confirmed with tcpdump. Both hosts still seem to be stuck as backup.

Edited for clarity.
 
Both sides need to "see" each other using multicast. Are they both attached to the same network?
 
Thanks again. Both hosts are on the same subnet, mask, router, VLAN, and layer 2 segment, with promiscuous mode enabled on the vSwitch. I have verified that they can see each other's traffic by running tcpdump | grep 192.168.1.243 on host b (192.168.1.244), pinging 192.168.1.243 from my desktop, and watching the ICMP packets.

I'll admit that I'm not super sharp on multicast traffic, but I would think the hosts would see it if it was on the wire. That is, if they're looking for it.
 
Here is the output from the tcpdump:
Code:
root@ctlr-b:~ # tcpdump | grep 192.168.1.243
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vmx0, link-type EN10MB (Ethernet), capture size 262144 bytes
07:26:51.348906 IP 192.168.1.244.7777 > 192.168.1.243.46173: Flags [.], ack 1880219400, win 16384, options [nop,nop,TS val 3518210177 ecr 532420], length 0
07:26:51.349312 IP 192.168.1.243.46173 > 192.168.1.244.7777: Flags [.], ack 1, win 16402, options [nop,nop,TS val 533426 ecr 3518209277], length 0
07:26:51.604977 IP 192.168.1.243 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
07:26:51.605216 IP 192.168.1.243 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
07:26:52.350695 IP 192.168.1.243.46173 > 192.168.1.244.7777: Flags [.], ack 1, win 16402, options [nop,nop,TS val 534427 ecr 3518209277], length 0
07:26:52.350745 IP 192.168.1.244.7777 > 192.168.1.243.46173: Flags [.], ack 1, win 16384, options [nop,nop,TS val 3518211179 ecr 533426], length 0
07:26:53.352993 IP 192.168.1.244.7777 > 192.168.1.243.46173: Flags [.], ack 1, win 16384, options [nop,nop,TS val 3518212181 ecr 533426], length 0
07:26:53.353413 IP 192.168.1.243.46173 > 192.168.1.244.7777: Flags [.], ack 1, win 16402, options [nop,nop,TS val 535430 ecr 3518211179], length 0
07:26:53.913573 IP 192.168.1.240 > 192.168.1.243: ICMP echo request, id 1, seq 1, length 40
07:26:53.913841 IP 192.168.1.243 > 192.168.1.240: ICMP echo reply, id 1, seq 1, length 40
As you can see, the CARP and ICMP traffic comes through clean and clear.
Note: I omitted the -T carp option, so the CARP packets are decoded as VRRP.
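To make a capture like the one above easier to read, the advertisements can be tallied per source address from saved tcpdump text. This is only a sketch: `count_adverts` is a hypothetical helper name, and it assumes the one-line tcpdump format shown in this thread, where the third field is the source IP and the line contains either VRRPv2 or CARPv2.

```shell
#!/bin/sh
# count_adverts: read tcpdump text on stdin and print "<source-ip> <count>"
# for every host that sent a CARP/VRRP advertisement.
count_adverts() {
    awk '/CARPv2|VRRPv2/ { n[$3]++ }
         END { for (ip in n) print ip, n[ip] }' | sort
}

# Example: feed it two advertisement lines and one ICMP line from the capture.
count_adverts <<'EOF'
07:26:51.604977 IP 192.168.1.243 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
07:26:51.605216 IP 192.168.1.243 > vrrp.mcast.net: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
07:26:53.913573 IP 192.168.1.240 > 192.168.1.243: ICMP echo request, id 1, seq 1, length 40
EOF
```

Running the same tally on each host, without the grep on a single address, shows at a glance whose advertisements each side actually receives.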
 
I suspect it's something in the vSwitch configuration that's preventing the interfaces from seeing each other. The FreeBSD configuration looks to be in order.
 
Perhaps I'm misunderstanding something, but wouldn't a lack of communication cause a split brain, with both hosts assuming the master state?
 
I think the rationale here is to fail rather than end up with two masters.
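For context on how the two hosts compare themselves: as I understand carp(4), each host advertises roughly every advbase + advskew/256 seconds, and the host that advertises most frequently is preferred as master. The sketch below just does that arithmetic for the values in this thread; `adv_interval` is a hypothetical helper, and the formula is my reading of the documentation, not something taken from the posts.

```shell
#!/bin/sh
# adv_interval ADVBASE ADVSKEW
# Print the assumed advertisement interval in seconds:
#   interval = advbase + advskew / 256
adv_interval() {
    awk -v base="$1" -v skew="$2" 'BEGIN { printf "%.4f\n", base + skew / 256 }'
}

a=$(adv_interval 1 1)   # ctlr-a: advbase 1, advskew 1
b=$(adv_interval 1 2)   # ctlr-b: advbase 1, advskew 2
echo "ctlr-a advertises every ${a}s, ctlr-b every ${b}s"
```

With these settings ctlr-a has the slightly shorter interval, so it would be the expected master once the hosts stop being confused by what they hear on the wire.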
 
If you have multiple physical ports on the same vSwitch, see item 4 in this article.
You were right on the money! It's now passing the IP back and forth as expected. I don't think I would have found this in a reasonable amount of time on my own!
Sidenote: Setting Net.ReversePathFwdCheckPromisc = 1 on the ESXi host works on the fly; no reboot is required, contrary to what a few have suggested.
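For anyone who prefers the command line to the vSphere client, the same advanced option can, as far as I know, be read and set with esxcli on the ESXi host. This is a sketch that assumes shell or SSH access to the host; check the option path against your ESXi version before relying on it.

```shell
# Show the current value of the advanced option
esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc

# Set it to 1, the value that resolved the problem in this thread
esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1
```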
 
You're welcome!
:)
For those environments where carp(4) may be difficult to deploy, such as in clouds or other virtualized infrastructures where "promiscuous mode" is not allowed and some multicast traffic is blocked, you can use net/ucarp instead.
 