Hi,
I have two servers (Proxmox and FreeNAS), and each should run an OPNsense VM so that I can run them in HA mode.
Both OPNsense VMs run fine, and everything works as long as only one of them is running at a time.
To use HA, the VMs need to communicate via CARP, and that is where the problem is: CARP packets get lost between my lagg0 interface (which carries tagged VLANs) and the vlan2/vlan42/vlan43 interfaces (see the red line in the picture below). The other direction works fine: CARP packets from vlan2/vlan42/vlan43 reach lagg0.
This is what my setup looks like:
I used tcpdump and Wireshark to analyse the packets.
All unicast traffic passes fine between the two VMs.
CARP works fine from the VM on FreeNAS down to the VM on Proxmox (I can see CARP packets from both VMs if I listen on "vmbr2").
If I listen on "bridge2" or "vlan2" I only see CARP packets from the VM on the FreeNAS host.
If I listen on "lagg0" I see CARP packets from both VMs.
So it really looks like the CARP packets from the VM on Proxmox get lost while travelling from "lagg0" to "vlan2" on the FreeNAS host.
I also checked the headers of the CARP packets captured on "lagg0" and verified that they are tagged with VLAN ID 2.
I have no idea why all other traffic with VLAN ID 2 passes from "lagg0" to "vlan2" but the CARP packets do not.
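For anyone who wants to repeat the tag check, this is a minimal sketch of the capture filters involved. The helper name carp_filter is made up for illustration and was not part of my original commands. It assumes CARP advertisements can be matched as IP protocol 112, and that the "vlan" keyword of the pcap filter syntax sees the 802.1Q tag on the trunk, which may depend on the NIC's tag offloading.

```shell
#!/bin/sh
# Hypothetical helper (not from the original commands): print a pcap filter
# that matches CARP/VRRP advertisements, i.e. IP protocol 112.
# With a VLAN ID argument the filter only matches frames carrying that
# 802.1Q tag (for a trunk such as lagg0); without an argument it expects
# plain IP frames (for vlan2 or bridge2).
carp_filter() {
    if [ -n "$1" ]; then
        printf 'vlan %s and ip proto 112\n' "$1"
    else
        printf 'ip proto 112\n'
    fi
}

# Example invocations (need root; -e prints the link-level header,
# including the VLAN tag when one is present):
#   tcpdump -e -n -i lagg0 "$(carp_filter 2)"
#   tcpdump -e -n -i vlan2 "$(carp_filter)"
```

Filtering in tcpdump itself rather than piping through grep keeps the link-level header of each matching frame visible, so the VLAN ID can be read straight from the output.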
Tcpdump vlan2:
Code:
tcpdump -B 16000 -i vlan2 -s0 -vv -n | grep VRRPv2
tcpdump: listening on vlan2, link-type EN10MB (Ethernet), capture size 262144 bytes
192.168.0.3 > 224.0.0.18: vrrp 192.168.0.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36, addrs(7): 204.26.121.131,113.231.163.36,111.27.69.51,139.52.213.104,103.208.241.213,47.0.182.224,252.5.96.64
Tcpdump lagg0:
Code:
tcpdump -B 16000 -i lagg0 -s0 -vv -n | grep VRRPv2
tcpdump: listening on lagg0, link-type EN10MB (Ethernet), capture size 262144 bytes
192.168.0.3 > 224.0.0.18: vrrp 192.168.0.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36, addrs(7): 111.100.125.40,62.223.145.20,142.113.126.246,185.154.196.161,103.0.53.151,212.222.211.243,85.170.90.42
192.168.43.3 > 224.0.0.18: vrrp 192.168.43.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 3, prio 100, authtype none, intvl 1s, length 36, addrs(7): 96.66.4.97,132.211.112.69,212.103.17.121,143.149.122.238,4.30.250.155,65.48.107.187,65.63.152.79
192.168.42.3 > 224.0.0.18: vrrp 192.168.42.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 5, prio 100, authtype none, intvl 1s, length 36, addrs(7): 32.109.10.203,100.5.167.83,103.19.183.233,178.227.26.106,226.114.149.25,181.206.12.219,83.59.18.64
192.168.0.2 > 224.0.0.18: vrrp 192.168.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36, addrs(7): 111.100.125.40,62.223.145.21,135.66.16.237,211.4.211.42,117.193.93.14,212.76.235.95,104.10.216.75
Tcpdump vmbr2:
Code:
tcpdump -B 16000 -i vmbr2 -s0 -vv -n | grep VRRPv2
tcpdump: listening on vmbr2, link-type EN10MB (Ethernet), capture size 262144 bytes
192.168.0.3 > 224.0.0.18: vrrp 192.168.0.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 100, authtype none, intvl 1s, length 36, addrs(7): 121.117.48.10,56.7.253.87,70.65.213.181,67.105.250.170,222.9.115.236,50.252.143.196,169.224.5.94
192.168.0.2 > 224.0.0.18: vrrp 192.168.0.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36, addrs(7): 121.117.48.10,56.7.253.88,58.219.37.12,15.210.142.91,192.209.83.22,41.242.4.18,137.13.56.178
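To compare the three captures more easily, the output can be reduced to one line per sender. This is only a sketch: the function name carp_senders and the file names in the comments are hypothetical, and it assumes the text format that tcpdump -vv -n prints in the captures above.

```shell
#!/bin/sh
# Hypothetical helper: read saved `tcpdump -vv -n` text on stdin and print
# one line per distinct sender: source address, vrid and prio.
carp_senders() {
    awk '/VRRPv2, Advertisement/ {
        src = $1
        vrid = ""; prio = ""
        # Walk the fields and pick the value that follows each keyword.
        for (i = 1; i < NF; i++) {
            if ($i == "vrid") { vrid = $(i + 1); sub(/,$/, "", vrid) }
            if ($i == "prio") { prio = $(i + 1); sub(/,$/, "", prio) }
        }
        print src, "vrid=" vrid, "prio=" prio
    }' | sort -u
}

# Example (file names are placeholders for saved capture output):
#   carp_senders < lagg0.txt > lagg0.senders
#   carp_senders < vlan2.txt > vlan2.senders
#   diff lagg0.senders vlan2.senders
```

A sender that shows up in the lagg0 summary but not in the vlan2 summary is one whose advertisements are lost between the two interfaces.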
Code:
ifconfig vlan2
vlan2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN vlan
ether f4:52:14:88:30:60
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect
status: active
vlan: 2 vlanpcp: 0 parent interface: lagg0
groups: vlan
Code:
ifconfig lagg0
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
description: 10G + 1G bond
options=88<VLAN_MTU,VLAN_HWCSUM>
ether f4:52:14:88:30:60
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect
status: active
groups: lagg
laggproto failover lagghash l2,l3,l4
laggport: em0 flags=0<>
laggport: mlxen0 flags=5<MASTER,ACTIVE>
Code:
ifconfig mlxen0
mlxen0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
description: Bond SFP+
options=8c00a8<VLAN_MTU,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE>
ether f4:52:14:88:30:60
hwaddr f4:52:14:88:30:60
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (10Gbase-CX4 <full-duplex,rxpause,txpause>)
status: active
Does anyone have an idea?
I have been trying to fix this for days, and no one in the OPNsense, FreeNAS or Proxmox forums has any idea why the CARP packets get lost.
I can post tcpdumps, logs and so on if more information is needed.
It's FreeNAS version 11.3U4.1, so it's based on FreeBSD 11.3.