Recently I've been changing our routers/firewalls at work to a redundant configuration, using lagg(4), vlan(4)s, carp(4), pf(4), and pfsync(4). However, I've run into a strange problem and I'm at a loss as to how to troubleshoot it at this point.
We have two routers right now, in a redundant config as follows:
Code:
# router01's rc.conf (redacted)
hostname="router01"
dumpdev="NO"
watchdogd_enable="YES"
unbound_enable="YES"
sshd_enable="YES"
openntpd_enable="YES"
dhcpd_enable="YES"
pf_enable="YES"
ifconfig_re0="up"
ifconfig_re1="up"
cloned_interfaces="lagg0 vlan24 vlan64 vlan254 vlan300"
ifconfig_lagg0="laggproto lacp laggport re0 laggport re1"
ifconfig_vlan24="inet x.x.x.131/29 vhid 131 advskew 127 pass pw131 vlan 24 vlandev lagg0"
ifconfig_vlan24_alias0="inet x.x.x.133/32 vhid 133 advskew 0 pass pw133"
ifconfig_vlan64="inet 192.168.64.2/24 vlan 64 vlandev lagg0"
ifconfig_vlan64_alias0="inet 192.168.64.1/32 vhid 64 advskew 0 pass pw64"
ifconfig_vlan254="inet 192.168.254.251/24 vlan 254 vlandev lagg0"
ifconfig_vlan254_alias0="inet 192.168.254.254/32 vhid 254 advskew 127 pass pw254"
ifconfig_vlan300="inet 172.31.255.249/29 vlan 300 vlandev lagg0"
pfsync_enable="YES"
pfsync_syncdev="vlan300"
defaultrouter="x.x.x.129"
gateway_enable="YES"
static_routes="vl224"
route_voice="-net 192.168.224.0/24 192.168.254.2"
The second router's config (router02) is identical, with the following exceptions:
- It's using em0/em1, as it has Intel NICs instead of the crummy Realteks.
- Anywhere router01's advskew is 0, router02's is 127, and vice versa.
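Spelled out, the lines that differ on router02 would look roughly like the following. This is a sketch reconstructed from the two points above, not a paste from the machine; any per-host settings not mentioned here are left out rather than guessed.

```shell
# router02's rc.conf: only the lines implied by the two exceptions above
# (em0/em1 instead of re0/re1, and every advskew swapped between 0 and 127).
ifconfig_em0="up"
ifconfig_em1="up"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1"
ifconfig_vlan24="inet x.x.x.131/29 vhid 131 advskew 0 pass pw131 vlan 24 vlandev lagg0"
ifconfig_vlan24_alias0="inet x.x.x.133/32 vhid 133 advskew 127 pass pw133"
ifconfig_vlan64_alias0="inet 192.168.64.1/32 vhid 64 advskew 127 pass pw64"
ifconfig_vlan254_alias0="inet 192.168.254.254/32 vhid 254 advskew 0 pass pw254"
```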
For the most part everything seemed to be working, and I've set up preemption for carp(4) and things look good in ifconfig(8) (from router01):
Code:
re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 00:30:48:dc:21:a6
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex,master>)
        status: active
re1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 00:30:48:dc:21:a6
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
pflog0: flags=0<> metric 0 mtu 33160
pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500
        pfsync: syncdev: vlan300 syncpeer: 224.0.0.240 maxupd: 128 defer: off
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 00:30:48:dc:21:a6
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: re0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: re1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
vlan24: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1<RXCSUM>
        ether 00:30:48:dc:21:a6
        inet x.x.x.131 netmask 0xfffffff8 broadcast x.x.x.135 vhid 131
        inet x.x.x.133 netmask 0xffffffff broadcast x.x.x.133 vhid 133
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 24 parent interface: lagg0
        carp: BACKUP vhid 131 advbase 1 advskew 127
        carp: MASTER vhid 133 advbase 1 advskew 0
vlan64: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1<RXCSUM>
        ether 00:30:48:dc:21:a6
        inet 192.168.64.2 netmask 0xffffff00 broadcast 192.168.64.255
        inet 192.168.64.1 netmask 0xffffffff broadcast 192.168.64.1 vhid 64
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 64 parent interface: lagg0
        carp: MASTER vhid 64 advbase 1 advskew 0
vlan254: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1<RXCSUM>
        ether 00:30:48:dc:21:a6
        inet 192.168.254.251 netmask 0xffffff00 broadcast 192.168.254.255
        inet 192.168.254.254 netmask 0xffffffff broadcast 192.168.254.254 vhid 254
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 254 parent interface: lagg0
        carp: BACKUP vhid 254 advbase 1 advskew 127
vlan300: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=1<RXCSUM>
        ether 00:30:48:dc:21:a6
        inet 172.31.255.249 netmask 0xfffffff8 broadcast 172.31.255.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 300 parent interface: lagg0
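Since the full output is long, here is a small filter that boils it down to one line per CARP vhid. It is only a sketch: the function name carp_summary is made up for illustration, and it assumes the "carp: STATE vhid N advbase A advskew S" line format seen in the output above.

```shell
#!/bin/sh
# carp_summary: hypothetical helper. Reads ifconfig(8)-style output on stdin
# and prints "<interface> vhid <n> <state> advskew <n>" for each carp line.
carp_summary() {
    awk '
        # Interface header lines look like "vlan24: flags=...".
        /^[a-z]+[0-9]+: flags=/ { ifname = $1; sub(/:$/, "", ifname); next }
        # CARP status lines look like "carp: MASTER vhid 133 advbase 1 advskew 0".
        $1 == "carp:" { print ifname, "vhid", $4, $2, "advskew", $8 }
    '
}

# Usage on the router: ifconfig | carp_summary
```

Fed the output above, this should report vlan24 as BACKUP for vhid 131 and MASTER for vhid 133, vlan64 as MASTER for vhid 64, and vlan254 as BACKUP for vhid 254.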
However, today we started having issues routing traffic out via the x.x.x.133 address. What's really odd is that it's almost impossible to troubleshoot. Connecting to the router (via its private address) and running tcpdump(1) on vlan24 shows me no IP traffic at all (though I can see carp traffic), despite this router being the master for that address. Trying to ping the default gateway (x.x.x.129) gives me:
Code:
ping: sendto: Invalid argument
This is not related to PF or my PF ruleset, as it happens even with PF disabled (pfctl -d).
I suspect this is somehow related to arp(8) problems, as I get repeated dmesg(8) spam of:
Code:
arpresolve: can't allocate llinfo for x.x.x.129 on vlan24
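as if it can't allocate memory for the default gateway's ARP/IP information. This has been appearing since inception, but things were working before (or at least appeared to be). The ARP tables for the other interfaces (vlans) are fine--it's just vlan24's that is empty.
Any idea of where to start?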
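In case anyone wants to compare notes, these are roughly the checks behind the observations above, wrapped in a helper that only prints the commands so they can be reviewed before running. The name print_checks and the exact flags are my reconstruction rather than a copy of my shell history; the tcpdump filter assumes CARP is IP protocol 112.

```shell
#!/bin/sh
# print_checks: hypothetical helper that only PRINTS the diagnostic commands
# for a given interface and gateway; it does not run any of them.
print_checks() {
    iface=$1
    gw=$2
    cat <<EOF
ifconfig $iface
tcpdump -n -i $iface 'ip and not ip proto 112'
ping -c 3 $gw
arp -an
dmesg | grep arpresolve
EOF
}

# Example: print_checks vlan24 x.x.x.129
```

PF was disabled separately with pfctl -d before repeating the ping, so that step is deliberately not part of the printed list.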