I have a two-host routing cluster running pf, pfsync and CARP on FreeBSD 9.0-RELEASE-p3. The two hosts are connected by a crossover cable on bce1, which is configured as the pfsync syncdev. The setup works well, with the first host (host A) as the master (advskew 0) and the second host (host B) as the backup (advskew 100). Connections established while host A is running are copied over to host B, and they stay active when host A is shut down or rebooted and host B becomes the master.
The problem occurs when host A is started up again. When it boots, it becomes the master again and requests a bulk update of the pf states from host B. The update completes successfully, but host A receives only the new states that were created on host B, not any of the states that were originally created on host A. Once host A is back up, connections that were active before it went down break.
For example, while host A is master, make an SSH connection through the router. The states show up on both A and B:
Code:
all tcp 10.4.1.4:22 <- 10.2.1.4:49303 ESTABLISHED:ESTABLISHED
[700875095 + 22144] wscale 9 [1344039155 + 8192] wscale 7
age 00:00:11, expires in 23:59:52, 23:33 pkts, 4140:4909 bytes, anchor 197, rule 8
id: 4ff4def10000114f creatorid: 76e4b844
all tcp 10.2.1.4:49303 -> 10.4.1.4:22 ESTABLISHED:ESTABLISHED
[1344039155 + 8192] wscale 7 [700875095 + 22144] wscale 9
age 00:00:11, expires in 23:59:52, 23:33 pkts, 4140:4909 bytes, rule 212
id: 4ff4def100001150 creatorid: 76e4b844
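To compare listings like the one above between the two hosts without eyeballing them, a minimal Python sketch can collect each state's header line and its `id:`/`creatorid:` line into one record. It assumes the input is the verbose state output exactly as pasted here; `PfState` and `parse_states` are hypothetical names, and `left`/`right` are simply the two endpoints in the order pf prints them.

```python
import re
from dataclasses import dataclass

# Header line of one state, e.g.
# "all tcp 10.4.1.4:22 <- 10.2.1.4:49303 ESTABLISHED:ESTABLISHED"
HEADER = re.compile(
    r"^(?P<iface>\S+) (?P<proto>\S+) (?P<left>\S+) (?P<dir><-|->) "
    r"(?P<right>\S+) (?P<state>\S+)$"
)
# Identity line, e.g. "id: 4ff4def10000114f creatorid: 76e4b844"
IDENT = re.compile(r"^id: (?P<id>[0-9a-f]+) creatorid: (?P<creator>[0-9a-f]+)$")


@dataclass
class PfState:
    proto: str
    left: str       # first endpoint as printed
    right: str      # second endpoint as printed
    direction: str  # "<-" or "->"
    state: str
    id: str = ""
    creatorid: str = ""


def parse_states(text: str) -> list[PfState]:
    """Parse a verbose pf state listing into PfState records."""
    states: list[PfState] = []
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            states.append(PfState(m["proto"], m["left"], m["right"],
                                  m["dir"], m["state"]))
            continue
        m = IDENT.match(line)
        if m and states:
            # The id line belongs to the most recent header line.
            states[-1].id = m["id"]
            states[-1].creatorid = m["creator"]
    return states
```

The sequence-number and `age ...` lines are skipped because they don't fit either pattern; only the fields needed to identify a state are kept.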
Shut down host A. The SSH connection remains active through host B. While host A is down, make a second SSH connection through the router. The new states show up on B alongside the synced states from the first connection:
Code:
all tcp 10.4.1.4:22 <- 10.2.1.4:49303 ESTABLISHED:ESTABLISHED
[700875191 + 22144] wscale 9 [1344039203 + 8192] wscale 7
age 00:03:38, expires in 23:59:39, 3:3 pkts, 204:252 bytes
id: 4ff4def10000114f creatorid: 76e4b844
all tcp 10.2.1.4:49303 -> 10.4.1.4:22 ESTABLISHED:ESTABLISHED
[1344039203 + 8192] wscale 7 [700875191 + 22144] wscale 9
age 00:03:38, expires in 23:59:39, 3:3 pkts, 204:252 bytes
id: 4ff4def100001150 creatorid: 76e4b844
all tcp 10.4.1.4:22 <- 10.2.1.4:49323 ESTABLISHED:ESTABLISHED
[1512494642 + 22144] wscale 9 [11200697 + 8192] wscale 7
age 00:00:08, expires in 23:59:55, 21:32 pkts, 4036:4825 bytes, anchor 183, rule 8
id: 4ff4e7c40000025b creatorid: 648d37f6
all tcp 10.2.1.4:49323 -> 10.4.1.4:22 ESTABLISHED:ESTABLISHED
[11200697 + 8192] wscale 7 [1512494642 + 22144] wscale 9
age 00:00:08, expires in 23:59:55, 21:32 pkts, 4036:4825 bytes, rule 212
id: 4ff4e7c40000025c creatorid: 648d37f6
Start up host A again. It becomes master, but only the states for the second connection (the one started while A was down) come up on host A:
Code:
all tcp 10.4.1.4:22 <- 10.2.1.4:49323 ESTABLISHED:ESTABLISHED
[1512495026 + 22144] wscale 9 [11200889 + 8192] wscale 7
age 00:07:29, expires in 23:59:53, 2:3 pkts, 152:252 bytes
id: 4ff4e7c40000025b creatorid: 648d37f6
all tcp 10.2.1.4:49323 -> 10.4.1.4:22 ESTABLISHED:ESTABLISHED
[11200889 + 8192] wscale 7 [1512495026 + 22144] wscale 9
age 00:07:29, expires in 23:59:53, 2:3 pkts, 152:252 bytes
id: 4ff4e7c40000025c creatorid: 648d37f6
At this point, only the second SSH connection will work.
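Diffing the listings by state identity makes the pattern explicit. The Python sketch below (function names are hypothetical) keys each state on its `(id, creatorid)` pair and reports what is present in host B's listing from before host A returned but absent from host A's listing afterwards. Applied to the output above, the missing entries are exactly the two states with creatorid 76e4b844, the creator ID on the states host A made before its reboot, while both states with creatorid 648d37f6 (created on host B) made it across.

```python
import re
from collections import Counter

# Matches the "id: ... creatorid: ..." line printed for each state.
IDENT = re.compile(r"id: ([0-9a-f]+) creatorid: ([0-9a-f]+)")


def state_ids(listing: str) -> set[tuple[str, str]]:
    """Return the set of (id, creatorid) pairs found in a state listing."""
    return set(IDENT.findall(listing))


def missing_states(before: str, after: str) -> set[tuple[str, str]]:
    """States present in `before` but absent from `after`."""
    return state_ids(before) - state_ids(after)


def by_creator(pairs: set[tuple[str, str]]) -> Counter:
    """Count states per creatorid."""
    return Counter(creator for _, creator in pairs)
```

Only the identity lines matter here, so the full listings can be passed in unchanged.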
It seems as if the bulk update process assumes that the states created by A still exist on A after the reboot and therefore leaves them out. Because they no longer exist there, those states never reappear on host A, and the connection is lost.
Is this working as designed, or do other people have the same experience? Thanks in advance for any advice or pointers.
Following are the relevant configs:
Kernel config:
Code:
device pf
device pfsync
device pflog
device carp
Host A rc.conf:
Code:
ifconfig_bce0="up"
ifconfig_bce1="192.168.42.1/30"
cloned_interfaces="vlan2 vlan4"
ifconfig_vlan2="inet 10.2.0.1 netmask 255.255.0.0 vlan 2 vlandev bce0"
ifconfig_vlan4="inet 10.4.0.1 netmask 255.255.0.0 vlan 4 vlandev bce0"
ifconfig_carp2="vhid 2 pass 12345678 10.2.0.10/16"
ifconfig_carp4="vhid 4 pass 12345678 10.4.0.10/16"
pf_enable="YES"
pflog_enable="YES"
pfsync_enable="YES"
pfsync_syncdev="bce1"
pfsync_syncpeer="192.168.42.2"
Host B rc.conf:
Code:
ifconfig_bce0="up"
ifconfig_bce1="192.168.42.2/30"
cloned_interfaces="vlan2 vlan4"
ifconfig_vlan2="inet 10.2.0.2 netmask 255.255.0.0 vlan 2 vlandev bce0"
ifconfig_vlan4="inet 10.4.0.2 netmask 255.255.0.0 vlan 4 vlandev bce0"
ifconfig_carp2="vhid 2 advskew 100 pass 12345678 10.2.0.10/16"
ifconfig_carp4="vhid 4 advskew 100 pass 12345678 10.4.0.10/16"
pf_enable="YES"
pflog_enable="YES"
pfsync_enable="YES"
pfsync_syncdev="bce1"
pfsync_syncpeer="192.168.42.1"
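To rule out a simple mismatch between the two rc.conf files, a small Python sketch can cross-check them: the CARP vhid, password and address should match, the advskew values should differ, and each `pfsync_syncpeer` should be the other host's address on its `pfsync_syncdev`. This is a toy parser for plain `name="value"` lines only (not a full sh parser), and it assumes the bare `address/prefix` form used for bce1 above; all function names are hypothetical.

```python
import re

ASSIGN = re.compile(r'^(\w+)="([^"]*)"$')


def parse_rc(text: str) -> dict[str, str]:
    """Parse simple name="value" lines from an rc.conf fragment."""
    conf = {}
    for line in text.splitlines():
        m = ASSIGN.match(line.strip())
        if m:
            conf[m.group(1)] = m.group(2)
    return conf


def carp_params(value: str) -> dict[str, str]:
    """Turn 'vhid 2 advskew 100 pass x 10.2.0.10/16' into a dict."""
    words = value.split()
    params = {}
    i = 0
    while i < len(words):
        if words[i] in ("vhid", "advskew", "pass") and i + 1 < len(words):
            params[words[i]] = words[i + 1]
            i += 2
        else:
            params.setdefault("addr", words[i])  # the address/prefix
            i += 1
    return params


def check_pair(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Return the problems found when comparing host A's and B's configs."""
    problems = []
    for key in sorted(k for k in a if k.startswith("ifconfig_carp")):
        if key not in b:
            problems.append(f"{key} missing on host B")
            continue
        pa, pb = carp_params(a[key]), carp_params(b[key])
        for field in ("vhid", "pass", "addr"):
            if pa.get(field) != pb.get(field):
                problems.append(f"{key}: {field} differs")
        # An unset advskew defaults to 0.
        if pa.get("advskew", "0") == pb.get("advskew", "0"):
            problems.append(f"{key}: both hosts use the same advskew")
    # Each syncpeer should be the other host's address on the sync interface.
    for me, peer, name in ((a, b, "A"), (b, a, "B")):
        dev = me.get("pfsync_syncdev", "")
        peer_addr = peer.get(f"ifconfig_{dev}", "").split("/")[0]
        if me.get("pfsync_syncpeer") != peer_addr:
            problems.append(f"host {name}: pfsync_syncpeer does not match peer's {dev}")
    return problems
```

For the two fragments above it returns an empty list, so at least these basic pairings are consistent.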
pf.conf:
Code:
set skip on lo0
set skip on bce1
set block-policy drop
scrub in all
block log
pass quick proto carp keep state (no-sync)
pass in quick proto tcp from vlan2:network to vlan4:network port 22
pass out quick