Hello,

I'm trying to build a PF active-active HA cluster consisting of two FreeBSD hosts (FW1 and FW2) running a dynamic routing protocol (BGP in this case).
With a symmetrical flow (entering and returning through the same firewall) everything works fine: the session is established and replicated to the peer FW via pfsync0.

The problem I'm facing is with asymmetrical traffic flows like the one below:

Code:
(1) Client -------TCP SYN ---------> FW1 -------------------------> Server
                                      |                                  
                                     pfsync                          
                                      |                                  
(2) Client <------------------------ FW2 <----- TCP SYN+ACK----- Server

FW2 denies the TCP segment with the SYN+ACK flags set (2), sent by the Server in response to the Client's TCP SYN in (1).

The returning SYN+ACK segment (from Server to Client) is dropped by FW2 because FW2 hasn't yet seen the SYN_SENT state from FW1; the SYN+ACK arrives just before the state is replicated from FW1 to FW2.
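
For reference, here is roughly how I confirmed where the drop happens (a sketch; it assumes the block rules carry the "log" keyword, so the drops show up on pflog0):

Code:
# on FW2: watch for dropped SYN+ACK segments on the pflog interface
tcpdump -n -e -ttt -i pflog0 'tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)'
# on both FWs: check whether the half-open state has been replicated yet
pfctl -ss | grep SYN_SENT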

I've read that this kind of setup should be supported with the pfsync v5 protocol and the "defer" option enabled on the pfsync0 interface on both FWs: deferral basically queues the initial SYN packet for a short while until the SYN_SENT state has been replicated from FW1 to FW2. In my case I don't see any change in behavior with or without the defer option enabled (ifconfig pfsync0 correctly displays whether or not defer is on).
I also tried setting the "maxupd" parameter to the minimum possible value of 1 on the pfsync0 interface on both FW1 and FW2, but that didn't help either (tcpdump shows pfsync packets being sent much faster than before, which of course makes sense, because the FWs no longer wait to combine several state changes into a single pfsync packet).
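
For completeness, this is roughly what I configured on both firewalls (a sketch; the syncdev name em3 is just an example from my lab):

Code:
# enable state deferral and point pfsync at the sync interface (em3 is an example)
ifconfig pfsync0 syncdev em3 defer up
# force an update packet after every single state change
ifconfig pfsync0 maxupd 1
# verify: ifconfig pfsync0 should report the maxupd value and whether defer is on
ifconfig pfsync0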

Moreover, I noticed that in FreeBSD's pfsync(4) man page the "defer" keyword is not mentioned anywhere, unlike in OpenBSD's pfsync(4) documentation, and I assume there might be a reason for that. Can someone confirm whether or not this feature works in FreeBSD?

Regards,
Plamen
 
Moreover, I noticed that in FreeBSD's pfsync(4) man page the "defer" keyword is not mentioned anywhere, unlike in OpenBSD's pfsync(4) documentation, and I assume there might be a reason for that?
Keep in mind that FreeBSD's PF is based on an older version of PF from OpenBSD. If I recall correctly it was OpenBSD 4.5. It's not in sync with the latest PF of OpenBSD. FreeBSD's implementation has diverged so much it's not possible anymore (at least not without considerable effort) to import the current PF from OpenBSD.
 
The syntax is probably different.

Code:
The pfsync interface will attempt to collapse multiple state updates into
     a single packet where possible.  The maximum number of times a single
     state can be updated before a pfsync packet will be sent out is
     controlled by the maxupd parameter to ifconfig (see ifconfig(8) and the
     example below for more details).  The sending out of a pfsync packet will
     be delayed by a maximum of one second.

I would guess it's maxupd and not defer. But I am guessing.

My FreeBSD ruleset will run on an OpenBSD box with a syntax change to the outbound rule:

Code:
### FreeBSD - Keep and modulate state of outbound tcp, udp and icmp traffic
pass out on $ext_if proto { tcp, udp, icmp } from any to any modulate state

### OpenBSD - Keep and modulate state of outbound tcp, udp and icmp traffic
pass out on egress proto { tcp, udp, icmp } from any to any modulate state
 
I think your topology should look like this:
[attached topology diagram]
I absolutely agree with the point about having a clear separation between pure L3 routing and L3/L4 filtering roles, with dedicated layers for access/distribution/core/edge/etc. That would make my life much easier (also from a troubleshooting and operations perspective, where a single firewall rule mistake can have unpredictable impact), but unfortunately in my case all those roles are merged.
The two main purposes of the active-active topology are:
1) To share the traffic load to some extent (and probably also to be useful for future scale-out instead of scale-up)
2) To minimize the impact in case one of the nodes fails (the time it takes for the control plane to detect the failure, propagate it, re-calculate and switch to the new path)
 
Keep in mind that FreeBSD's PF is based on an older version of PF from OpenBSD. If I recall correctly it was OpenBSD 4.5. It's not in sync with the latest PF of OpenBSD. FreeBSD's implementation has diverged so much it's not possible anymore (at least not without considerable effort) to import the current PF from OpenBSD.
I found an old article (from 2009): http://undeadly.org/cgi?action=article;sid=20090619100514
So 10+ years later I was hoping the feature is already in the FreeBSD code (and there is indeed a "defer" option in ifconfig that I can turn on and off on the pfsync interface).
 
Yeah, I don't think this will work without a CARP VIP. Even then I'm not sure it will work, because the FreeBSD carp(4) implementation does not appear to support any kind of load balancing. Contrast with https://man.openbsd.org/carp
Agree with that. The only way I can think of to achieve load-sharing with the FreeBSD implementation of CARP is to have two different groups with two different VIPs on the same L2 segment and to distribute those VIPs among the end hosts (for instance, hosts 1,3,5,7... use VIP1 as their default gateway, hosts 2,4,6,8... use VIP2).
But in all cases the problem with the asymmetrical flow remains.
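
Something along these lines is what I have in mind (a rough sketch only; the addresses, vhids and password are made up for illustration):

Code:
# FW1: master for VIP1 (advskew 0), backup for VIP2 (advskew 100)
ifconfig em2 vhid 1 advskew 0   pass mekmitasdigoat alias 192.168.10.1/32
ifconfig em2 vhid 2 advskew 100 pass mekmitasdigoat alias 192.168.10.2/32

# FW2: backup for VIP1, master for VIP2
ifconfig em2 vhid 1 advskew 100 pass mekmitasdigoat alias 192.168.10.1/32
ifconfig em2 vhid 2 advskew 0   pass mekmitasdigoat alias 192.168.10.2/32

# hosts 1,3,5,7... use 192.168.10.1 as default gateway, hosts 2,4,6,8... use 192.168.10.2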
 
What about "net.pfsync.pfsync_buckets"?
Based on the man:
net.pfsync.pfsync_buckets
The number of pfsync buckets. This affects the performance
and memory tradeoff. Defaults to twice the number of CPUs.
Change only if benchmarks show this helps on your workload.
Does anyone have an idea what this is all about?
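
Just for context, this is how I'm looking at it on my boxes (read-only inspection; I haven't benchmarked different values):

Code:
sysctl net.pfsync.pfsync_buckets   # current number of buckets
sysctl hw.ncpu                     # the default is twice this value, per the man page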
 
Agree with that. The only way I can think of to achieve load-sharing with the FreeBSD implementation of CARP is to have two different groups with two different VIPs on the same L2 segment and to distribute those VIPs among the end hosts (for instance, hosts 1,3,5,7... use VIP1 as their default gateway, hosts 2,4,6,8... use VIP2).
But in all cases the problem with the asymmetrical flow remains.
That's what I've done in the past, too. N VIPs with n hosts. Each host is primary for one VIP, and backup for at least one VIP.
 
What about "net.pfsync.pfsync_buckets"?
Based on the man:

Does anyone have an idea what this is all about?
That's strictly a performance-related setting. As the man page says, it trades memory for performance. Don't play with it unless you're going to put in the work to benchmark things for your setup.

On topic: I've never tried active-active pfsync. It might work. It might not. I don't know. It's almost certainly going to be slower than a single box handling the traffic, or an active-passive setup. Since the buckets change, pfsync scales pretty decently. It's basically within 10% of pf without pfsync now.
 
I found an ugly workaround and would like to ask for comments.

The setup is based on the attached diagram (for illustration purposes I simplified it as much as possible, added a CARP VIP on both interfaces, and intentionally made the flow asymmetrical).
Because the SYN+ACK TCP segment reaches the other firewall before pfsync has synchronized the state to it, I'm temporarily allowing (as a last-match rule) all TCP without keeping state (allowing only SYN+ACK is not enough, because sometimes pfsync takes even longer to replicate the state).
That obviously opens a huge hole in the firewall logic, so in order to close it (up to a certain extent) I'm blocking any TCP segment that has the SYN flag set without ACK (flags S/SA).
In the example based on the attached topology, the goal is to allow SSH from 2.2.2.2 to 1.1.1.1 and Telnet from 1.1.1.1 to 2.2.2.2, while denying everything else.

My rulebase for that is:
Code:
# Allow anything exiting the firewalls and don't care about SEQ/ACK numbers (keep sloppy state)
pass out log quick inet all flags S/SA keep state (sloppy) allow-opts

#
#  em2.vlan110 rules
#
# Temporarily allow return traffic from an asymmetric TCP session (for example the SYN+ACK of the 3-way handshake and a few more segments)
# that hasn't been synchronized yet on em2.vlan110, and don't create state for it.
# Once the TCP state is synchronized between both FWs, this rule won't process traffic anymore.
#
pass in log on em2_vlan110 inet proto tcp all no state
#
# Permit SSH from 2.2.2.2 to 1.1.1.1 and keep sloppy state
#
pass in log quick on em2_vlan110 inet proto tcp from 2.2.2.2 to 1.1.1.1 port = ssh flags S/SA keep state (sloppy)
#
# Block anything else with the TCP SYN flag set received on this interface
#
block drop in log quick on em2_vlan110 inet proto tcp all flags S/SA

#
#  em2.vlan120 rules
#
# Temporarily allow return traffic from an asymmetric TCP session (for example the SYN+ACK of the 3-way handshake and a few more segments)
# that hasn't been synchronized yet on em2.vlan120, and don't create state for it.
# Once the TCP state is synchronized between both FWs, this rule won't process traffic anymore.
#
pass in log on em2_vlan120 inet proto tcp all no state
#
# Permit Telnet from 1.1.1.1 to 2.2.2.2:23 and keep sloppy state
#
pass in log quick on em2_vlan120 inet proto tcp from 1.1.1.1 to 2.2.2.2 port = telnet flags S/SA keep state (sloppy)
#
# Block anything else with the TCP SYN flag set received on this interface
#
block drop in log quick on em2_vlan120 inet proto tcp all flags S/SA

The above seems to work, but again I would be happy to optimize it if possible.
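
To keep an eye on whether the temporary "no state" rules are still doing any work, I watch the per-rule counters (a sketch; the grep pattern just matches the two test hosts):

Code:
# per-rule evaluation/packet counters; once states are synced the catch-all
# "no state" rules should stop accumulating packets
pfctl -vsr
# and the (sloppy) states themselves on both firewalls
pfctl -ss | grep -E '1\.1\.1\.1|2\.2\.2\.2'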

Regards,
Plamen
 

Attachments: PF-Active-Active-HA.png