PF: How does BSD pf performance depend on CPU frequency and L2/L3 cache size?

Hi, hardware gurus!

How exactly does BSD pf performance (in terms of low latency, high PPS, etc.) depend on bus frequency, main CPU frequency, and L2/L3 cache size in multi-package (i.e., physically multi-CPU, like Intel E5500/5600 or E5-2000 series) server systems intended to work as a border firewall with a lot of dual/quad-port PCIe 2.0+ NICs?

Note, we are not talking about running the firewall in VMs - only bare metal + FreeBSD.

Thank you for your attention and any detailed answers!
 
I'm not aware of anyone having done a benchmarking series across different CPUs, so there's not really any information there.

It may be more useful for you to tell us what sort of performance your application needs.

To give you an idea, the latest version of FreeBSD pf can push around 18 Mpps (in simple setups) on my particular hardware configuration.
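(If you want to sanity-check pps numbers on your own box, netstat can print per-second interface counters - the interface name here is illustrative:

  netstat -w 1 -I ix0

This reports packets in/out each second while a test is running.)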
 
> I'm not aware of anyone having done a benchmarking series across different CPUs, so there's not really any information there.
Or maybe that's because such information is the property of some IT company. ;)

> It may be more useful for you to tell us what sort of performance your application needs.
At the moment, I'm interested in:
1. pfSense (let's call it “a GUI for pf + some other stuff”)
2. HAProxy

Speeds of 10-30G, copper Ethernet.
Intel CPUs

> To give you an idea, the latest version of FreeBSD pf can push around 18 Mpps (in simple setups) on my particular hardware configuration.
Could you please describe your setup (of course, just drop the parts about security)? Thanks!
 
> At the moment, I'm interested in:
> 1. pfSense (let's call it “a GUI for pf + some other stuff”)
You'll want to talk to Netgate about that. They have more performance numbers than I do.

> 2. HAProxy
>
> Speeds of 10-30G, copper Ethernet.
> Intel CPUs
In 64-byte packets? 1500 bytes? 9000 bytes?

> Could you please describe your setup (of course, just drop the parts about security)? Thanks!
It's a development setup. It's entirely artificial.
 
> You'll want to talk to Netgate about that. They have more performance numbers than I do.
The fellows on the Netgate user forum don't agree with that: most of them run pfSense-based firewalls for small networks (a hotel, a campus, a 10-20 person office, etc.), fell in love with the GUI, and have no motivation to dig down into how the bus, CPU, NIC, and FreeBSD network stack actually work...
So the best of them just redirect me to the FreeBSD user forum or the same Calimet articles.

> In 64-byte packets? 1500 bytes? 9000 bytes?
1500
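(For context, some back-of-the-envelope math: with standard Ethernet overhead of 38 bytes per frame on the wire, 10 Gbit/s at 1500-byte frames is 10 × 10^9 / ((1500 + 38) × 8) ≈ 0.81 Mpps, and 30 Gbit/s is roughly 2.4 Mpps - well below the 18 Mpps figure quoted earlier.)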

> It's a development setup. It's entirely artificial.
It may be interesting anyway. ;)

Thank you!
 
> The fellows on the Netgate user forum don't agree with that: most of them run pfSense-based firewalls for small networks (a hotel, a campus, a 10-20 person office, etc.), fell in love with the GUI, and have no motivation to dig down into how the bus, CPU, NIC, and FreeBSD network stack actually work...
> So the best of them just redirect me to the FreeBSD user forum or the same Calimet articles.
They are mistaken. I contract for Netgate (but explicitly do not speak for them). They have a team dedicated to performance testing. Talk to them; they can help.
Many users run the free (Community Edition, CE) version, but Netgate has commercial offerings, including support, that are suitable for business use at a larger scale than the examples you list here.

> It may be interesting anyway. ;)
That setup uses Dell T640s (Intel Xeon Silver 4210, 2.2 GHz, 10 cores, 32 GB RAM) with Chelsio T62100 NICs. I stress that this is a development setup and not a production system with real traffic.
 

A sub-question (no need to create a separate thread, because the question is too close to this one):

How EXACTLY do the following technologies:

Intel SpeedStep
Multi-Threads
CPU Enhanced Halt (C1E)
C3/C6/C7 State Support
CPU EIST Function
AHCI
QPI Frequency
Memory Scrubs

impact FreeBSD performance as a 24/7 firewall (pf / ipfw)? (See the sysctl sketch below for how to inspect some of these from a running system.)

P.S. For example, I know for certain that with C3 enabled, a single-threaded application (like pf) lets the other cores go to sleep, which slightly speeds up the first core.
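For anyone who wants to poke at these from a running FreeBSD system: the power-management side (SpeedStep/EIST, C-states) and SMT are exposed through sysctl(8) and loader tunables. A minimal sketch - the CPU number and the chosen values are illustrative, not recommendations:

  # Current and available P-states (SpeedStep / EIST, via cpufreq(4))
  sysctl dev.cpu.0.freq dev.cpu.0.freq_levels

  # C-states offered by the firmware, and the deepest state the kernel may enter
  sysctl dev.cpu.0.cx_supported hw.acpi.cpu.cx_lowest

  # Cap idle at C1 to avoid deep-sleep wakeup latency on a latency-sensitive box
  sysctl hw.acpi.cpu.cx_lowest=C1

  # Disable SMT (“Multi-Threads”) at boot if testing shows it hurts pf throughput
  echo 'machdep.hyperthreading_allowed="0"' >> /boot/loader.conf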
 
IMHO, knowing exactly how a system will perform (compute + pf, in this case) depends on so many factors that it would be impossible to guess. When companies like Netgate give you specs on how their appliances perform, it is because they have a discrete and consistent set of hardware and variables to run tests on - and even then, you can only hope that their “real traffic” (that is, short/long-lived flows, UDP traffic, TCP optimizations based on OSes, MTU, applications, etc.) and setup (number of rules, options, clients behind it) match your real traffic and setup. If your hardware deviates from what has been tested, your best approach is to test it yourself; and if you don't have the hardware yet, look for tests on approximately similar hardware and extrapolate. Cheers.
 
> IMHO, knowing exactly how a system will perform (compute + pf, in this case) depends on so many factors that it would be impossible to guess. [...] If your hardware deviates from what has been tested, your best approach is to test it yourself; and if you don't have the hardware yet, look for tests on approximately similar hardware and extrapolate.

In general, I agree with you and the others here.

So, at some point in discussions like this, we always conclude that we have to “play” with real hardware in a lab environment.

Because we are talking about only one use case - firewalling - the test we really need is to simulate “real traffic”, change the settings (BIOS/UEFI), and take measurements.
(I write “BIOS/UEFI” because the initial message in this thread was about hardware technologies: power management (ACPI, etc.), memory management (fast scrub, patrol scrub, mirrored RAM banks, etc.), and proprietary CPU features (SpeedStep, Multi-Threads, etc.).)

What set of software would you (and maybe some others) suggest for this sort of testing?

iperf/iperf3, rdmon, ping - what else?
 
> iperf/iperf3, rdmon, ping - what else?
Can you invite the IT and marketing departments to a ping party? :D Seriously though, iperf and similar tools are good for testing throughput and latency over a small number of flows, but to simulate a real-world deployment you will need something that generates hundreds of flows per client, so it can stress the CPU, memory, and NICs. I seem to remember reading on this forum that pf is not multi-threaded, so I would also research whether multiple cores and threads will make a difference, or whether just a low number of cores and a high clock will do the trick.

I haven't tested any of these tools, but it looks like there are several, like Seagull (http://gull.sourceforge.net/index.html), that can help.
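As a crude starting point, iperf3 alone can approximate a number of concurrent clients with parallel streams (a rough sketch; the server address, stream counts, and rates are illustrative):

  # 16 parallel TCP streams through the firewall for 60 seconds
  iperf3 -c 192.0.2.10 -P 16 -t 60

  # UDP at a fixed offered load; -l 1472 keeps datagrams within a 1500-byte MTU
  iperf3 -c 192.0.2.10 -u -b 2G -P 8 -l 1472

For raw packets-per-second stress (rather than flow realism), the netmap-based pkt-gen tool that ships in the FreeBSD source tree can transmit minimum-size frames at line rate, e.g. “pkt-gen -f tx -i ix0”.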
 

Thank you again for these suggestions.

But these articles look a little bit outdated:
1. They describe a sort of synthetic “lab” setup, but in real life network flows have more of a “mobile device” character: a lot of packet loss and latency;
2. Because of 1., modern congestion control (CC) algorithms are coming: BBR2 (already implemented in FreeBSD 13.x), QUIC, RACK...

So, tuning the FreeBSD network stack primarily via NIC buffer/queue tuning is becoming outdated - most of the work now happens in drivers/kernel code.
(Because NIC developers are not fast enough to implement modern CC “directly in silicon”, for the next 2-4 years we will only see implementations at the OS level.)
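For what it's worth, the alternate TCP stacks can already be selected at run time on FreeBSD 13.x. A minimal sketch - as far as I know these modules require a kernel built with “options TCPHPTS”:

  # Load the RACK TCP stack and make it the system default
  kldload tcp_rack
  sysctl net.inet.tcp.functions_default=rack

  # List the TCP stacks currently available
  sysctl net.inet.tcp.functions_available

(Note that on a pure packet-forwarding firewall the TCP stack mostly matters for locally terminated connections, e.g. HAProxy.)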

Am I missing something?
 