How FreeBSD utilize multicore processors and multi-CPU systems?

Sergei_Shablovsky · Feb 13, 2020

Hi, FreeBSD guru!
How FreeBSD utilize multicore processors in ONE CPU systems ?
How FreeBSD utilize multicore processors in multi-CPU systems ?

multix · Apr 10, 2020

I have a Pentium-D which is Dual-Core, it will say:

 FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs

FreeBSD/SMP: 1 package(s) x 2 core(s)

The second CPU is launched:

 SMP: AP CPU #1 Launched!

and everything looks smooth!

ralphbsz · Apr 11, 2020

Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single socket Intel/AMD it works good enough for amateur usage. I'm not sure about high performance applications.

This is from my home server:

Code:

FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20171214/tbfadt-748)
ioapic0: Changing APIC ID to 4
ioapic0 <Version 2.0> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.

PMc · Apr 11, 2020

Sergei_Shablovsky said:
Hi, FreeBSD guru!
How FreeBSD utilize multicore processors in ONE CPU systems ?
How FreeBSD utilize multicore processors in multi-CPU systems ?

With MPS 1.4 (or newer).

Sergei_Shablovsky · May 2, 2020

PMc said:
With MPS 1.4 (or newer).

Thank You for reply!

Specification are great. In the same time we all know how implementation in real hardware (motherboard + cpu) AND software impact on.

Common place the last 10+ years that software development running faster (because of industry rushing, and common data value dramatically increasing) than hardware manufacturer able to Engineering and producing motherboards+CPU.

So, each time when we need a effective and well-balanced solution, we need to “calling to all”: community of users of certain software, community of users of certain operating system, and of course forums/support of hardware manufacturer.

Sergei_Shablovsky · May 2, 2020

ralphbsz said:
Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single socket Intel/AMD it works good enough for amateur usage. I'm not sure about high performance applications.

Thank You for kindly reply!

The start topic is only first step.

Because the main question are: how network-focused software (I’m interesting exactly in
1. firewall pfSense solution
2. balancing HAproxy solution
3. FreeNAS storage solution)
working on systems with multi-CPU systems (which support multi-threading and have 4-6-8-12 cores).

This is complex question because each solution have different software architecture, and different loading strategy on CPU, memory and data bus.

What a You think about this?

ralphbsz said:
Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.

Thank You so much! I’l try To find it.

Sergei_Shablovsky · Nov 6, 2020

Could You be so please to comment about manage iflib threads on several CPU cores, the last reply on this thread?

How pfSense utilize multicore processors and multi-CPU systems ?

Hi, pfSense Gurus! Looking on perspective of upgrading to multi-CPU systems we have 2 main question: How pfSense utilize multicore processors in ONE CPU sys...

forum.netgate.com

Sergei_Shablovsky · Dec 16, 2020

Also this post about FreeBSD optimization and tuning for networking for Yours attention https://calomel.org/freebsd_network_tuning.html

Sergei_Shablovsky · Feb 11, 2021

Hm. Looks like hard to find right answer...

I need a little bit to explain the topic start question:

What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...):
a) 1 CPU with 4-10 cores, hi-frequency
b) 2-4 CPU with 4-6 cores, mid-frequency

And how the cache in CPU L2 (2-56Mb) and L3 (2-57Mb) impact on network-related operation (in cooperation with NIC card) ?

richardtoohey2 · Feb 11, 2021

Sergei_Shablovsky said:
What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...)

I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?

Any also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just pushing packets?

If a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.

Sergei_Shablovsky · Feb 14, 2021

richardtoohey2 said:
I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?

Because the different type of traffic involve different software chain react to. For example, media streaming packets, VPN sessions and ICMP are very different in processing inside software that are on the top of BSD. Am I wrong?

richardtoohey2 said:
Any also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just pushing packets?

From the networking device point of view the main goal are "processing packets w/o errors and as fast as possible".

My questions related more to situation when FreeBSD used as core of FW, VPN gate or balancer on usual Intel-based servers.

richardtoohey2 said:
If a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.

The speeds we must talking about are starting from 10-20Gb / s

richardtoohey2 · Feb 14, 2021

Not sure you will get answers to those sorts of questions on these forums.

There's a book:

Design and Implementation of the FreeBSD Operating System, The 2nd Edition

And Netflix work on FreeBSD and networking e.g.

Netflix and FreeBSD: Using Open Source to Deliver Streaming Video :: FreeBSD Presentations and Papers

papers.freebsd.org

Using FreeBSD and commodity parts, we achieve 90 Gb/s serving TLS-encrypted connections with ~55% CPU on a 16-core 2.6-GHz CPU.

Pretty sure there are other Netflix papers on working with FreeBSD and NUMA etc.

Have a look at https://papers.freebsd.org/ e.g.

In-kernel TLS Framing and Encryption for FreeBSD :: FreeBSD Presentations and Papers

papers.freebsd.org

I'm not sure if you are just trying to learn how things work or if you have a specific requirement or issue that you need to fix - maybe if you are more specific then someone can help.

Phishfry · Feb 14, 2021

I know this is a subjective topic but I prefer Single Socket server board.
The second cpu does not bring a linear acceleration. There is a preformance hit for dual cpu.
Witness the synthetic benchmarks.
Single CPU = 11K

Intel Xeon E5-2650L v3 Benchmark

Same CPU dual =18K

[Dual CPU] Intel Xeon E5-2650L v3 Benchmark

But where a dual CPU configuration can help is PCIe lanes. Typical Xeon had 40 Lanes. with 2 CPU that means 80 lanes.
For a setup requiring I/O this can be important. The newer LGA3647 Xeon has 48 Lanes.
AMD EPYC has 128 Lanes.

So the single EPYC/2 will smash most Dual CPU setups.

AMD EPYC 7302P Benchmark

[Dual CPU] AMD EPYC 7302 Benchmark

There are benefits to single CPU. Interprocess communication kept on die is superior.

GoNeFast_01 · Feb 16, 2021

Phishfry said:
So the single EPYC/2 will smash most Dual CPU setups.
PassMark - AMD EPYC 7302P - Price performance comparison PassMark - [Dual CPU] AMD EPYC 7302 - Price performance comparison

And that is why there are DUAL CORE EPYC Boards now....

Enjoying the boundaries being push let's go quantum!!!

Well I have installed FreeBSD on some 8-32 core processors performs relative well.... 4-8% with very heavy media use as desktop environment(100+ tabs of firefox, 40+ chrome, 20+ terminal, 2+ VM) sometimes ram becomes an issue but not CPU on my experience barely breaks 10% ever.

Phishfry · Feb 16, 2021

Yea but did you notice the benchmarks?
Single EPYC=33K
Dual EPYC=40K
Yikes imagine buying 2 chips at $4K each and only getting marginal increase....

So is this a testing flaw? Passmark is a Windows thing so not representative of FreeBSD.
But I do feel that NUMA drags pretty hard.

Intel uses QPI for its core interconnect and it is quick. Going off die is costly.

GoNeFast_01 · Feb 17, 2021

Phishfry said:
Yea but did you notice the benchmarks?
Single EPYC=33K
Dual EPYC=40K
Yikes imagine buying 2 chips at $4K each and only getting marginal increase....

This is a nice catch, and if what the benchmark say is true, I would be MAD as hell

Honestly though, I think that is a software limitation or bottleneck (Maybe WINDOWS? like you mention )... At the end of the day we need to put them in production environment and let these thing bleed. I like to test in real world environment, call me a benchmark skeptic.

I like the VCORE for cloud's future, the cost is being driven to the ground... I mean pretty soon everyone and their mother will have a VM for a computer, it will just be the most economical way. At these scales, I mean: 128v cores or 256v cores NO ONE in the home will use anything close to 50% of what these CPU will be able to do. I mean yes,

you could use it for crypto mining.

Phishfry · Feb 17, 2021

I should mention the 7302p EPYC that I posted is middle of the road. $1000 chip not $4000 like their champ, the 64 core EPYC 7702

AMD EPYC 7702 Benchmark

Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With the PCIe-4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
14 AMD Chips at the top of the charts. Intels top offering there at $7K for a 15th place CPU.

GoNeFast_01 · Feb 17, 2021

Phishfry said:
Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With the PCIe-4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
14 AMD Chips at the top of the charts. Intels top offering there at $7K for a 15th place CPU.

Yup is sad.... Intel lost it, I switch to AMD...

Even GPU they're pushing NVIDIA for first time I am glad, literally have been a slave to nvidia... Actually still am due to NVIDIA support with Freebsd

.. But at least now I give AMD a look ?

Mjölnir · Feb 17, 2021

Sergei_Shablovsky said:
What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...):

OT, you might already know this... but I hope you do not intend to put the (external) packet filter (often loosely called "firewall") onto the same physical machine than other services. Don't do that. It must be on it's own physical machine, solely for that purpose, and no other services on that host. In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.

Sergei_Shablovsky · May 26, 2021

Mjölnir said:
In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.

Thank You for informative reply.

Just from my second post in this thread:

Because the main question are: how network-focused software (I’m interesting exactly in
1. firewall pfSense solution
2. balancing HAproxy solution
3. FreeNAS storage solution)
working on systems with multi-CPU systems (which support multi-threading and have 4-6-8-12 cores).

This is complex question because each solution have different software architecture, and different loading strategy on CPU, memory and data bus.

That mean that we speak about one physical machine.

Of coarse for many reasons (sustainability, redundancy, point of failure, etc...) some functions better keep on separate machines: Firewall+router+DPI on one, balancer+ssl on another, etc...

In this thread I just try to receive the answer for “numbers of cpu, numbers of cores VS main frequency in FreeBSD for routing packets, analyzing packets, enc/decrypting packets and deals with RAID controllers to handle databases/VMs”

Sergei_Shablovsky · Jun 24, 2024

Phishfry said:
But where a dual CPU configuration can help is PCIe lanes. Typical Xeon had 40 Lanes. with 2 CPU that means 80 lanes.
For a setup requiring I/O this can be important.

Phishfry said:
There are benefits to single CPU. Interprocess communication kept on die is superior.

There are another very important practical side of this choice: financial!

I try to explain the key moments:
- price tag on 2 of “previous generation CPU” ALWAYS would bd LESS than price on 1 “new and hottest CPU” and ALWAYS give You much horsepower for comparable or less money;
- the same as previous according RAM: biggest volume for comparable price with not big drawbacks in speed. And prices for upgrade (because You need the same volume modules) also would be less;
- well manufactured rack mount servers from reputable brand (like IBM, Dell, Siemens/Fuji) would be ALWAYS MUCH STABLE in working;
- the same rack servers would be ALWAYS having much more lifetime;
- the QUALITY OF POWER in this rack servers ALWAYS would be better (even w/o Online-Interactive UPS). And in addition POWER source MODULES ARE DOUBLED (hotspot);

Am I wrong with this?

Sergei_Shablovsky · Jun 24, 2024

GoNeFast_01 said:
And that is why there are DUAL CORE EPYC Boards now.... Enjoying the boundaries being push let's go quantum!!!

Well I have installed FreeBSD on some 8-32 core processors performs relative well.... 4-8% with very heavy media use as desktop environment(100+ tabs of firefox, 40+ chrome, 20+ terminal, 2+ VM) sometimes ram becomes an issue but not CPU on my experience barely breaks 10% ever.

Surfing and SysAdmin?

Heh! Just install xCode, Lightroom, Logic, Motion, FinalCut with I normal bunch of plug-ins, or Premier, and You would be impressed by the CPU loading…

Sergei_Shablovsky · Jun 24, 2024

Phishfry said:
So is this a testing flaw? Passmark is a Windows thing so not representative of FreeBSD.
But I do feel that NUMA drags pretty hard.

Because of this I choose Phoronix test suite + some small synthetic utilites for direct help testing under DOS.

ralphbsz · Jun 24, 2024

Sergei_Shablovsky said:
- the same as previous according RAM: biggest volume for comparable price with not big drawbacks in speed. And prices for upgrade (because You need the same volume modules) also would be less;

Many workloads run MUCH better when there is a lot of RAM available. In particular those that have a wide file system working set, and benefit from file system caching. This is something that really needs benchmarking (with your real workload and your real file system / OS) to check.

Other workloads have a smaller working set (both of the program itself, and its file system cache usage), and then faster RAM is more important, since more RAM simply doesn't help.

The fastest single machine I've ever used at work had 1/2 TB of RAM (and several dozen PCI lanes); for the workload it was amazingly powerful (and I have no idea how much it cost, but I remember that each DIMM was $3K, and it had several dozen of them). My home server has 4 GB and a 4-core Atom CPU. The latter is optimized for low power consumption, small physical space, and reasonable cost.

- well manufactured rack mount servers from reputable brand (like IBM, Dell, Siemens/Fuji) would be ALWAYS MUCH STABLE in working;
- the same rack servers would be ALWAYS having much more lifetime;
- the QUALITY OF POWER in this rack servers ALWAYS would be better (even w/o Online-Interactive UPS). And in addition POWER source MODULES ARE DOUBLED (hotspot);

This is actually a very important observation. Amateur computers, built using desk side cases, cheap fans (often only 1 or 2 fans), and whatever power supply is on sale at NewEgg this week, tend to be somewhat unreliable. Enterprise-class servers may have less CPU power or slower RAM, but they may up for it by having about 10 or 12 fans (each individually pretty small), dual power supplies (which can be connected to two independent power sources, like one utility power + UPS, the other a generator-protected power source), and N+1 redundancy in fans and internal power distribution. Plus if you stay within a single brand (you buy all your expansion cards from the same vendor as the rack mount computer and motherboard), they tend to have very good BIOS support, for example for fan speed control. They are pretty much indestructible, exceedingly reliable (physical uptimes of a decade are common), and can be well managed (like integrating BIOS monitoring into an alarm infrastructure). The drawback is: bought new they are very expensive, and they tend to be very noisy.

Mandatory anecdote: When we got a set of new servers in the office, they had to be shipped without memory, because the server itself came from one color of money (capital investment), while the DIMMs could be bought from another color of money (as they were under $5K each, we bought them using the same pot of money used for office supplies like pencils). Which meant that someone had to volunteer to spend an afternoon in the computer room installing about a hundred DIMMs. Since I'm a fool, I did it. And one of the servers was installed at the top of the rack, so I had to use a ladder to install the DIMMs. Because getting the DIMMs up the ladder was painful, I unpacked them on the floor, then carried them up on the ladder a few at a time, without wearing an anti-static grounding strap. And managed to kill one of the DIMMs. Since we were an in-house customer, we had no warranty coverage, and my manager had to buy one extra DIMM for $3K, and he was REALLY MAD AT ME, since that looked bad on his budget. The lesson is: When dealing with expensive stuff, always wear anti-static wrist straps, connect them to the machine you're working on, and transport components in their original packaging until you're plugged in. Not sure that rule applies when using a $5 Raspberry Pi Zero though.

Sergei_Shablovsky · Jun 24, 2024

At the first I need to say BIG THANKS TO YOU about patience and so detailed answering!

I more than sure the so detailed conversations help others “BSD’s geeks” or SysAdmins over the world, really!

TOTALLY AGREE with all You wrote!

But with a few corrections:

ralphbsz said:
if you stay within a single brand (you buy all your expansion cards from the same vendor as the rack mount computer and motherboard), they tend to have very good BIOS support

Not so sure: I have experience personally and a lot of facts on users forums that this ‘good and well tested BIOS support” are just not more than “really good compatibility between parts in a limited set”.
Where quality of this set hardly belongs from marketing department’s means.
As e result even in good and reputable brands we have situation “some previous bugs + some new ones”.

But anyway they are only one choice if You need stability and amazing lifetime.

ralphbsz said:
The drawback is: bought new they are very expensive,

Not agree! Look at the IBM M3/M4 series on eBay: $150-250/each (with 2xPSU, not bad RAID, eventually good CPUs and amount of RAM)+ shipping.

Only fraction of managers really know HOW MUCH may serve this one M4 (or even M3) 10+ years old server!
And newcomers DevOps only training their ego and spending company’s budget by using bento “AWS + recipes from internet” to building a ton of VMs…
Mad new IT-world…!

ralphbsz said:
and they tend to be very noisy.

But we using it in DC, or separate server room.
Amateurs have a garage in s basement floor or up roof room…

P.S.
Anecdote make my day a little more shiny! Thank You!
Of coarse, serious equipment need serious mindful.

How FreeBSD utilize multicore processors and multi-CPU systems?

How pfSense utilize multicore processors and multi-CPU systems ?​

How pfSense utilize multicore processors and multi-CPU systems ?