How does FreeBSD utilize multicore processors and multi-CPU systems?

Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

Hi, FreeBSD gurus!
How does FreeBSD utilize multicore processors in single-CPU systems?
How does FreeBSD utilize multicore processors in multi-CPU systems?
 

multix

Member

Reaction score: 2
Messages: 76

I have a Pentium-D which is Dual-Core, it will say:

FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)


The second CPU is launched:

SMP: AP CPU #1 Launched!


and everything looks smooth!
 

ralphbsz

Son of Beastie

Reaction score: 2,301
Messages: 3,212

Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single socket Intel/AMD it works good enough for amateur usage. I'm not sure about high performance applications.

This is from my home server:
Code:
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20171214/tbfadt-748)
ioapic0: Changing APIC ID to 4
ioapic0 <Version 2.0> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
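
As a side note, the topology line in a boot log like this can be checked mechanically. Below is a minimal Python sketch, assuming the line has the `N package(s) x M core(s) [x K hardware threads]` shape seen in the logs in this thread; other FreeBSD versions or CPUs may print a different layout. The helper name `parse_smp_topology` is made up for illustration and is not part of any FreeBSD tool.

```python
import re

# Hypothetical helper: matches the "FreeBSD/SMP: N package(s) x ..." dmesg line.
# The "core(s)" and "hardware threads" parts are optional in this sketch.
_TOPOLOGY_RE = re.compile(
    r"FreeBSD/SMP:\s+(\d+) package\(s\)"
    r"(?: x (\d+) core\(s\))?"
    r"(?: x (\d+) hardware threads)?"
)


def parse_smp_topology(line):
    """Return (packages, cores_per_package, threads_per_core, logical_cpus)."""
    match = _TOPOLOGY_RE.search(line)
    if match is None:
        raise ValueError("not a FreeBSD/SMP topology line: %r" % line)
    packages = int(match.group(1))
    cores = int(match.group(2) or 1)      # missing field counts as 1
    threads = int(match.group(3) or 1)    # missing field counts as 1
    return packages, cores, threads, packages * cores * threads


if __name__ == "__main__":
    print(parse_smp_topology(
        "FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads"))
    print(parse_smp_topology("FreeBSD/SMP: 1 package(s) x 2 core(s)"))
```

The product of the three fields should agree with the CPU count on the "Multiprocessor System Detected" line.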

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.
 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

With MPS 1.4 (or newer).
Thank you for the reply! ;)

Specifications are great. At the same time, we all know how much the implementation in real hardware (motherboard + CPU) AND in software affects the result.

It has been commonplace for the last 10+ years that software development moves faster (because the industry is rushing, and data volumes are growing dramatically) than hardware manufacturers are able to engineer and produce motherboards and CPUs.

So, each time we need an effective and well-balanced solution, we need to “call on everyone”: the community of users of the particular software, the community of users of the particular operating system, and of course the forums/support of the hardware manufacturer. :)
 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single socket Intel/AMD it works good enough for amateur usage. I'm not sure about high performance applications.
Thank you for the kind reply!

The opening question is only the first step. :)

The main question is: how does network-focused software (I’m interested specifically in
1. the pfSense firewall
2. the HAProxy load balancer
3. the FreeNAS storage solution)
work on multi-CPU systems (which support multi-threading and have 4-6-8-12 cores)?

This is a complex question because each solution has a different software architecture and a different load pattern on the CPU, memory and data bus.

What do you think about this?

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.
Thank you so much! I’ll try to find it.
 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

Could you please comment on managing iflib threads across several CPU cores, as raised in the last reply of this thread?

How pfSense utilize multicore processors and multi-CPU systems ?

 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

Hm. It looks like it is hard to find the right answer...

Let me clarify the opening question a little:

Which system is better for network-related workloads (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
a) 1 CPU with 4-10 cores, high frequency
b) 2-4 CPUs with 4-6 cores each, mid frequency

And how do the CPU's L2 (2-56 MB) and L3 (2-57 MB) caches affect network-related workloads (in cooperation with the NIC)?
 

richardtoohey2

Aspiring Daemon

Reaction score: 301
Messages: 610

What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...)
I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?

And also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just pushing packets?

If a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.
 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?
Because different types of traffic trigger different software processing chains. For example, media streaming packets, VPN sessions and ICMP are processed very differently inside the software that runs on top of BSD. Am I wrong?

Any also what you are doing with the traffic - pushing it through as fast as possible? Or trying to analyse it and do more than just pushing packets?
From the networking device's point of view, the main goal is "processing packets without errors and as fast as possible".

My questions relate more to the situation where FreeBSD is used as the core of a firewall, VPN gateway or load balancer on ordinary Intel-based servers.

If a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.

The speeds we are talking about start from 10-20 Gb/s.
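
To make the "cores vs. frequency" question concrete at these speeds, here is a rough Python sketch of the per-packet CPU budget at line rate. It uses standard Ethernet framing (20 bytes of preamble, start-of-frame delimiter and inter-frame gap per frame), which gives about 14.88 million packets per second for minimum-size frames at 10 Gb/s. The 8-core 3.0 GHz machine and the function names are assumptions for illustration only, and the sketch assumes the load spreads perfectly over all cores, which real traffic rarely does.

```python
# Per-frame overhead on the wire for Ethernet: 7-byte preamble + 1-byte
# start-of-frame delimiter + 12-byte inter-frame gap = 20 bytes.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bits_per_s, frame_bytes):
    """Maximum frames per second at line rate (frame size includes the FCS)."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bits_per_s / bits_on_wire


def cycles_per_packet(link_bits_per_s, frame_bytes, cores, core_hz):
    """CPU cycles available per packet if load spreads evenly over all cores."""
    return cores * core_hz / packets_per_second(link_bits_per_s, frame_bytes)


if __name__ == "__main__":
    for frame in (64, 1518):
        pps = packets_per_second(10e9, frame)
        # Assumed example machine: 8 cores at 3.0 GHz (not from this thread).
        budget = cycles_per_packet(10e9, frame, cores=8, core_hz=3.0e9)
        print(f"{frame:5d}-byte frames: {pps / 1e6:6.2f} Mpps, "
              f"{budget:8.0f} cycles/packet")
```

With small packets the budget is tight, so per-core speed and cache behaviour matter; with large packets there is far more headroom per packet.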
 

richardtoohey2

Aspiring Daemon

Reaction score: 301
Messages: 610

Not sure you will get answers to those sorts of questions on these forums.

There's a book:

Design and Implementation of the FreeBSD Operating System, The 2nd Edition

And Netflix's work on FreeBSD and networking, e.g.


Using FreeBSD and commodity parts, we achieve 90 Gb/s serving TLS-encrypted connections with ~55% CPU on a 16-core 2.6-GHz CPU.
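
As a back-of-the-envelope reading of that sentence, the quoted figures work out to roughly two CPU cycles per byte served. The Python sketch below only redoes that arithmetic; it assumes every core runs at the nominal 2.6 GHz and ignores turbo clocks, hardware threads and any offload to the NIC, and the function name is hypothetical.

```python
def cycles_per_byte(throughput_bits_per_s, cores, core_hz, cpu_utilization):
    """Rough CPU cycles spent per byte served, with all cores at core_hz."""
    cycles_used_per_s = cores * core_hz * cpu_utilization
    bytes_per_s = throughput_bits_per_s / 8
    return cycles_used_per_s / bytes_per_s


if __name__ == "__main__":
    # Figures from the quoted sentence: 90 Gb/s, 16 cores, 2.6 GHz, ~55% CPU.
    print(f"{cycles_per_byte(90e9, 16, 2.6e9, 0.55):.2f} cycles per byte")
```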

Pretty sure there are other Netflix papers on working with FreeBSD and NUMA etc.

Have a look at https://papers.freebsd.org/ e.g.


I'm not sure if you are just trying to learn how things work or if you have a specific requirement or issue that you need to fix - maybe if you are more specific then someone can help.
 

Phishfry

Beastie's Twin

Reaction score: 2,650
Messages: 5,566

I know this is a subjective topic, but I prefer a single-socket server board.
The second CPU does not bring linear acceleration. There is a performance hit for dual CPU.
Witness the synthetic benchmarks.
Single CPU = 11K
Same CPU dual = 18K

But where a dual-CPU configuration can help is PCIe lanes. A typical Xeon had 40 lanes; with 2 CPUs that means 80 lanes.
For a setup requiring I/O this can be important. The newer LGA3647 Xeon has 48 lanes.
AMD EPYC has 128 lanes.

So the single EPYC/2 will smash most dual-CPU setups.

There are benefits to single CPU. Interprocess communication kept on die is superior.
 

Watitsthis

New Member

Reaction score: 13
Messages: 18

And that is why there are DUAL CORE EPYC boards now... ;) Enjoying the boundaries being pushed, let's go quantum!!!


Well, I have installed FreeBSD on some 8-32 core processors and it performs relatively well: 4-8% CPU under very heavy media use as a desktop environment (100+ Firefox tabs, 40+ Chrome tabs, 20+ terminals, 2+ VMs). Sometimes RAM becomes an issue, but not CPU; in my experience it barely ever breaks 10%. :rolleyes:
 

Phishfry

Beastie's Twin

Reaction score: 2,650
Messages: 5,566

Yea but did you notice the benchmarks?
Single EPYC=33K
Dual EPYC=40K
Yikes imagine buying 2 chips at $4K each and only getting marginal increase....
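
To put numbers on "marginal": the scores quoted in this thread can be turned into a speedup and a per-socket efficiency. This is a small Python sketch of that arithmetic only; the function name is made up, and the scores are the synthetic benchmark figures as posted, not measurements on FreeBSD.

```python
def scaling(single_score, multi_score, sockets=2):
    """Return (speedup, efficiency) of a multi-socket score over one socket."""
    speedup = multi_score / single_score
    return speedup, speedup / sockets


if __name__ == "__main__":
    # Scores as quoted in this thread, in thousands of benchmark points.
    for label, single, dual in (("11K vs 18K", 11, 18), ("EPYC 33K vs 40K", 33, 40)):
        speedup, efficiency = scaling(single, dual)
        print(f"{label}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

An efficiency of 100% would mean the second socket doubles the score; anything well below that is the cost of going off die, of the benchmark itself, or both.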

So is this a testing flaw? Passmark is a Windows thing so not representative of FreeBSD.
But I do feel that NUMA drags pretty hard.

Intel uses QPI as its interconnect between sockets, and it is quick, but going off die is still costly.
 

Watitsthis

New Member

Reaction score: 13
Messages: 18

Yea but did you notice the benchmarks?
Single EPYC=33K
Dual EPYC=40K
Yikes imagine buying 2 chips at $4K each and only getting marginal increase....
This is a nice catch, and if what the benchmark says is true, I would be MAD as hell :rude:

Honestly though, I think that is a software limitation or bottleneck (maybe WINDOWS, like you mention?)... At the end of the day we need to put them in a production environment and let these things bleed. I like to test in a real-world environment; call me a benchmark skeptic.

I like the VCORE for the cloud's future; the cost is being driven into the ground... I mean, pretty soon everyone and their mother will have a VM for a computer; it will just be the most economical way. At these scales, I mean 128 vcores or 256 vcores, NO ONE at home will use anything close to 50% of what these CPUs will be able to do. I mean yes, :-/ you could use it for crypto mining.
 

Phishfry

Beastie's Twin

Reaction score: 2,650
Messages: 5,566

I should mention that the EPYC 7302P that I posted is middle of the road. A $1000 chip, not $4000 like their champ, the 64-core EPYC 7702.

Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With PCIe 4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
14 AMD chips at the top of the charts. Intel's top offering there is $7K for a 15th-place CPU.
 

Watitsthis

New Member

Reaction score: 13
Messages: 18

Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With the PCIe-4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
14 AMD Chips at the top of the charts. Intels top offering there at $7K for a 15th place CPU.
Yup, it is sad... Intel lost it, I switched to AMD...

Even in GPUs they're pushing NVIDIA for the first time, and I am glad; I have literally been a slave to NVIDIA... Actually I still am, due to NVIDIA's support for FreeBSD :rolleyes:.. But at least now I give AMD a look 😅
 

Mjölnir

Daemon

Reaction score: 1,502
Messages: 2,114

What system is better for network-related operation (i.e. firewall, load balancing, gate, proxy, media stream,...):
OT, you might already know this... but I hope you do not intend to put the (external) packet filter (often loosely called "firewall") onto the same physical machine as other services. Don't do that. It must be on its own physical machine, solely for that purpose, with no other services on that host. In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.
 
OP
Sergei_Shablovsky

Member

Reaction score: 2
Messages: 33

In contrast, you can merge the internal PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the external one.
Thank you for the informative reply.

Just quoting from my second post in this thread:

The main question is: how does network-focused software (I’m interested specifically in
1. the pfSense firewall
2. the HAProxy load balancer
3. the FreeNAS storage solution)
work on multi-CPU systems (which support multi-threading and have 4-6-8-12 cores)?

This is a complex question because each solution has a different software architecture and a different load pattern on the CPU, memory and data bus.

That means we are talking about one physical machine.

Of course, for many reasons (sustainability, redundancy, single points of failure, etc.) some functions are better kept on separate machines: firewall+router+DPI on one, balancer+SSL on another, etc.

In this thread I am just trying to get an answer to “number of CPUs and number of cores vs. clock frequency in FreeBSD for routing packets, analyzing packets, encrypting/decrypting packets and dealing with RAID controllers to handle databases/VMs”.
 