Heterogeneous CPU cores: control and management

I'm learning about managing the performance and efficiency cores in Intel's later CPU models.

The Arrow Lake Core Ultra 9 285K CPU has 8 performance and 16 efficiency cores.

Performance cores are: 0, 1, 10-13, 22, 23.
Efficiency cores are: 2-9 and 14-21.
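
Since the two lists above are complements of each other over CPUs 0-23, one of them can be derived from the other instead of being typed by hand. A minimal sketch, assuming POSIX sh; the function names `expand_cpulist` and `complement_cpulist` are made up for illustration and are not part of cpuset:

```shell
#!/bin/sh
# Expand a cpuset-style list such as "0,1,10-13,22,23" into one CPU id per line.
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        last=${last:-$first}          # a single id is a range of length one
        i=$first
        while [ "$i" -le "$last" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Print every CPU id in 0..(ncpu-1) that is NOT in the given list, comma-separated.
# Usage: complement_cpulist NCPU LIST   (NCPU could come from `sysctl -n hw.ncpu`)
complement_cpulist() {
    ncpu=$1
    used=" $(expand_cpulist "$2" | tr '\n' ' ')"
    out=""
    cpu=0
    while [ "$cpu" -lt "$ncpu" ]; do
        case "$used" in
            *" $cpu "*) ;;                      # already in the given list
            *) out="${out:+$out,}$cpu" ;;
        esac
        cpu=$((cpu + 1))
    done
    echo "$out"
}
```

With the P-core list from this post, `complement_cpulist 24 0,1,10-13,22,23` prints the E-core ids one by one (2 through 9 and 14 through 21), which can be passed to `cpuset -l` as-is.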

This interleaving of the two core types seems to be intended to help manage heat and cooling of the cores.

It's possible to start the X Window KDE 6/Plasma 6 desktop with the standard .xinitrc and the cpuset command:
$ cat .xinitrc
exec dbus-launch --exit-with-x11 ck-launch-session startplasma-x11

$ cpuset -l 0,1,10-13,22,23 -s 1 startx

Then, when I start a browser:
ps -a | grep firefox
62415 v0 S 0:03.57 /usr/local/bin/firefox
63652 v0 I 0:00.05 /usr/local/lib/firefox/firefox -contentproc -initialChan

The browser executes on the performance cores:

cpuset -g -p 62415
pid 62415 mask: 0, 1, 10, 11, 12, 13, 22, 23
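
The check above can be scripted so that a whole list of pids is verified in one go. A minimal sketch, assuming the `pid N mask: ...` output format shown above (newer FreeBSD releases may print an additional "domain policy" line, so only the first line is used); `mask_from_line` and `check_pid_mask` are hypothetical helper names:

```shell
#!/bin/sh
# Turn "pid 62415 mask: 0, 1, 10" into the bare list "0,1,10".
mask_from_line() {
    printf '%s\n' "$1" | sed -e 's/^.*mask: *//' -e 's/ //g'
}

# Succeed if the pid's affinity mask equals the expected comma-separated list.
# Usage: check_pid_mask PID EXPECTED
check_pid_mask() {
    pid=$1
    expected=$2
    # Only the first output line is parsed; that it carries the CPU mask is an assumption.
    line=$(cpuset -g -p "$pid" | sed -n '1p')
    [ "$(mask_from_line "$line")" = "$expected" ]
}
```

For example, `check_pid_mask 62415 0,1,10,11,12,13,22,23 && echo "on P-cores"` would confirm the Firefox parent process from the listing above.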

I can start a compilation run on the efficiency cores:

# cd /usr/ports/editors/calligra
# cpuset -l 2-9,14-21 make install

This rumbles along for a while and produces this btop screenshot:

btop_cpuset.png

So it works: a specific task can be confined to a subset of cores.

NOW: the allocation of cores into cpusets seems to be very transient and requires manual commands each time.
When the X session and the sample compile run have completed and the user has logged out of the X session, the cpuset allocations have vanished.

- Can the OS "remember" the cpusets somehow, so that cpuset doesn't need to be typed before every command?

//Regards
 
The shell may be a good place to experiment: define your cpusets and a mapping from command names (or whatever criteria) to cpusets, preferably in a config file that can be updated from the shell itself. The shell then execs the cpuset program, giving it the command to execute.
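
One way the mapping idea could look, as a rough sketch only. The config file name `~/.cpuset_map`, its "command cpu-list" line format, and the function names `lookup_cpulist` and `pinned` are all invented here for illustration; only `cpuset -l` itself comes from the thread:

```shell
#!/bin/sh
# Hypothetical config file, one "command cpu-list" pair per line, e.g.
#   firefox 0,1,10-13,22,23
#   make    2-9,14-21
CPUSET_MAP=${CPUSET_MAP:-$HOME/.cpuset_map}

# Print the CPU list configured for a command name, or nothing if it is unmapped.
lookup_cpulist() {
    [ -r "$CPUSET_MAP" ] || return 0
    awk -v cmd="$1" '$1 == cmd { print $2; exit }' "$CPUSET_MAP"
}

# Run a command under its configured CPU list, or unpinned if it has none.
pinned() {
    list=$(lookup_cpulist "$(basename "$1")")
    if [ -n "$list" ]; then
        cpuset -l "$list" "$@"
    else
        "$@"
    fi
}
```

Sourced from a shell startup file, this allows `pinned make install`, or `alias make='pinned make'` so the prefix does not have to be typed at all. Editing the map file changes the placement of future commands without touching the shell configuration.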
 
The issue is having a scheduler intelligent enough to dynamically distribute workloads across the core types while maintaining both power efficiency and performance.

In macOS, Safari uses both core types depending on the workload the user triggers (e.g. complex JavaScript and 3D rendering on P-cores, background tabs and static sites on E-cores).

The scheduler problem has yet to be solved by the committers AFAIK (or by the Linux community, for that matter). It'll be interesting to see what their approach is, considering that traditional Unix scheduling is historically oriented toward client/server workloads. Can they address multiple demographics?
 
How did you arrive at those odd numbers for P-cores?

The way to put this into a script is:
Code:
#!/bin/sh
# Run the given command (and its children) on the P-cores only.
exec cpuset -l 0,1,10-13,22,23 -s 1 "$@"

Last I checked, Intel P-cores were not getting the correct speed under FreeBSD. When you run a benchmark (such as a compile), what speed do you get out of your P-cores and your E-cores?
 
Very little is known about how Apple schedules P and E cores, and not much more for Windows 11. Win11 interacts with Intel's Thread Director, whatever that thing is doing exactly. As far as I can tell, it reports on a thread's use of "special" instructions that indicate heavy lifting.

The Win11 scheduler seems to be a hardcoded mess of name-matching against known applications, with each entry recording whether that application should use P or E cores, plus the performance profile in effect.

I haven't looked at what Linux is doing there.
 
Intel's data for this CPU says that it contains 8 performance and 16 efficiency cores.
I ran a verbose boot on the machine, which prints a list of the so-called "packages" of cores that exist.
The verbose dmesg -a output showed 4 packages with 4 cores each and 8 packages with 1 core each.

The single-core packages ran at a higher speed than the quad-core packages,
so I assumed that the four quad-core packages are the 16 efficiency cores.

Compiling on the E-cores reaches a maximum frequency of approx. 4580 MHz (Intel's stated max in the datasheet is 4600 MHz).
Compiling on the P-cores pushes the maximum frequency to almost 5300 MHz (Intel's stated max in the datasheet is 5500 MHz).

So not quite as advertised for the P-cores, but fairly close.

I have not been able to measure the Max Turbo Frequency of 5700 MHz stated in the datasheet,
but I assume the CPU only reaches it for sub-second time intervals.
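
For anyone wanting to repeat the measurement, peak readings like these can be collected by polling a frequency sysctl while the compile runs. A minimal sketch: the OID `dev.cpu.0.freq` is an assumption (check `sysctl dev.cpu` for what your CPU driver actually exposes), and `max_of` and `sample_peak` are made-up helper names. Polling at this rate can still miss very short turbo bursts.

```shell
#!/bin/sh
# Print the largest number seen on stdin (one value per line); nothing if input is empty.
max_of() {
    awk 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR) print max }'
}

# Sample a frequency sysctl COUNT times, INTERVAL seconds apart, and print the peak.
# Usage: sample_peak [OID] [COUNT] [INTERVAL]
sample_peak() {
    oid=${1:-dev.cpu.0.freq}    # assumed OID, may differ per driver
    count=${2:-50}
    interval=${3:-0.1}
    i=0
    while [ "$i" -lt "$count" ]; do
        sysctl -n "$oid"
        sleep "$interval"
        i=$((i + 1))
    done | max_of
}
```

Running `sample_peak dev.cpu.0.freq 600 0.1` in a second terminal during the build would report the highest value seen over roughly a minute.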

EDIT: this was measured using

performance_cx_lowest="LOW"
performance_cpu_freq="HIGH"

in rc.conf, without running powerd.
 