powerd: cannot set adaptive threshold above 100%?

Greetings, I have a 48-core system (dual Xeon E5-2680 v3) that is mostly idle, and I'd like to have the clocks lowered accordingly using powerd's adaptive mode.

When I try this, the frequency stays pegged at maximum because, with this many cores, the reported load is always higher than the default 75% threshold:

Code:
root# /usr/sbin/powerd -N -b adaptive -a adaptive -n adaptive -v 
powerd: unable to determine AC line status
load 206%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 250%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 159%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 415%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 251%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 182%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 243%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 262%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 259%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 187%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 170%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 201%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 136%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 212%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 259%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 207%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 253%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 129%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 219%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 239%, current freq 2501 MHz ( 0), wanted freq 2501 MHz
load 178%, current freq 2501 MHz ( 0), wanted freq 2501 MHz

When I try to set the adaptive `-r` parameter to a reasonable value -- e.g. 300% -- it is not accepted:

Code:
root# /usr/sbin/powerd -N -b adaptive -a adaptive -n adaptive -v -r 300
powerd: 300 is not a valid percent

I feel like I'm missing something very obvious here, or is this a bug?

13.1-RELEASE-p9
 


Thank you, I'll try to play with it, but from the man pages it seems it's even more restrictive -- powerd's `-r` flag is not supported, and the target values are fixed, and all under 100%:

adaptive, adp
A target load of 0.5 (50%).
hiadaptive, hadp
A target load of 0.375 (37.5%).

I'll test it, on the off chance that they do refer to the total system load, but I highly doubt it: it sources data from kern.cp_times, just like powerd does; nowhere does it say that it integrates across cores; and it claims to be a drop-in replacement for powerd, and hence likely exhibits the same behaviour.
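
For reference, this is roughly how a per-core load figure falls out of two samples of kern.cp_times. It is only a minimal sketch: the counter layout (CPUSTATES = 5, CP_IDLE = 4) is assumed from FreeBSD's <sys/resource.h>, the helper name core_load_percent is hypothetical, and this is not code taken from either daemon.

```c
#include <assert.h>

/* Assumed layout of one core's tick counters (FreeBSD <sys/resource.h>). */
#define CPUSTATES 5
#define CP_IDLE   4

/*
 * Hypothetical helper: busy percentage (0..100) of ONE core between two
 * samples of that core's CPUSTATES tick counters.
 */
int
core_load_percent(const long prev[CPUSTATES], const long cur[CPUSTATES])
{
	long total = 0;
	int i;

	for (i = 0; i < CPUSTATES; i++) {
		assert(cur[i] >= prev[i]);	/* counters only grow */
		total += cur[i] - prev[i];
	}
	if (total == 0)
		return (0);	/* no ticks elapsed: treat as idle */
	/* Everything that was not idle time counts as load. */
	return ((int)(100 - (cur[CP_IDLE] - prev[CP_IDLE]) * 100 / total));
}
```

Whether a daemon then sums, averages, or takes the maximum of these per-core figures is exactly the open question here.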
 
powerd's `-r` flag is not supported
True, that specific flag as such is not present in powerdxx. However, powerdxx works differently and, in my view, is not more limiting in its abilities than powerd; probably rather the opposite.

You might expect the -r flag to behave differently from what's actually implemented. The way I read the man page, when the CPU is under a certain load of x% or higher, powerd should try to make the CPU work harder, e.g. increase its frequency. The CPU frequency actually attained could be above 100%; I think this would be called turbo mode. Perhaps my view on this is wrong, but the code of powerd makes clear that what you're requesting (300%) is out of bounds:
powerd.c:
Code:
		case 'r':
			cpu_running_mark = atoi(optarg);
			if (cpu_running_mark <= 0 || cpu_running_mark > 100) {
				warnx("%d is not a valid percent",
				    cpu_running_mark);
				usage();
			}
The same type of limitation holds for the -i flag. I'm not sure what behaviour you'd hoped for or expected.
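
To make the range check above easier to experiment with outside of powerd, here is a minimal standalone sketch of the same 1..100 rule. It is not taken from powerd.c: the helper name parse_percent is hypothetical, and it uses strtol() instead of atoi() so that non-numeric input is rejected as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from powerd.c): parse a percentage in the
 * range 1..100. Returns 0 and stores the value in *out on success,
 * or -1 if the string is not a plain number or is out of range.
 */
int
parse_percent(const char *s, int *out)
{
	char *end;
	long v;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0')
		return (-1);	/* empty, trailing junk, or overflow */
	if (v <= 0 || v > 100)
		return (-1);	/* same bounds as the check quoted above */
	*out = (int)v;
	return (0);
}
```

With this, "75" is accepted and "300" is rejected, which matches the "300 is not a valid percent" message from the original post.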

[...] and nowhere it says it integrates across cores;
As an important part of powerdxx in general and also relevant to your referenced areas:
-- Load Control Loop
The original powerd(8) uses a hysteresis to control the CPU frequency. I.e. it determines the load over all cores since taking the last sample (the summary load during the last polling interval) and uses a lower and an upper load boundary to decide whether it should update the frequency or not.

powerd++ has some core differences. It can take more than two samples (four by default), this makes it more robust against small spikes in load, while retaining much of its ability to quickly react to sudden surges in load. Changing the number of samples does not change the runtime cost of running powerd++.

Instead of taking the sum of all loads, the highest load within the core group is used to decide the next frequency target. Like with powerd(8) this means, that high load on a single core will cause an increase in the clock frequency. Unlike powerd(8) it also means that moderate load over all cores allows a decrease of the clock frequency.

The powerd++ daemon steers the clock frequency to match a load target, e.g. if there was a 25% load at 2 GHz and the load target was 50%, the frequency would be set to 1 GHz.

Please refer to the extensive reference manual* (245 pages, mentioned here) for further documentation of powerdxx's abilities and options. It may, in the end, not do everything you would like it to do, but you'll have at least a more complete assessment of powerdxx.

__
* I'm not aware of any such elaborate documentation of powerd.
 
Unlike powerd(8) it also means that moderate load over all cores allows a decrease of the clock frequency.
Very cool! Thank you for the link to the documentation, I'll read that, and experiment with it. It does look promising.


The same type of limitation holds for the -i flag. I'm not sure what behaviour you'd hoped for or expected.
I expected the percentage specified there to have the same scale as the load reported by powerd itself: system CPU usage normalized to one core. (I.e. 100% means one core is fully saturated, 5000% means 50 cores are fully saturated.)

Powerd appears to report system load following this convention: see lines like "load 206%" in my original message; so it's reasonable to assume that the -r parameter also expects total system load (which can go above 100%).

Since it appears that it validates input to be within 100%, it must mean it accepts per-CPU load.

And yet, it seems to be treating that as total load anyway: when my 48 CPUs are loaded at 5%, and the threshold is set to 75%, powerd runs clocks at max -- because 5%*48 cores = 240% > 75%.

This looks like a bug, either in validation, or implementation. I'll need to look into source to confirm.
 
You will also need to consider the following: one core is running full burn and the rest are idle. This may well be considered a load of 7% or so on these high-core-count machines, making powerd run the clock rate down to minimum.
 
Meaning, there is need for a system load and a core load config option. Feature request, hint hint.
 
Very cool! Thank you for the link to the documentation, I'll read that, and experiment with it. It does look promising.

You just need to read powerdxx(8) a lot more thoroughly, powerd(8) also.

You're extrapolating from false conclusions about how these work.

I expected the percentage specified there to have the same scale as the load reported by powerd itself: system CPU usage normalized to one core.

No, it's over all cores, just as the manpages and the code - easily checked in powerd's case - show.

(I.e. 100% means one core is fully saturated, 5000% means 50 cores are fully saturated.)

Incorrect. 100% is all cores at max. frequency.

Powerd appears to report system load following this convention: see lines like "load 206%" in my original message; so it's reasonable to assume that the -r parameter also expects total system load (which can go above 100%).

Avoid assumptions, especially when based on earlier incorrect assumptions.

Since it appears that it validates input to be within 100%, it must mean it accepts per-CPU load.

Non sequitur.

And yet, it seems to be treating that as total load anyway: when my 48 CPUs are loaded at 5%, and the threshold is set to 75%, powerd runs clocks at max -- because 5%*48 cores = 240% > 75%.

GIGO. Stop digging.

This looks like a bug, either in validation, or implementation. I'll need to look into source to confirm.

It's an interpretation bug. Reading source should help if you follow logic without assumption.

This code has been very stable for two decades in powerd's case.

Do you know if your Xeon even supports SpeedStep (cpufreq), or Speed Shift (hwpstate_intel), or ...?
 