HOWTO: FreeBSD CPU Scaling and Power Saving


vermaden
November 16th, 2008, 22:47
For those who do not know, FreeBSD is able to scale CPU speed (on both desktop and mobile CPUs; they just need to support it and have it enabled in the BIOS).

To enable that feature you need to add this line to /etc/rc.conf:
powerd_enable="YES"

You can also tweak how the CPU scales depending on the load, for example:
powerd_flags="-i 85 -r 60 -p 100"
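
A commented version of those two lines, as a rough sketch. The flag descriptions are how I read powerd(8) on the 7.x releases this thread targets, where both thresholds are idle percentages; later powerd versions redefine -i and -r as load percentages, so check the man page on your release before copying the numbers.

```shell
# /etc/rc.conf fragment (rc.conf is read as sh, so comments are allowed)
powerd_enable="YES"

# -i 85  : adaptive mode starts lowering the frequency once the CPU is
#          at least this idle (percent), assuming 7.x semantics
# -r 60  : adaptive mode treats the CPU as busy and raises the frequency
#          once idle time drops to this level (percent), assuming 7.x semantics
# -p 100 : poll AC line state and idle levels every 100 ms
powerd_flags="-i 85 -r 60 -p 100"
```

The same powerd_flags string can carry -a and -b to choose a mode separately for AC and battery.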

powerd uses adaptive mode by default (thanks to BSDKaffee (http://daemonforums.org/showthread.php?t=2382#post17773))

You can also set the lowest frequency the CPU is allowed to use by putting this in /etc/sysctl.conf or /boot/loader.conf:
debug.cpufreq.lowest=600

You can also set it by hand in a terminal using sysctl:
sysctl debug.cpufreq.lowest=600

Until yesterday there was no option to cap the maximum CPU speed to save power or limit overheating, but Boris Kochergin wrote a patch (http://acm.poly.edu/~spawk/cpufreq) that adds an upper limit through the debug.cpufreq.highest oid:
sysctl debug.cpufreq.highest=1200

These patches are for 7.0-RELEASE and 7-STABLE (I did not check 8-CURRENT, but they probably work there too):

/usr/src/sys/kern/kern_cpu.c (driver):
--- kern_cpu.c.orig 2008-11-08 13:12:24.000000000 -0500
+++ kern_cpu.c 2008-11-08 10:33:18.000000000 -0500
@@ -131,12 +131,16 @@
DRIVER_MODULE(cpufreq, cpu, cpufreq_driver, cpufreq_dc, 0, 0);

static int cf_lowest_freq;
+static int cf_highest_freq;
static int cf_verbose;
TUNABLE_INT("debug.cpufreq.lowest", &cf_lowest_freq);
+TUNABLE_INT("debug.cpufreq.highest", &cf_highest_freq);
TUNABLE_INT("debug.cpufreq.verbose", &cf_verbose);
SYSCTL_NODE(_debug, OID_AUTO, cpufreq, CTLFLAG_RD, NULL, "cpufreq debugging");
SYSCTL_INT(_debug_cpufreq, OID_AUTO, lowest, CTLFLAG_RW, &cf_lowest_freq, 1,
"Don't provide levels below this frequency.");
+SYSCTL_INT(_debug_cpufreq, OID_AUTO, highest, CTLFLAG_RW, &cf_highest_freq, 1,
+ "Don't provide levels above this frequency.");
SYSCTL_INT(_debug_cpufreq, OID_AUTO, verbose, CTLFLAG_RW, &cf_verbose, 1,
"Print verbose debugging messages");

@@ -295,6 +299,14 @@
goto out;
}

+ /* Reject levels that are above our specified threshold. */
+ if (cf_highest_freq > 0 && level->total_set.freq > cf_highest_freq) {
+ CF_DEBUG("rejecting freq %d, greater than %d limit\n",
+ level->total_set.freq, cf_highest_freq);
+ error = EINVAL;
+ goto out;
+ }
+
/* If already at this level, just return. */
if (CPUFREQ_CMP(sc->curr_level.total_set.freq, level->total_set.freq)) {
CF_DEBUG("skipping freq %d, same as current level %d\n",
@@ -617,8 +629,13 @@
continue;
}

- /* Skip levels that have a frequency that is too low. */
- if (lev->total_set.freq < cf_lowest_freq) {
+ /*
+ * Skip levels that have a frequency that is too low or too
+ * high.
+ */
+ if (lev->total_set.freq < cf_lowest_freq ||
+ (cf_highest_freq > 0 &&
+ lev->total_set.freq > cf_highest_freq)) {
sc->all_count--;
continue;
}

/usr/src/share/man/man4/cpufreq.4 (man page):
--- cpufreq.4.orig 2008-11-08 13:08:19.000000000 -0500
+++ cpufreq.4 2008-11-08 13:08:51.000000000 -0500
@@ -98,6 +98,11 @@
This setting is also accessible via a tunable with the same name.
This can be used to disable very low levels that may be unusable on
some systems.
+.It Va debug.cpufreq.highest
+Highest CPU frequency in MHz to offer to users.
+This setting is also accessible via a tunable with the same name.
+This can be used to disable very high levels that may be unusable on
+some systems.
.It Va debug.cpufreq.verbose
Print verbose messages.
This setting is also accessible via a tunable with the same name.

Apply them like this:
# cd /usr/src/share/man/man4
# patch < /path/to/cpufreq.4.patch
#
# cd /usr/src/sys/kern
# patch < /path/to/kern_cpu.c.patch

Then rebuild the kernel and reboot to use it.

The /usr/src/share/man/man4/cpufreq.4 patch only touches a man page, so applying and rebuilding it is optional.

The available CPU frequencies are listed in the dev.cpu.0.freq_levels oid, for example:
# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1200/13000 1050/11375 900/9750 750/8125 600/6500
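
Each entry in that output is a frequency/power pair: MHz, then the driver's power estimate in milliwatts according to cpufreq(4). A minimal sketch for splitting the list into one pair per line; list_levels is a made-up helper name, not a FreeBSD tool.

```shell
#!/bin/sh
# Print one "MHz mW" pair per line from a freq_levels string such as
# "1200/13000 1050/11375". list_levels is a hypothetical helper.
list_levels() {
    for level in $1; do
        # ${level%/*} keeps the part before the slash, ${level#*/} the part after
        echo "${level%/*} ${level#*/}"
    done
}

# On a FreeBSD box: list_levels "$(sysctl -n dev.cpu.0.freq_levels)"
list_levels "1200/13000 1050/11375 900/9750 750/8125 600/6500"
```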

You can also set the lowest allowed Cx sleep state per CPU with dev.cpu.0.cx_lowest, dev.cpu.1.cx_lowest, and so on.

You can change them like this:
# sysctl dev.cpu.0.cx_lowest=C3
dev.cpu.0.cx_lowest: C1 -> C3

WARN: I do not know about other laptops, but when I set C3 (or deeper, like C4, C5, ...) as the lowest state for all cores, my touchpad becomes slightly laggy. This is easily eliminated by setting one of the CPUs to C2 and all the others to C3: you still save power and the lag is gone.

The list of supported states is available via these oids:
dev.cpu.0.cx_supported: C1/1 C2/1 C3/57
dev.cpu.1.cx_supported: C1/1 C2/1 C3/57

Suggested settings (one CPU at C2, the others as deep as possible) in /etc/sysctl.conf:
dev.cpu.0.cx_lowest=C3
dev.cpu.1.cx_lowest=C2
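
For machines with more cores, the same pattern can be generated instead of typed. This is only a sketch: cx_lines is a hypothetical helper, and keeping the last core at C2 is an arbitrary choice that happens to match the two lines above.

```shell
#!/bin/sh
# Print sysctl.conf lines for $1 cores: the last core stays at C2,
# every other core gets the deepest state passed as $2.
# cx_lines is a hypothetical helper, not part of FreeBSD.
cx_lines() {
    ncpu=$1
    deepest=$2
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        if [ "$i" -eq $((ncpu - 1)) ]; then
            echo "dev.cpu.$i.cx_lowest=C2"
        else
            echo "dev.cpu.$i.cx_lowest=$deepest"
        fi
        i=$((i + 1))
    done
}

# Dual-core example; on FreeBSD "sysctl -n hw.ncpu" gives the core count.
cx_lines 2 C3
```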

You can read more about Intel C power states here:
http://software.intel.com/en-us/blogs/2008/03/27/update-c-states-c-states-and-even-more-c-states/
http://www.techarp.com/showarticle.aspx?artno=420&pgno=6

I measured the power consumption of my Dell D630 laptop, which has an Intel T7300 CPU (http://processorfinder.intel.com/details.aspx?sSpec=SLA45), under full load [1] with a small device called a wattmeter, connected like this:

power (in the wall) <--> wattmeter <--> laptop (without batteries)

Here are the results:
MHz     System power consumption (whole laptop)
 150    22 W
 300    22 W
 450    23 W
 600    23 W
 750    24 W
 900    25 W
1050    26 W
1200    27 W
1400    33 W
1750    42 W
2000    47 W

1200 MHz seems to have the best power/performance ratio, and that is what I personally use.
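
One way to check that claim against the table is to divide each frequency by the measured wattage. This is a sketch: it treats MHz as a stand-in for performance and uses whole-laptop power, so the ratio includes the display, disk and so on. best_ratio is a made-up helper name.

```shell
#!/bin/sh
# Read "MHz watts" pairs on stdin, print MHz per watt for each row,
# then the frequency with the highest ratio.
# best_ratio is a hypothetical helper.
best_ratio() {
    awk '{
        ratio = $1 / $2
        printf "%4d MHz  %2d W  %.1f MHz/W\n", $1, $2, ratio
        if (ratio > best) { best = ratio; best_mhz = $1 }
    }
    END { print "best: " best_mhz " MHz" }'
}

# The measurements from the table above.
best_ratio <<'EOF'
150 22
300 22
450 23
600 23
750 24
900 25
1050 26
1200 27
1400 33
1750 42
2000 47
EOF
```

The last line printed is "best: 1200 MHz", since 1200/27 is about 44.4 MHz per watt and every other row comes out lower.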

[1] The load was 999999999999999999999999999 ** 999999999999999999999999999 evaluated in Python, launched 4 times (to fully load both cores).

... and by the way, setting kern.hz=100 in /boot/loader.conf will also make your battery last a little longer.

WARN: If these options differ for AMD CPUs, then let me know, or just post them in this thread.

If you have any questions or I forgot about something then let me know ;)

manolis@
November 16th, 2008, 23:27
Thanks, this is really useful :)
You may wish to modify powerd_flags to this:

powerd_flags="-a maximum -b adaptive -i 85 -r 60 -p 100"

if you are using it on a laptop (esp. a low end one, like my aspire one) so you will get maximum performance when plugged in.

I am still playing with these values myself, to get the best possible responsiveness and save on battery too. I am not there yet, but I guess I'll tweak the minimum CPU freq. to 700Mhz and will be close.

vermaden
November 16th, 2008, 23:34
Thanks, this is really useful :)

You are welcome ;)

You may wish to modify powerd_flags to this (...)
if you are using it on a laptop (esp. a low end one, like my aspire one) so you will get maximum performance when plugged in.

Good point, I do not have any experience with small netbooks.

All these measurements were made on a Dell D630 laptop (I mentioned the T7300 CPU).

overmind
November 20th, 2008, 14:10
Hello, Manolis@,

I see you are using an Acer Aspire One. I just bought one recently and I am trying to configure it, do you have any hints?

I've opened a thread regarding Acer Aspire One, here: http://forums.freebsd.org/showthread.php?t=382

If you have some tips, please share them with us.
(network card, wifi, card reader, webcam, power management, optimization tips)

xwwu
November 30th, 2008, 18:23
Dear Vermaden:

First question:

Is
powerd_flags="-i 85 -r 60 -p 100"
in /etc/rc.conf also?

Second:

How can I set:

# sysctl dev.cpu.0.cx_lowest=C3
dev.cpu.0.cx_lowest: C1 -> C3

permanently in the system instead of typing them in a terminal?

Thanks!

danger@
November 30th, 2008, 20:42
permanently in the system instead of typing them in a terminal?

add it to /etc/sysctl.conf

richardpl
December 1st, 2008, 01:30
The right way to set CPU Cx states (C1, C2, C3, C4, ...) is via rc.conf:


performance_cx_lowest="HIGH" # Online CPU idle state
performance_cpu_freq="NONE" # Online CPU frequency
economy_cx_lowest="HIGH" # Offline CPU idle state
economy_cpu_freq="NONE" # Offline CPU frequency


In this way they are used with devd(8).
Read /etc/rc.d/power_profile for explanation.
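
A sketch of how these variables could differ between AC and battery. It assumes, from my reading of /etc/rc.d/power_profile, that HIGH maps to C1, LOW maps to the deepest state listed in cx_supported, and NONE leaves the value untouched; verify that against the script on your release.

```shell
# /etc/rc.conf fragment: values applied by power_profile when the AC line changes
performance_cx_lowest="HIGH"   # on AC: stay in the shallowest idle state
performance_cpu_freq="NONE"    # on AC: do not touch the frequency (e.g. leave it to powerd)
economy_cx_lowest="LOW"        # on battery: allow the deepest supported idle state
economy_cpu_freq="NONE"        # on battery: do not touch the frequency
```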

vermaden
December 1st, 2008, 08:53
The right way to set CPU Cx states (C1, C2, C3, C4, ...) is via rc.conf:


performance_cx_lowest="HIGH" # Online CPU idle state
performance_cpu_freq="NONE" # Online CPU frequency
economy_cx_lowest="HIGH" # Offline CPU idle state
economy_cpu_freq="NONE" # Offline CPU frequency


In this way they are used with devd(8).
Read /etc/rc.d/power_profile for explanation.

But does it allow setting different C states per CPU core?

richardpl
December 1st, 2008, 13:38
No, but that one is not hard to fix.
The problem with setting it via sysctl.conf is that some ACPI implementations allow C3 and lower states only when the laptop is not on AC.
So once the laptop is disconnected from AC, the CPU will be put in a lower power state. Also, it is not useful to have the same sysctl settings when the laptop is on AC and when it is on batteries.
And it is very ugly to modify Cx states manually.

vermaden
December 1st, 2008, 14:20
No, but that one is not hard to fix.

So it's of little use, because setting both cores to C3 makes the touchpad noticeably slow to react, while setting one core to C2 and the other one to C3 solves that problem.

The problem with setting it via sysctl.conf is that some ACPI implementations allow C3 and lower states only when the laptop is not on AC.

So what will happen then? Will it be put into a higher C state like C0, and go back to C3 when you remove the power cord, for example?

So once the laptop is disconnected from AC, the CPU will be put in a lower power state.
That is the purpose of C states: to save energy once you remove the power cord. Do you want to "tell" your CPU manually to switch to lower power states again?

Also, it is not useful to have the same sysctl settings when the laptop is on AC and when it is on batteries.

Why? It will just use less power and put the cores to sleep while they are NOT needed; when there is demand for horsepower the CPU will be in the highest C0 state, so what's the problem?

Also, when you just want to charge the batteries you still want the CPU to scale, since the laptop gets much warmer without power-saving options.

And it is very ugly to modify Cx states manually.
So develop a better interface. Sysctls are designed to be used, not hidden from use; also it's done ONCE, and later it's just loaded at boot.

I would also like FreeBSD to detect the best possible settings for my current laptop model by itself, but we both know that is impossible, so unfortunately we have to set these best settings manually.

richardpl
December 1st, 2008, 23:47
So it's of little use, because setting both cores to C3 makes the touchpad noticeably slow to react, while setting one core to C2 and the other one to C3 solves that problem.
It is not useless for machines with only one cpu/core (enabled).

So what will happen then? Will it be put into a higher C state like C0, and go back to C3 when you remove the power cord, for example?
Unfortunately not, sysctl reports invalid argument and quits. But it is the BIOS's "fault" for not allowing C3 and lower while on AC.


#!/bin/sh
#
# Modify the power profile based on AC line state. This script is
# usually called from devd(8).

devd.conf(5) is the "right" API for that, not manual typing and/or sysctl.conf (which is almost always read only once).
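
A sketch of how devd could drive per-core Cx settings, which the stock power_profile script does not do. Everything named here is hypothetical (the cx_for_acline helper and the script path in the comment), and the C-state choices are just the ones discussed earlier in the thread. It assumes devd passes $notify as 0x01 on AC and 0x00 on battery, as power_profile expects.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): print the sysctl commands
# for a given AC line state instead of running them, so the output can
# be inspected first and then piped to sh.
cx_for_acline() {
    case "$1" in
    0x01)   # AC attached
        echo "sysctl dev.cpu.0.cx_lowest=C1"
        echo "sysctl dev.cpu.1.cx_lowest=C1"
        ;;
    0x00)   # on battery: one core at C2 to keep the touchpad responsive
        echo "sysctl dev.cpu.0.cx_lowest=C3"
        echo "sysctl dev.cpu.1.cx_lowest=C2"
        ;;
    *)
        echo "unknown AC line state: $1" >&2
        return 1
        ;;
    esac
}

# A devd.conf rule along these lines could call a script wrapping the
# function. The stock ACAD rule uses priority 10, so a higher number
# should take precedence (check devd.conf(5) to confirm):
#   notify 20 {
#       match "system"    "ACPI";
#       match "subsystem" "ACAD";
#       action "/usr/local/sbin/cx_acline.sh $notify";
#   };
cx_for_acline 0x00
```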

trev
January 5th, 2009, 12:03
It seems a little unbelievable that the AMD Phenom is not yet officially supported by cpufreq on FreeBSD 7.X-RELEASE, but help is at hand.

[Note: read Edits at end of post for updates]

How to install and more:
0. shell> /etc/rc.d/powerd stop
1. detach the hwpstate.c file attached to this post
2. shell> cp hwpstate.c /usr/src/sys/i386/cpufreq/
3. edit /usr/src/sys/modules/cpufreq/Makefile and change,
-SRCS+= est.c p4tcc.c powernow.c
+SRCS+= est.c p4tcc.c powernow.c hwpstate.c
4. delete the line "device cpufreq" from your KERNCONF file if
present and make kernel without cpufreq.
5. shell> cd /usr/src/sys/modules/cpufreq/ && make && make install
6. run "umount -a" or "mount -u -o ro /somewhere" first, since a kernel panic is possible, and sync;sync;sync (if you're paranoid)
7. shell> kldload cpufreq
8. dmesg should show the verbose message "hwpstate0: <Cool`n'Quiet 2.0> on cpu0".
9. shell> sysctl dev.cpu.0.freq_levels
10. shell> sysctl dev.cpu.0.freq=XXXX
11. shell> /etc/rc.d/powerd start

(Courtesy of G. Otsuji anonna2 at gmail dot com)

And a script I wrote to reduce typing :)

#!/bin/sh
speed=`sysctl dev.cpu.0.freq | cut -f2 -d":"`
possibles=`sysctl dev.cpu.0.freq_levels | cut -f2 -d":" | sed "s/\/[-0-9]*/MHz/g"`

echo ""
echo "Speed: ${speed}MHz from${possibles}"
echo ""

shell>speed

Speed: 1100MHz from 2200MHz 1100MHz

[above display for my Phenom 9550]

=================================================================

Edit 2009/02/21: Patch supplied by author against the PR version located at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/128575

Edit 2009/05/30: See new "closed" PR version located at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/128575 - Only incorporated in CURRENT (8.0) but also works for 7.X if you comment out hwpstate.c lines 307 and 308 being:


if (cpu_vendor_id != CPU_VENDOR_AMD || CPU_FAMILY(cpu_id) < 0x10)
return;


as CPU_VENDOR_AMD is not defined in 7.X. This is safe enough because you _do_ have an AMD 10h (Phenom/Opteron Quad) or 11h (Phenom II?) family CPU :)

I have removed the original attachments/code as a result.

Edit 2010/01/22: The hwpstate.c file referenced above now prevents cpufreq from loading on FreeBSD 7.2-STABLE. However, if you grab the latest hwpstate.c file from http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/i386/cpufreq/hwpstate.c?rev=1.5.2.2;content-type=text%2Fplain life returns to normal again :)

vermaden
January 5th, 2009, 13:28
Thanks for sharing it mate.

It's also inexplicable that 7.1-RELEASE does not support that out of the box; these CPUs have been around for more than a year ;/

BTW: You can simplify it this way:
-speed=`sysctl dev.cpu.0.freq | cut -f2 -d":"`
+speed=$( sysctl -n dev.cpu.0.freq )

also hwpstate.c for those who are NOT logged in:

/*-
* Copyright (c) 2008 Gen Otsuji
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted providing that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*/

/*
* very much thanks to Veronica(fluffles.net)
*/

/*
* Reference:
* Rev 3.06 March 26, 2008 - BIOS and Kernel Developer's Guide(BKDG)
* for AMD Family 10h Processors
*/

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include <sys/bus.h>
#include <sys/cpu.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/proc.h>
#include <dev/pci/pcivar.h>
#include <machine/md_var.h>

#include <contrib/dev/acpica/acpi.h>
#include <dev/acpica/acpivar.h>

#include "acpi_if.h"
#include "cpufreq_if.h"

#define MSR_AMD10H_LIMIT 0xc0010061
#define MSR_AMD10H_CONTROL 0xc0010062
#define MSR_AMD10H_STATUS 0xc0010063
#define MSR_AMD10H_CONFIG 0xc0010064
#define AMD10H_PVI_MODE 1
#define AMD10H_SVI_MODE 0
#define AMD10H_MAX_STATES 16

/* for MSR_AMD10H_LIMIT C001_0061 */
#define AMD10H_GET_PSTATE_MAX_VAL(msr) (((msr) >> 4) & 0xF)
/* for MSR_AMD10H_CONFIG C001_0064:68 */
#define AMD10H_CUR_VID(msr) (((msr) >> 9) & 0x3F)
#define AMD10H_CUR_DID(msr) (((msr) >> 6) & 0x07)
#define AMD10H_CUR_FID(msr) ((msr) & 0x3F)

/*
* setting this to 0 can hush up verbose messages.
*/
static int hwpstate_verbose = 1;

struct hwpstate_setting {
int freq; /* CPU clock in Mhz or 100ths of a percent. */
int volts; /* Voltage in mV. */
int power; /* Power consumed in mW. */
int lat; /* Transition latency in us. */
int pstate_id;
device_t dev; /* Driver providing this setting. */
};

struct hwpstate_softc {
device_t dev;
struct hwpstate_setting hwpstate_settings[AMD10H_MAX_STATES];
int cfnum;
int voltage_mode; /* for AMD10H_PVI_MODE / AMD10H_SVI_MODE */
int curpstate;
};

static void hwpstate_identify(driver_t * driver, device_t parent);
static int hwpstate_probe(device_t dev);
static int hwpstate_attach(device_t dev);
static int hwpstate_detach(device_t dev);
static int hwpstate_set(device_t dev, const struct cf_setting *cf);
static int hwpstate_get(device_t dev, struct cf_setting *cf);
static int hwpstate_settings(device_t dev, struct cf_setting *sets, int *count);
static int hwpstate_type(device_t dev, int *type);
static int hwpstate_shutdown(device_t dev);
static int hwpstate_features(driver_t * driver, u_int * features);

static device_method_t hwpstate_methods[] = {
/* Device interface */
DEVMETHOD(device_identify, hwpstate_identify),
DEVMETHOD(device_probe, hwpstate_probe),
DEVMETHOD(device_attach, hwpstate_attach),
DEVMETHOD(device_detach, hwpstate_detach),
DEVMETHOD(device_shutdown, hwpstate_shutdown),

/* cpufreq interface */
DEVMETHOD(cpufreq_drv_set, hwpstate_set),
DEVMETHOD(cpufreq_drv_get, hwpstate_get),
DEVMETHOD(cpufreq_drv_settings, hwpstate_settings),
DEVMETHOD(cpufreq_drv_type, hwpstate_type),

/* ACPI interface */
DEVMETHOD(acpi_get_features, hwpstate_features),

{0, 0}
};

static devclass_t hwpstate_devclass;
static driver_t hwpstate_driver = {
"hwpstate",
hwpstate_methods,
sizeof(struct hwpstate_softc),
};
DRIVER_MODULE(hwpstate, cpu, hwpstate_driver, hwpstate_devclass, 0, 0);

static void
hwpstate_goto_pstate(device_t dev,int pstate)
{
struct hwpstate_softc *sc;
uint64_t msr;
int i;
sc = device_get_softc(dev);
sc->curpstate = pstate;
wrmsr(MSR_AMD10H_CONTROL, pstate);
for(i=0;i<100;i++){
msr=rdmsr(MSR_AMD10H_STATUS);
if(msr==pstate){
break;
}
DELAY(100);
}
msr=rdmsr(MSR_AMD10H_STATUS);
if(hwpstate_verbose)
device_printf(dev,"Now P%d-state.\n",(int)msr);
return;
}

static int
hwpstate_set(device_t dev, const struct cf_setting *cf)
{
struct hwpstate_softc *sc;
struct hwpstate_setting *set;
int i;
if (cf == NULL)
return (EINVAL);
sc = device_get_softc(dev);
set = sc->hwpstate_settings;
for (i = 0; i < sc->cfnum; i++)
if (cf->freq == set[i].freq)
break;
if (i == sc->cfnum)
return EINVAL;
if(hwpstate_verbose)
device_printf(dev,"goto P%d-state\n",set[i].pstate_id);
sc->curpstate = set[i].pstate_id;
hwpstate_goto_pstate(dev,set[i].pstate_id);
return (0);
}

static int
hwpstate_get(device_t dev, struct cf_setting *cf)
{
struct hwpstate_softc *sc;
struct hwpstate_setting set;
sc = device_get_softc(dev);
if (cf == NULL)
return (EINVAL);
set = sc->hwpstate_settings[sc->curpstate];
cf->freq = set.freq;
cf->volts = set.volts;
cf->power = CPUFREQ_VAL_UNKNOWN;
cf->lat = 16;
cf->dev = dev;
return (0);
}

static int
hwpstate_settings(device_t dev, struct cf_setting *sets, int *count)
{
struct hwpstate_softc *sc;
struct hwpstate_setting set;
int i;
if (sets == NULL || count == NULL)
return (EINVAL);
sc = device_get_softc(dev);
if (*count < sc->cfnum)
return (E2BIG);
for (i = 0; i < sc->cfnum; i++, sets++) {
set = sc->hwpstate_settings[i];
sets->freq = set.freq;
sets->volts = set.volts;
sets->power = set.power;
sets->lat = set.lat;
sets->dev = set.dev;
}
*count = sc->cfnum;
return (0);
}

static int
hwpstate_type(device_t dev, int *type)
{

if (type == NULL)
return (EINVAL);
*type = CPUFREQ_TYPE_ABSOLUTE;
return (0);
}

static int
hwpstate_is_capable(void)
{
u_int regs[4];
if (strcmp(cpu_vendor, "AuthenticAMD") != 0 ||
cpu_exthigh < 0x80000007)
return (FALSE);
do_cpuid(0x80000007, regs);
if (regs[3] & 0x80) { /* HwPstate Enable bit */
return (TRUE);
}
return (FALSE);
}

static void
hwpstate_identify(driver_t * driver, device_t parent)
{
device_t child;
if (device_find_child(parent, "hwpstate", -1) != NULL) {
return;
}
if ((child = BUS_ADD_CHILD(parent, 10, "hwpstate", -1)) == NULL)
device_printf(parent, "hwpstate: add child failed\n");
}

static int
hwpstate_probe(device_t dev)
{
struct hwpstate_softc *sc;
device_t perf_dev;
uint64_t msr;
int error, type;
if (resource_disabled("hwpstate", 0))
return (ENXIO);

/* this had not to be in hwpstate_identify() */
if (hwpstate_is_capable() == FALSE) {
return (ENXIO);
}
perf_dev = device_find_child(device_get_parent(dev), "acpi_perf", -1);
if (perf_dev && device_is_attached(perf_dev)) {
error = CPUFREQ_DRV_TYPE(perf_dev, &type);
if (error == 0 && (type & CPUFREQ_FLAG_INFO_ONLY) == 0)
return (ENXIO);
}
sc = device_get_softc(dev);
switch (cpu_id) {
case 0x100f2A: /* family 10h rev.DR-BA */
case 0x100f22: /* family 10h rev.DR-B2 */
case 0x100f23: /* family 10h rev.DR-B3 */
break;
default:
return (ENXIO);
}
msr = rdmsr(MSR_AMD10H_LIMIT);
sc->cfnum = AMD10H_GET_PSTATE_MAX_VAL(msr);
if (sc->cfnum == 0) {
device_printf(dev, "hardware-pstate is not supported by the bios.\n");
return ENXIO;
}
device_set_desc(dev, "Cool`n'Quiet 2.0");
return (0);
}

static int
hwpstate_attach(device_t dev)
{
struct hwpstate_softc *sc;
struct hwpstate_setting *set;
device_t F3;
uint64_t msr;
uint32_t cfg;
int i, vid, did, fid;
sc = device_get_softc(dev);

/*
* following 24 means the 1st cpu. 25-31 instead of 24 is MP system.
* I don't have MP system. But only for reading from 1st cpu.
* so if the same 2*cpu, 4*cpu or 8*cpu, this can work, I think.
*/
F3 = pci_find_bsf(0, 24, 3);
cfg = pci_read_config(F3, 0xA0, 4);
if (cfg & 0x10) { /* PVI mode */
if (hwpstate_verbose)
device_printf(dev, "PVI mode\n");
sc->voltage_mode = AMD10H_PVI_MODE;
} else { /* SVI mode */
if (hwpstate_verbose)
device_printf(dev, "SVI mode\n");
sc->voltage_mode = AMD10H_SVI_MODE;
}
msr = rdmsr(MSR_AMD10H_LIMIT);
sc->cfnum = 1 + AMD10H_GET_PSTATE_MAX_VAL(msr);
if (hwpstate_verbose)
device_printf(dev, "you have %d P-state.\n", sc->cfnum);
set = sc->hwpstate_settings;
for (i = 0; i < sc->cfnum; i++, set++) {
msr = rdmsr(MSR_AMD10H_CONFIG + i);
if ((msr & 0x8000000000000000)) {
vid = AMD10H_CUR_VID(msr);
did = AMD10H_CUR_DID(msr);
fid = AMD10H_CUR_FID(msr);
set->freq = 100 * (fid + 0x10) / (1 << did);
if (sc->voltage_mode == AMD10H_PVI_MODE) {
/* 2.4.1.6.2 Parallel VID Encodings */
if (vid >= 0x20)
set->volts = (7625 - 125 * (vid - 0x20)) / 10;
else
set->volts = 1550 - 25 * vid;
} else {
/* 2.4.1.6.3 Serial VID Encodings */
if (vid >= 0x7F)
set->volts = 0;
else
set->volts = (15500 - 125 * vid) / 10;
}
if (hwpstate_verbose)
device_printf(dev, "freq=%dMHz volts=%dmV\n", set->freq, set->volts);
set->pstate_id = i;
set->power = CPUFREQ_VAL_UNKNOWN;
set->lat = 16;
set->dev = dev;
}
}
cpufreq_register(dev);
hwpstate_goto_pstate(dev,0);
return (0);
}

static int
hwpstate_detach(device_t dev)
{

hwpstate_goto_pstate(dev,0);
return (cpufreq_unregister(dev));
}

static int
hwpstate_shutdown(device_t dev)
{

hwpstate_goto_pstate(dev,0);
return (0);
}

static int
hwpstate_features(driver_t * driver, u_int * features)
{

*features = ACPI_CAP_PERF_MSRS;
return (0);
}

randux
January 13th, 2009, 17:13
I'm having performance problems on my new 7.1 installs. How do I check what the values are? I want max. performance and I don't care about power consumption.

vermaden
January 13th, 2009, 17:20
So disable the powerd daemon.
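
To check the values, compare the current frequency with the top entry in freq_levels. A minimal sketch, assuming the highest level is listed first as in the sysctl output earlier in the thread; at_full_speed is a made-up helper name.

```shell
#!/bin/sh
# Report whether the current frequency equals the highest available level.
# $1 = current MHz, $2 = freq_levels string (highest level listed first).
# at_full_speed is a hypothetical helper.
at_full_speed() {
    top=${2%%/*}    # everything before the first slash, i.e. the top MHz value
    if [ "$1" -eq "$top" ]; then
        echo "running at full speed ($1 MHz)"
    else
        echo "scaled down: $1 MHz of $top MHz"
    fi
}

# On FreeBSD:
#   at_full_speed "$(sysctl -n dev.cpu.0.freq)" "$(sysctl -n dev.cpu.0.freq_levels)"
at_full_speed 600 "1200/13000 1050/11375 900/9750 750/8125 600/6500"
```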

randux
January 13th, 2009, 17:23
Hi Vermaden,

Is it on by default? Is that all I have to do? I am running benchmarks now, I'll check soon. Anything else to check? The ubench numbers are great but it feels like a 486 box!

vermaden
January 13th, 2009, 17:36
It's off by default.

The default 7.1 scheduler is tuned for at least 2 cores, so if you have an older single-core CPU, it may sometimes feel slowish in interactive tasks.

randux
January 13th, 2009, 17:46
I think it's off by default. I will post my ubench and unixbench results in the performance thread I started.

randux
January 13th, 2009, 17:49
We posted at the same time. No, it's a new E8400 core 2 duo box with 4g ram and it feels terrible on freebsd :( Everything else runs great I want to know why...

vermaden
January 13th, 2009, 19:26
We posted at the same time. No, it's a new E8400 core 2 duo box with 4g ram and it feels terrible on freebsd :( Everything else runs great I want to know why...

These are my results:

$ time unixbench

(...)

BYTE UNIX Benchmarks (Version 4.1.0)
System -- mavio
Start Benchmark Run: Tue Jan 13 18:29:05 CET 2009
4 interactive users.
6:29PM up 8 days, 10:55, 4 users, load averages: 0.21, 0.29, 0.25
-r-xr-xr-x 1 root wheel 115292 Jan 1 12:49 /bin/sh
/bin/sh: ELF 32-bit LSB executable, Intel 80386, version 1 (FreeBSD), for FreeBSD 7.1, dynamically linked (uses shared libs), FreeBSD-style, stripped
/dev/ad5s1e 7870554 5411380 1829530 75% /usr
Dhrystone 2 using register variables 7561720.6 lps (10.0 secs, 10 samples)
Double-Precision Whetstone 1418.4 MWIPS (10.0 secs, 10 samples)
System Call Overhead 397634.2 lps (10.0 secs, 10 samples)
Pipe Throughput 557754.4 lps (10.0 secs, 10 samples)
Pipe-based Context Switching 114951.4 lps (10.0 secs, 10 samples)
Process Creation 6087.5 lps (30.0 secs, 3 samples)
Execl Throughput 1794.2 lps (29.8 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks 503761.0 KBps (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks 113417.0 KBps (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks 70656.0 KBps (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks 134718.0 KBps (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks 77655.0 KBps (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks 47665.0 KBps (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks 1421800.0 KBps (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks 46577.0 KBps (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks 56051.0 KBps (30.0 secs, 3 samples)
Shell Scripts (1 concurrent) 2824.3 lpm (60.0 secs, 3 samples)
Shell Scripts (8 concurrent) 580.0 lpm (60.0 secs, 3 samples)
Shell Scripts (16 concurrent) 296.7 lpm (60.0 secs, 3 samples)
Arithmetic Test (type = short) 1441867.5 lps (10.0 secs, 3 samples)
Arithmetic Test (type = int) 1397100.8 lps (10.0 secs, 3 samples)
Arithmetic Test (type = long) 1403966.3 lps (10.0 secs, 3 samples)
Arithmetic Test (type = float) 582325.3 lps (10.0 secs, 3 samples)
Arithmetic Test (type = double) 579875.4 lps (10.0 secs, 3 samples)
Arithoh nan lps (10.0 secs, 3 samples)
C Compiler Throughput 1443.3 lpm (60.0 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places 89183.8 lpm (30.0 secs, 3 samples)
Recursion Test--Tower of Hanoi 80150.9 lps (20.0 secs, 3 samples)


INDEX VALUES
TEST BASELINE RESULT INDEX

Dhrystone 2 using register variables 116700.0 7561720.6 648.0
Double-Precision Whetstone 55.0 1418.4 257.9
Execl Throughput 43.0 1794.2 417.3
File Copy 1024 bufsize 2000 maxblocks 3960.0 70656.0 178.4
File Copy 256 bufsize 500 maxblocks 1655.0 47665.0 288.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 56051.0 96.6
Pipe Throughput 12440.0 557754.4 448.4
Pipe-based Context Switching 4000.0 114951.4 287.4
Process Creation 126.0 6087.5 483.1
Shell Scripts (8 concurrent) 6.0 580.0 966.7
System Call Overhead 15000.0 397634.2 265.1
=========
FINAL SCORE 332.7
unixbench 1191.09s user 1462.68s system 83% cpu 52:47.01 total

Its Core 2 Duo e6320 1.86GHz 4MB Cache + Intel Q35 + 2 x 1GB 800MHz RAM

With your 3.0GHz e8400 you should get roughly 1.5-2x my result.

randux
January 13th, 2009, 20:24
I posted some benchmarks here: http://forums.freebsd.org/showthread.php?t=1427

randux
January 13th, 2009, 20:30
It's almost exactly 2x of your result and most of the benchmarks look very good. But the system still feels very slow and I don't know why.

I installed two new installs today, i386 and AMD64 both with softdeps turned on (I normally run with no softdeps on) and I ran my rarcrack benchmark and there was no change.

morbit
February 10th, 2009, 22:19
Speaking of

dev.cpu.0.cx_lowest=C3
dev.cpu.1.cx_lowest=C2

My notebook (http://forums.freebsd.org/showthread.php?t=1533) has terminal bell stuttering problems and prolonged shutdown sequence if both cores are set to C3.

trev
February 21st, 2009, 08:35
*bump* I've edited the topic "cpufreq for Phenoms and Opterons (AMD Family 10h)" above with a patch supplied by author against his PR submission in November last year.

Carpetsmoker
March 20th, 2009, 15:23
Up to yesterday there was no option to set highest value to limit max CPU speed to save power or limit overheat, but Boris Kochergin wrote a patch to support also the highest limit with debug.cpufreq.highest oid:
sysctl debug.cpufreq.highest=1200

I've been thinking, and I wonder just how useful this is.

A 2GHz CPU running at 1GHz will not consume anywhere near half the power, while a CPU running at 2GHz will complete a task close to twice as fast.
The result is that the CPU will take more time to complete a task, and it will require more power in the end.

I haven't done any test, but I suspect that setting this value lower than the maximum will actually cause the battery to last shorter.
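
The argument can be put into numbers: energy is power multiplied by time, and the faster run gets to idle once it finishes. The sketch below uses the 2000 MHz / 47 W and 1200 MHz / 27 W full-load figures from the first post. The 20 W idle figure, the job size and the assumption that run time scales linearly with frequency are all made up for illustration, and job_energy is a hypothetical helper.

```shell
#!/bin/sh
# Energy in joules to finish a fixed job and then idle until a common
# deadline. Arguments: MHz, watts under load, idle watts (assumed),
# job size in millions of cycles, deadline in seconds.
# job_energy is a hypothetical helper.
job_energy() {
    awk -v mhz="$1" -v load_w="$2" -v idle_w="$3" -v mcycles="$4" -v deadline="$5" \
        'BEGIN {
            busy = mcycles / mhz        # seconds spent computing
            idle = deadline - busy      # seconds spent idle afterwards
            printf "%.0f\n", load_w * busy + idle_w * idle
        }'
}

job_energy 2000 47 20 120000 100    # 60 s busy, 40 s idle
job_energy 1200 27 20 120000 100    # 100 s busy, no idle time
```

With these inputs the script prints 3620 and then 2700, so on that particular laptop the capped run comes out ahead. A machine whose power draw falls off less steeply at lower clocks would move the result the other way, which is the case described above.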

trev
May 30th, 2009, 05:18
Talking of power savings, from a FreeBSD mailing list in November last year with an early beta version of the AMD 10h cpufreq patch which I have updated above for 10h and now 11h CPUs:


cpu: Phenom 9350e quadcore 4x 2.0GHz (energy efficient 65W version, not black edition)
mem: 1GB DDR2/666 CL5 (1 DIMM)
mobo: Asus M3N72-D Socket AM2+ with nVidia 750a SLI chipset.
hdd: 2,5" 40GB Hitachi notebook HDD on Parallel ATA (udma33)
power supply: Huntkey Greenstar 400W
measurement: average real power drain at wall socket, tested with
Voltcraft Energycheck 3000

@2000: 88.4W
@1200: 75.0W
@1000: 68.5W
@800: 65.0W
@600: 60.7W
@400: 58.0W

So in total i shave off 30.4W when idling using cool'n'quiet!! This is very very cool. =)

morbit
May 30th, 2009, 11:24
Mind you this is from CURRENT, however this is very nice thread about power saving:

http://lists.freebsd.org/pipermail/freebsd-current/2009-May/006436.html

vermaden
June 1st, 2009, 09:42
Mind you this is from CURRENT, however this is very nice thread about power saving:

http://lists.freebsd.org/pipermail/freebsd-current/2009-May/006436.html

Yes, great post, I have read it some time ago.

vermaden
October 27th, 2009, 11:27
Update:

FreeBSD 8.0-RC1/RC2 does not offer as many frequency levels as 7.2-RELEASE, bug submitted:
http://freebsd.org/cgi/query-pr.cgi?pr=140010

richardpl
October 27th, 2009, 16:03
Are you sure that you don't have some lines in loader.conf?
What CPU is that, and what modules are loaded?
Better to post this on mailing list because PR may be ignored for a while.

vermaden
October 27th, 2009, 18:23
@richardpl

You are right of course, it was that setting in /boot/loader.conf:
hint.acpi_throttle.0.disabled=1, I will update "bug" info right now.

oliverh
October 27th, 2009, 19:22
Update:

FreeBSD 8.0-RC1/RC2 does not offer as many frequency levels as 7.2-RELEASE, bug submitted:
http://freebsd.org/cgi/query-pr.cgi?pr=140010

Hi vermaden

maybe it's not a bug, but a feature? Many of the shown frequency levels are more or less nonsense on most CPUs.

richardpl
October 27th, 2009, 19:26
I use that one in loader.conf for the following reasons:
1. acpi_throttle sometimes fails to attach on the second core
2. acpi_throttle gives very little power saving and a very big performance drop (when combined with the wrong powerd flags)
3. it actually just makes the CPU wait/halt; it doesn't put it in any "lower power state"
4. I really hate acpi

vermaden
October 27th, 2009, 22:38
@oliverh

No mate, it's just my fault because I set an option that "disabled" most of them:
http://forums.freebsd.org/showpost.php?p=46520&postcount=31

@richardpl

I added that option because of this post:
http://lists.freebsd.org/pipermail/freebsd-current/2009-May/006436.html

I probably misread something, but thanks also for your reasons; they may be useful in the future.

trev
January 22nd, 2010, 02:32
*bump* I've edited the post "cpufreq for Phenoms and Opterons (AMD Family 10h)" above as a new file is now required for the current FreeBSD 7.2-STABLE source.

vermaden
January 22nd, 2010, 07:34
@trev

What about 8.0-RELEASE/8-STABLE, have these changes already been merged there?

trev
January 24th, 2010, 08:12
What about 8.0-RELEASE/8-STABLE, have these changes already been merged there?

Not according to http://cameldung.org/man/index.cgi?query=cpufreq&sektion=4&apropos=0&manpath=FreeBSD+8.0-RELEASE+and+Ports and http://cameldung.org/man/index.cgi?query=cpufreq&apropos=0&sektion=4&manpath=FreeBSD+8.0-stable&format=html which only show support for K7 and K8 (not K10 and K11).

I don't know for sure as I haven't upgraded the AMD box to 8 yet because 8.0-R and -S do not boot on my other system (Mac Mini, early 2009) and I like to keep them in sync (sharing same source tree).

royce
May 31st, 2010, 07:46
I've been thinking, and I wonder just how useful this is.

A 2GHz CPU running at 1GHz will not cut its power consumption anywhere near in half, while a CPU running at 2GHz will complete a task close to twice as fast.
The result is that the slower CPU will take more time to complete a task, and it will use more energy in the end.

I haven't done any tests, but I suspect that setting this value lower than the maximum will actually make the battery run down sooner.
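
The argument above is just energy = power x time. A minimal sketch with invented wattages (30 W at full speed and 20 W at half speed are assumptions, not measurements), ignoring idle draw after the task finishes and any voltage drop at the lower frequency:

```sh
# energy_joules WATTS SECONDS: energy used is power multiplied by time.
# Hypothetical helper for illustration only.
energy_joules() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%d\n", p * t }'
}

# Same task, made-up numbers:
energy_joules 30 10   # 2 GHz: 30 W for 10 s, prints 300
energy_joules 20 20   # 1 GHz: 20 W for 20 s, prints 400
```

With these numbers the slower run costs more energy, but real hardware can go either way depending on how far the voltage drops with the clock, so it needs measuring.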

Dropping the CPU frequency can be useful for non-power-saving reasons.

I'm actually interested in debug.cpufreq.highest for underclocking a system with a bad fan (that I can't replace until later this week).

aragon
May 31st, 2010, 18:31
I'm actually interested in debug.cpufreq.highest for underclocking a system with a bad fan (that I can't replace until later this week).
I suspect that's unnecessary. Between Intel and AMD, I don't think any CPUs have been made without thermal protection in the last 5 years. They limit the clock speed by themselves when they get too hot...

Carpetsmoker
May 31st, 2010, 19:09
Yes, they limit themselves by shutting down ;)

chavez243ca
November 5th, 2010, 14:32
I enabled powerd on a Dell PE1900 (single quad-core Xeon), using these flags:

-a adaptive -b adaptive -i 90 -r 50 -p 100

When powerd was started, the CPU was throttled down to ~600 MHz, but there was zero change in the overall power consumption of the server, as measured by the UPS it is connected to. At idle I was still seeing a draw of ~105 watts. I ran unixbench to get an idea of what full load draws, and that was ~140 watts peak.

Turning off powerd - I saw no change in consumption...

Any thoughts?

morbit
November 5th, 2010, 15:47
Do you use C3 or deeper states?

Check:

$ sysctl dev.cpu.0.cx_lowest

and

$ sysctl dev.cpu.0.cx_supported

$ sysctl dev.cpu.0.cx_usage


+

From the acpi(4) man page:

The acpi CPU idle power management drive conflicts with the local APIC
(LAPIC) timer. Disable the local APIC timer with hint.apic.0.clock=0 or
do not use the C3 and deeper states if the local APIC timer is enabled.
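
Whether the man page warning matters depends on whether C3 is offered at all. A minimal sketch for pulling the deepest state out of dev.cpu.0.cx_supported, assuming the "state/latency" pair format listed shallowest first; deepest_cx is a made-up helper name and the sample string is only an example:

```sh
# deepest_cx: print the last (deepest) state name in a cx_supported string.
# Hypothetical helper, not part of FreeBSD.
deepest_cx() {
    last=""
    for entry in $1; do
        last="${entry%%/*}"   # drop everything from the first "/" on
    done
    echo "$last"
}

# On a live FreeBSD system:
#   deepest_cx "$(sysctl -n dev.cpu.0.cx_supported)"

# Example string only:
deepest_cx "C1/0 C2/60"   # prints C2
```

If this prints C1 or C2, the LAPIC timer conflict with C3 does not come into play.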

chavez243ca
November 6th, 2010, 15:36
Supported C-states are c1/0 c2/60

morbit
November 6th, 2010, 16:16
Set

performance_cx_lowest="C2"
economy_cx_lowest="C2"

in /etc/rc.conf, and see if it changes power draw.

You can check C-state usage with sysctl dev.cpu.0.cx_usage

chavez243ca
November 7th, 2010, 14:40
Oddly enough, making those changes has actually increased the consumption to an average of 140 watts. UPS load went from 14% to 18%. I did notice in the BIOS, though, that the performance-based power consumption feature is read-only and set to disabled, which suggests this Xeon lacks certain support. The sysctl knobs do indicate that the rc.conf settings forced it to use C2 states. It does not make much sense, but it's likely not worth putting a bunch of work into; I have bigger fish to fry.
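
The two UPS percentages can be sanity-checked against the wattages. A rough sketch, assuming the 14% reading corresponds to the ~105 W idle figure reported earlier (that pairing is an assumption); both helper names are invented for illustration:

```sh
# ups_capacity WATTS LOAD_PERCENT: UPS capacity implied by one reading.
ups_capacity() {
    awk -v w="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", w * 100 / pct }'
}

# load_percent WATTS CAPACITY_WATTS: load as a percentage of capacity.
load_percent() {
    awk -v w="$1" -v cap="$2" 'BEGIN { printf "%.1f\n", w * 100 / cap }'
}

ups_capacity 105 14    # prints 750
load_percent 140 750   # prints 18.7
```

So 140 W on the same UPS would be roughly 18.7% load, which is in line with the 18% seen, and 140 W is also the figure measured earlier at full load under unixbench.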

BertK88
December 28th, 2011, 22:55
Oddly enough, making those changes has actually increased the consumption to an average of 140 watts. UPS load went from 14% to 18%. I did notice in the BIOS, though, that the performance-based power consumption feature is read-only and set to disabled, which suggests this Xeon lacks certain support. The sysctl knobs do indicate that the rc.conf settings forced it to use C2 states. It does not make much sense, but it's likely not worth putting a bunch of work into; I have bigger fish to fry.

My Intel server has slightly better savings with:
/etc/rc.conf
performance_cx_lowest="C3"
economy_cx_lowest="C3"

than with:
/etc/rc.conf
powerd_enable="YES"
powerd_flags="-m 199 -M 2395 -a adaptive -n adaptive"

Using them together gives a little higher energy use.

Regards,

Bert

vermaden
December 29th, 2011, 07:47
Using them together gives a little higher energy use.
Then why not submit this as a bug?

aragon
December 30th, 2011, 15:25
Using them together gives a little higher energy use.

I suggest you try adding the following to /boot/loader.conf:


hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"


I've seen more than one FreeBSD dev saying (more kindly than me) that P4TCC is useless, and I for one have seen one of my systems use more energy with it enabled.

YMMV