HOWTO cpu grooming (cpuset usage)

For multiprocessor (SMP) systems, FreeBSD has the cpuset command to adjust which CPUs a process will use.

The purpose is this: on some architectures, certain groups of CPUs share a cache module or a similar resource. If a process is restricted to the CPUs of one such group (instead of any available CPU), it will always have that cache available, and that may improve performance a bit.
This is what cpuset is designed for, and using it in that way is quite straightforward: just invoke the process with cpuset and tell it the intended CPUs. Or, if you want to go to the higher arts of NUMA stuff, then you usually know already what your equipment is able to do.
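As a minimal sketch of that straightforward usage: the wrapper name run_on_cpus is made up for illustration, and the fallback branch exists only so that the snippet also runs on a machine without cpuset(1).

```shell
#!/bin/sh
# run_on_cpus LIST CMD...: start CMD restricted to the CPUs in LIST.
# (Hypothetical helper; on FreeBSD it simply calls "cpuset -l LIST CMD".)
run_on_cpus() {
    list=$1
    shift
    if command -v cpuset >/dev/null 2>&1; then
        cpuset -l "$list" "$@"
    else
        # Assumption: no cpuset(1) here, so run the command unrestricted.
        echo "cpuset not found, running unrestricted" >&2
        "$@"
    fi
}

# CPU 0 exists on every machine; use e.g. 0-1 for a pair sharing a cache.
run_on_cpus 0 echo "hello from a restricted process"
```

For a process that is already running, "cpuset -g -p PID" shows the CPUs it is currently allowed to use.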

Occasionally people come up with a different idea: they want to run some program exclusively on, say, CPUs 2 and 3, and keep all other processes off CPUs 2 and 3. There is no real practical advantage in doing so: a program that has a CPU to itself may run faster, but any process that has to wait for its specifically assigned CPU instead of just using the next available one will degrade overall performance. So, on production systems this is rather pointless. On integration systems (or any kind of home/hobby/experimental/whatever machine), however, seeing your CPUs in top -P output and knowing what each one is currently doing might give some extra information.

But when people try to do this, i.e. tell the system to use only a restricted set of CPUs for the already-running processes, they are likely to run into an error:
# cpuset -l 0-1 -s 1
cpuset: setaffinity: Resource deadlock avoided


The reason for this error is that cpuset operates on two levels. On the first level, it can support up to 255 different cpusets; each process belongs to one of these sets, and each set names an arbitrary list of CPUs on which its processes are allowed to run. On the second level, each individual thread has another "cpu-mask", a list of CPUs on which that thread is allowed to run.
Now, if a process belongs to a set that allows CPUs 1-3, and one of the threads of this process is allowed to only run on CPU 3, that is fine. But if you then try to change that set to contain only CPU 1-2, then that respective thread would no longer be allowed to run anywhere - and that is when this error message appears.
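The condition can be modelled in a few lines of sh. This is only a sketch of the rule as described here, not of the kernel code, and the function names expand_cpus and mask_survives are made up for illustration: a change to a set is refused when some thread's cpu-mask shares no CPU with the new list.

```shell
#!/bin/sh
# expand_cpus LIST: turn a list such as "0-2,5" into "0 1 2 5".
expand_cpus() {
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*)
                lo=${part%-*}
                hi=${part#*-}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    printf '%s ' "$i"
                    i=$((i + 1))
                done
                ;;
            *)
                printf '%s ' "$part"
                ;;
        esac
    done
    echo
}

# mask_survives THREAD_MASK NEW_SET: succeed if at least one CPU of the
# thread's mask is still contained in the new set.
mask_survives() {
    for t in $(expand_cpus "$1"); do
        for s in $(expand_cpus "$2"); do
            [ "$t" = "$s" ] && return 0
        done
    done
    return 1
}

# The example from the text: a thread bound to CPU 3, set changed to 1-2.
mask_survives 3 1-3 && echo "set 1-3: fine"
mask_survives 3 1-2 || echo "set 1-2: thread would have nowhere to run"
```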

Now two questions come up: 1. Why does this happen? 2. How can it be avoided?

1. The kernel itself, that is, process no. 0, contains a lot of threads, and for performance reasons some of these threads are already distributed over all available CPUs and bound to them individually. So if you try to restrict set no. 1 (which at the beginning contains all processes) to some CPUs, this also concerns process no. 0 and its threads, and therefore it fails as described.

2. Easy: move all concerned processes to a new set. This resets all the thread-specific cpu-masks to the value of the new set. So the successful commands would be:
# cpuset -C -l 0-1 -p 0 # (for the kernel only)
# cpuset -l 0-1 -s 1 # (all the other stuff)
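Since these commands change the affinity of the whole running system, it can be handy to print them first and only run them after a look. A small dry-run sketch (the function name print_restrict_cmds is made up for illustration):

```shell
#!/bin/sh
# print_restrict_cmds LIST: print, but do not run, the two commands from
# the text for the given CPU list. The order matters: the kernel (pid 0)
# is moved to its own new set first, then set 1 is restricted.
print_restrict_cmds() {
    echo "cpuset -C -l $1 -p 0"
    echo "cpuset -l $1 -s 1"
}

print_restrict_cmds 0-1
```

Once the output looks right, it can be piped into sh as root: print_restrict_cmds 0-1 | sh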

Obviously, this does away with the distribution of those kernel threads over all existing CPUs - but as said above, the whole endeavour does not increase performance anyway.

To make all this a bit more convenient, I wrote a little script to be put into /usr/local/etc/rc.d; with it you can distribute your CPU resources from /etc/rc.conf:
Code:
# cpuset
cpusetup_enable="YES"
cpusetup_kern="0-1"  # the kernel itself - ZFS will love to have more than one core
cpusetup_subsys="3"  # interrupts, vmdaemon, etc.
cpusetup_set1="2"    # all the other processes not in jails
cpusetup_jails="2"   # all the jails (if you want them handled individually, help yourself)
cpusetup_other="0-3" # other stuff that has been put into a specific cpu-set
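A typo in one of these values only shows up when cpuset rejects it at boot, so a quick syntax check beforehand may help. The following is a sketch under the assumption that only plain numbers, ranges and commas are used (the function name valid_cpu_list is made up, and the check is deliberately stricter than cpuset itself; an empty value, which the script treats as "leave unchanged", has to be handled separately):

```shell
#!/bin/sh
# valid_cpu_list LIST: succeed if LIST looks like "2", "0-1" or "0-1,3".
valid_cpu_list() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

for v in "0-1" "3" "0-3" "0-1,3" "2-" "a-b"; do
    if valid_cpu_list "$v"; then
        echo "ok:  $v"
    else
        echo "bad: $v"
    fi
done
```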

One more thing to observe: set 1 must contain the CPUs that are used in any jail. I do not know the reason for this, but it is true as of Rel. 11.2.

Finally, to check what you have configured, use procstat -a -S:
Code:
  PID    TID COMM                TDNAME              CPU CSID CPU MASK
    0 100000 kernel              swapper              -1    3 0-1
    0 100008 kernel              thread taskq         -1    3 0-1
    0 100010 kernel              aiod_kick taskq      -1    3 0-1
...
    1 100001 init                -                    -1    1 2
    2 100023 ftcleanup           -                    -1    3 3
    3 100028 crypto              -                    -1    3 3
...
 1209 100873 firefox             DOM File             -1    1 2
 2062 100942 xterm               -                    -1    1 2
 2064 100943 ssh                 -                    -1    1 2
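On a busy machine this listing gets long. A small awk filter can condense it to one line per combination of set id and mask. This is a sketch that assumes the column layout shown above, i.e. that CSID and CPU MASK are the last two whitespace-separated fields (COMM and TDNAME may contain spaces, so fields are counted from the end); the name summarize_sets is made up for illustration.

```shell
#!/bin/sh
# summarize_sets: read "procstat -a -S" output on stdin and print
# "CSID MASK COUNT" for every distinct combination of set id and mask.
summarize_sets() {
    # Only rows starting with a numeric PID count; this skips the header.
    awk '$1 ~ /^[0-9]+$/ { n[$(NF-1) " " $NF]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# A few rows taken from the listing above; on a live system use:
#   procstat -a -S | summarize_sets
summarize_sets <<'EOF'
  PID    TID COMM                TDNAME              CPU CSID CPU MASK
    0 100000 kernel              swapper              -1    3 0-1
    1 100001 init                -                    -1    1 2
    2 100023 ftcleanup           -                    -1    3 3
 1209 100873 firefox             DOM File             -1    1 2
EOF
```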

The script (I would like to attach it, but rc.d scripts have no filename suffix, and the board does not seem to like that).
Feel free to fix any errors you find:
Code:
#!/bin/sh

# PROVIDE: cpusetup
# REQUIRE: jail
# KEYWORD: shutdown

. /etc/rc.subr

name="cpusetup"
rcvar=cpusetup_enable
start_cmd="${name}_start"
stop_cmd="${name}_stop"

# read configuration and set defaults
load_rc_config $name
: ${cpusetup_enable:="no"}
: ${cpusetup_kern:=""}          # ZFS wants >1 CPU!
: ${cpusetup_subsys:=""}
: ${cpusetup_jails:=""}
: ${cpusetup_set1:=""}          # must contain CPU of every jail!
: ${cpusetup_other:=""}

# Rules of the game:
# 1. Kernel may contain subsystems which bind their threads to
#    individual CPUs; therefore moving the whole system at once
#    to certain CPUs (per "cpuset -l 0 -s 1") usually fails.
#    Instead, move the kernel (pid 0) to a new cpuset first
#    (while keeping the thread distribution or not).
# 2. The same error ("resource deadlock") will also appear if some
#    cpuset contains processes with individual threads which are
#    bound to certain CPUs and one tries to move that cpuset to
#    different CPUs.
# 3. cpuset 1 must always have CPUs from all jails (running and
#    dying), even when there are no processes in cpuset 1.
# 4. When stopping a jail, the invoking process must have CPUs
#    from that jail, otherwise jail_attach will fail. Beware,
#    this is also true at shutdown!

cpusetup_start()
{
        local cpuset_max=255
        local subsys_pids=`ps ax -o pid,ppid | awk '$2 == 0 {print $1}' | \
                egrep -v "^(0|1)\$"`

        echo "Setting CPU affinity."

        # Kernel:
        cpuset -C ${cpusetup_kern:+-l $cpusetup_kern} -p 0
        cpuset_kern=`cpuset -gi -p 0 | awk '{print $NF}'`

        # The remaining subsystems:
        for i in $subsys_pids; do
                if ! test "$cpuset_subsys"; then
                        cpuset -C ${cpusetup_subsys:+-l $cpusetup_subsys} \
                                        -p $i
                        cpuset_subsys=`cpuset -gi -p $i | awk '{print $NF}'`
                else
                        cpuset ${cpusetup_subsys:+-l $cpusetup_subsys} \
                                        -s $cpuset_subsys -p $i
                fi
        done

        # set 1:
        cpuset ${cpusetup_set1:+-l $cpusetup_set1} -s 1

        # the jails:
        for i in `jls  -n cpuset.id | awk -F= '{print $2}'`; do
                cpuset_jails=${cpuset_jails:+${cpuset_jails}|}$i
                cpuset ${cpusetup_jails:+-l $cpusetup_jails} -s $i
        done

        # the rest:
        for i in `seq 1 $cpuset_max | \
                egrep -v "^(1|$cpuset_kern|$cpuset_subsys|$cpuset_jails)\$"`
        do
                if cpuset -g -s $i > /dev/null 2>&1; then
                        cpuset ${cpusetup_other:+-l $cpusetup_other} -s $i
                fi
        done
}

cpusetup_stop()
{
        local cpuset_max=255
        local subsys_pids=`ps ax -o pid,ppid | awk '$2 == 0 {print $1}' | \
                egrep -v "^(0|1)\$"`
        local allcpus

        echo "Resetting CPU affinity."
        allcpus=`sysctl -n kern.smp.cpus`
        allcpus=0-`expr $allcpus - 1`

        # Kernel:
        cpuset -C -l $allcpus -p 0
        cpuset_kern=`cpuset -gi -p 0 | awk '{print $NF}'`

        # The remaining subsystems:
        for i in $subsys_pids; do
                if ! test "$cpuset_subsys"; then
                        cpuset -C -l $allcpus -p $i
                        cpuset_subsys=`cpuset -gi -p $i | awk '{print $NF}'`
                else
                        cpuset -l $allcpus -s $cpuset_subsys -p $i
                fi
        done

        # set 1:
        cpuset -l $allcpus -s 1

        # the jails:
        for i in `jls  -n cpuset.id | awk -F= '{print $2}'`; do
                cpuset_jails=${cpuset_jails:+${cpuset_jails}|}$i
                cpuset -l $allcpus -s $i
        done

        # the rest:
        for i in `seq 1 $cpuset_max | \
                egrep -v "^(1|$cpuset_kern|$cpuset_subsys|$cpuset_jails)\$"`
        do
                if cpuset -g -s $i > /dev/null 2>&1; then
                        cpuset -l $allcpus -s $i
                fi
        done
}

run_rc_command "$1"
 