microfreezes - FreeBSD 13

WCSN · Jun 13, 2022

Hi!

FreeBSD 13.1 (13.0)

FreeBSD wcsn 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64
CPU: AMD Ryzen 5 3600 6-Core Processor (3600.10-MHz K8-class CPU)
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X]'

I have the latest KDE Plasma, amdgpu, mesa, drm.

There were annoying microfreezes lasting 0.1-0.2 s during any kind of work: scrolling through text, watching videos, playing audio; even moving a window across the screen caused such delays.

I accidentally came across a mention of the sysctl kern.sched.steal_thresh.

Setting it to 0 (default is 2) with sysctl kern.sched.steal_thresh=0 completely eliminated all microfreezes.

Why is that?
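
For anyone who wants to try the same change, here is a minimal sketch of inspecting and changing the value at runtime. It assumes FreeBSD with the ULE scheduler; the helper names `is_valid_thresh` and `show_and_set_thresh` are made up for illustration, and a value set this way lasts only until the next reboot.

```shell
#!/bin/sh
# Sketch: inspect kern.sched.steal_thresh and optionally change it at runtime.
# Assumption: FreeBSD with the ULE scheduler; helper names are illustrative.

# Accept only a plain non-negative integer as a threshold value.
is_valid_thresh() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the current value, then set a new one if given (setting needs root).
show_and_set_thresh() {
    sysctl kern.sched.steal_thresh || return 1
    [ -n "${1:-}" ] || return 0
    is_valid_thresh "$1" || { echo "not a non-negative integer: $1" >&2; return 1; }
    sysctl kern.sched.steal_thresh="$1"
}

# Only touch the system when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    show_and_set_thresh "$@"
fi
```

Run it without arguments to print the current value, or as root with an argument such as 0 to change it.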

mer · Jun 13, 2022

There is a good list of at least definitions of a bunch of sysctls here:

freebsd_sysctl.md

gist.github.com

For this one specifically it says:
Minimum load on remote CPU before we'll steal

My understanding is the value relates to migrating processes/threads around from CPU to CPU (or core to core, etc).
Schedulers try to keep all cores running at least something; otherwise the core is wasted. I would guess the stuttering/microfreezes result from a thread being rescheduled onto another core, having its priority recomputed and then being set running again, or from threads migrating to the current core, followed by a recompute and a run of the highest-priority thread.
I would guess that 0 effectively disables core migration which eliminates the problem.

A lot of recommendations say "set it to 1", not 0, but then I found this in the email lists. The 0 and 2 were interesting, plus the "simply running the dtrace script made the problem go away"

freebsd 13 ryzen micro stutter

Above is all my opinion/best guesses, I'm not intimate with the current scheduler (ULE) implementation.

WCSN · Jun 14, 2022

mer said:
There is a good list of at least definitions of a bunch of sysctls here:

freebsd_sysctl.md

gist.github.com

freebsd 13 ryzen micro stutter

Above is all my opinion/best guesses, I'm not intimate with the current scheduler (ULE) implementation.

Thanks for the reply.
Yes, I have seen this discussion.
I tried setting the value to 1: there are no microfreezes then either.

It is possible, of course, that the multicore behavior changes, but looking at the load on the cores (htop), I did not notice any difference.

In any case, the current behavior (with a value of 0) suits me better than constant annoying stumbles.

So if someone else runs into this, they can use it as a workaround. By the way, I remember that FreeBSD 12 did not behave like this, but it had a fair number of problems with video drivers and drm.

WCSN · Jun 14, 2022

Could this problem be specific to AMD Ryzen CPUs?

mer · Jun 14, 2022

WCSN said:
Could this problem be specific to AMD Ryzen CPUs?

Yes, there may be a dependency on AMD Ryzen. If you search, a lot of the email threads involve AMD Ryzen CPUs.

Erichans · Jun 14, 2022

If I were to guess, it might have something to do with the non-optimal handling of the thread to CPU core affinity by the ULE scheduler with the Zen 2 architecture. In Zen 2 each core complex (=CCX) consists of up to 4 CPU cores and each CCX has one L3 cache that is only shared between its own cores. A 6 core Zen 2 Ryzen such as the Ryzen 5 3600 has 2 CCXs with 3 active cores each on one core complex die (=CCD), plus a separate I/O die. A Zen 2 CPU is made up of several dies; each die is manufactured as an individual component not "tied" to another one, even when located on the same wafer.

When a thread is moved from a core of, say, CCX_A to another core at CCX_B, it cannot access the cache data in the L3 cache of CCX_A. That cache data comprises the thread's code and data. This could be moved from (the L3 cache of) CCX_A to (the L3 cache of) CCX_B, but I don't think that is what happens. Cache lines in the L3 cache of CCX_A that have not been written to simply get invalidated and, when needed at a core of CCX_B, have to be fetched again from RAM. Cache lines in the L3 cache of CCX_A that have been written to have to be written out to RAM.

The move from CCX_A to CCX_B doesn't happen out of the blue I suspect; a thread is probably being evicted from a core of CCX_A by another thread having a higher priority.
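
To see how the scheduler itself groups the cores, a minimal sketch like the one below can help. It assumes FreeBSD, where `sysctl kern.sched.topology_spec` prints the ULE topology as XML, and it assumes that groups sharing an L3 cache are tagged `cache-level="3"`; the helper name `count_l3_groups` is made up for illustration. Whether each CCX actually shows up as its own group on a given system is something to check, not a given.

```shell
#!/bin/sh
# Sketch: count scheduler topology groups that share an L3 cache.
# Assumption: the XML from kern.sched.topology_spec marks them cache-level="3".

# Reads the topology XML on stdin and prints the number of L3 groups.
count_l3_groups() {
    grep -c 'cache-level="3"'
}

# Only query the kernel when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    sysctl -n kern.sched.topology_spec | count_l3_groups
fi
```

The full XML (without the pipe into `count_l3_groups`) also lists which logical CPUs belong to each group.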

mer said:
[...] A lot of recommendations say "set it to 1", not 0, but then I found this in the email lists. The 0 and 2 were interesting, plus the "simply running the DTrace script made the problem go away"

DTrace may intertwine itself with running code in such a way that the threads that were evicted from CCX_A (without DTrace running) are now confined to a CPU core of this CCX. At least they now show a higher affinity for the core (of CCX_A) where they are running. That could be a consequence of DTrace running with an elevated priority. Reasoning this way, it may have nothing to do with the CPU load as such, as more or less suggested by:

So it seems that the light load of the running DTrace-script was enough to eliminate any micro-stutters.

From kern.sched.steal_thresh:

integer kern.sched.steal_thresh
Minimum load on remote CPU before we'll steal

It would be nice to have a more extensive description of this kernel parameter, especially what different values mean. I think it has something to do with how easily a thread's resources (= the CPU core on the CCX where it is running) can be "hijacked" by another thread; i.e. stolen.

It would be interesting if you could run FreeBSD on a Zen 2 CPU with only one CCX enabled: basically a CPU whose cores all share a single L3 cache. If there are no more micro stutters or freezes, that would be a good argument for looking at the problem from this perspective.
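
Without hardware that allows disabling a CCX, a rough approximation is to pin a test program to the logical CPUs of one L3 group with cpuset(1). The sketch below assumes FreeBSD; the helper `run_pinned`, the CPU list 0-5 and the program name are hypothetical, and the real CPU list should be read from the scheduler topology. It only confines the pinned program, not the X server or other threads, so it is not equivalent to a single-CCX machine.

```shell
#!/bin/sh
# Sketch: run a command restricted to a list of logical CPUs via cpuset(1).
# Assumption: FreeBSD; the CPU list for one CCX must be looked up per system.

# With DRY_RUN=1 only print the command line; otherwise execute it.
run_pinned() {
    cpus=$1; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cpuset -l $cpus $*"
    else
        cpuset -l "$cpus" "$@"
    fi
}

# Example with a hypothetical CPU list and program:
#   DRY_RUN=1 run_pinned 0-5 mpv video.mkv
```

If the stutter disappears for the pinned program only, that would at least be consistent with the cross-CCX migration guess.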

mer · Jun 14, 2022

Erichans Yep, I remember when we only had to worry about L1 cache and single cores. Now we're up to L3 and lots of cores and such. Migrating threads/processes has a cost and schedulers try to spread that cost over a lot of threads.
As you mentioned, toss in I/O busses and how the cores access them.
Anyone that tells you "schedulers are easy", ask them what they're smoking.

WCSN · Jun 14, 2022

I didn't dig that deep, but interestingly I had a feeling that it was somehow connected with thread switching, because there is clearly a "data flush", and it happens in the CPU core; otherwise it can't be explained. But there is no system with a single-CCX Ryzen nearby, so I have nothing to check it on.
Do I understand correctly that kern.sched.steal_thresh increases the aggressiveness of moving threads from core to core, and that 0 prohibits moving, so there are no "data flushes"?
(With 1 I have no freezes either.)

mer · Jun 14, 2022

I think Erichans grabbed the meaning from the entry above; I think
<code>
integer kern.sched.steal_thresh
Minimum load on remote CPU before we'll steal
</code>
is the correct one. My understanding is that the value means "how lightly loaded the other core is before we take work away from it".
So 0 would imply the other core is not doing any work, so there is nothing to take.
If you set it to, say, 128, you would likely steal work from it a lot, so migrating threads, flushing caches, etc.
I would say "setting a higher value increases the aggressiveness of moving threads from core to core".
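
Taking the one-line description at face value, a toy model can make the possible readings concrete. The sketch below is not taken from the ULE source; it simply assumes an idle CPU may take a thread from a remote CPU only when that CPU's load is at least the threshold (and at least 1, so there is something to take). The function name `would_steal` is made up. Under this literal reading a lower threshold permits more steals rather than fewer, and 0 and 1 behave the same, so the guesses in this thread about what each value does should be treated with caution.

```shell
#!/bin/sh
# Toy model of "Minimum load on remote CPU before we'll steal".
# Assumption: literal reading of the description, not the real scheduler code.

# Succeeds if an idle CPU would take work from a remote CPU with this load.
would_steal() {
    remote_load=$1 thresh=$2
    [ "$remote_load" -ge 1 ] && [ "$remote_load" -ge "$thresh" ]
}

# Print the decision for a few thresholds and remote loads.
for thresh in 0 1 2; do
    for load in 0 1 2 3; do
        if would_steal "$load" "$thresh"; then r=steal; else r=leave; fi
        echo "thresh=$thresh remote_load=$load -> $r"
    done
done
```

In this model the only difference between the default of 2 and the values 0 or 1 is the case where the remote CPU has a load of exactly 1.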

Erichans · Jun 14, 2022

Thanks, I did grab the wrong one: corrected.

CyberCr33p · Jun 14, 2022

Can you reset all the settings to default and check if this solves the issue?

sysctl machdep.idle=mwait

WCSN · Jun 14, 2022

Hm... Now I have this:

machdep.idle: acpi
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_apl31: 0
machdep.idle_mwait: 1
machdep.mwait_cpustop_broken: 0

Maybe it is not machdep.idle=mwait that should be set? Maybe this is the one that matters: machdep.idle_mwait: 1?

CyberCr33p · Jun 14, 2022

I had an issue with ping taking 0.2 sec longer, which was caused by cores going into sleep mode. If the server had some load, the ping delay was as expected. I fixed it by setting "machdep.idle=mwait", which is why I am curious whether this fixes your issue too.

WCSN · Jun 15, 2022

CyberCr33p said:
I had an issue with ping taking 0.2 sec longer, which was caused by cores going into sleep mode. If the server had some load, the ping delay was as expected. I fixed it by setting "machdep.idle=mwait", which is why I am curious whether this fixes your issue too.

machdep.idle: mwait
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_apl31: 0
machdep.idle_mwait: 1
machdep.mwait_cpustop_broken: 0

No. With machdep.idle: mwait and kern.sched.steal_thresh=2 I still have microfreezes.

With kern.sched.steal_thresh=1 or kern.sched.steal_thresh=0 and any value of machdep.idle, I have no microfreezes.

machdep.idle has no effect on the freezing problem...
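
When repeating this kind of comparison, it helps to confirm that the idle method is actually offered before switching to it. The sketch below assumes FreeBSD and the sysctls shown in this thread (machdep.idle and machdep.idle_available); the helper name `idle_method_available` is made up for illustration.

```shell
#!/bin/sh
# Sketch: check that an idle method is listed in machdep.idle_available.
# Assumption: the list looks like "spin, mwait, hlt, acpi", as shown above.

# Succeeds if method $2 appears in the comma-separated list $1.
idle_method_available() {
    case ", $1, " in
        *", $2, "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the kernel when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    avail=$(sysctl -n machdep.idle_available)
    echo "current: $(sysctl -n machdep.idle)  available: $avail"
    if idle_method_available "$avail" mwait; then
        echo "mwait is offered; switch with: sysctl machdep.idle=mwait (as root)"
    fi
fi
```

Changing one sysctl at a time and noting the result, as done above, keeps the two effects apart.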

Cath O'Deray · Jun 15, 2022

WCSN said:
… machdep.idle not machdep.idle_mwait ?

<https://gist.github.com/dch/e2ccb70...6d00#user-content-integer-machdepidle_mwait-1> describes machdep.idle_mwait as:

Use MONITOR/MWAIT for short idle

– what that means, in practice, I don't know, sorry.

Cath O'Deray · Jun 15, 2022

dch ▲

WCSN · Jun 15, 2022

grahamperrin@ said:
<https://gist.github.com/dch/e2ccb70...6d00#user-content-integer-machdepidle_mwait-1> describes machdep.idle_mwait as:

Use MONITOR/MWAIT for short idle

– what that means, I don't know, sorry.

If this parameter has the value 1, is mwait used in any case? Or do I need to set machdep.idle=mwait too?

mer · Jun 15, 2022

WCSN said:
If this parameter has the value 1, is mwait used in any case? Or do I need to set machdep.idle=mwait too?

I would think you need to set machdep.idle=mwait also. It depends on the internal definition of "short idle" I think.

WCSN · Jun 15, 2022

mer said:
I would think you need to set machdep.idle=mwait also. It depends on the internal definition of "short idle" I think.

Yes, as I said above, I did.
I read about mwait; it probably makes sense, but it didn't affect the freezing. The mwait value by itself will probably be useful for performance.

I wonder whether the same behavior (freezing) occurs on a similar system with an Intel CPU or a Ryzen 3.

I use kern.sched.steal_thresh=1 because (maybe) threads will still get moved around somehow; with this value there are no microfreezes, or I do not notice them.

I put this in sysctl.conf:

kern.sched.steal_thresh=1
machdep.idle=mwait
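
A minimal sketch of making such entries repeatable: the helper below (the name `persist_sysctl` is made up) sets NAME=VALUE in a sysctl.conf-style file, replacing an earlier active assignment of the same name and leaving commented-out lines alone. It takes the file path as an argument, so it can be tried on a copy before touching /etc/sysctl.conf, which needs root.

```shell
#!/bin/sh
# Sketch: idempotently set NAME=VALUE in a sysctl.conf-style file.
# Assumption: one "name=value" assignment per line, "#" starts a comment.

persist_sysctl() {
    conf=$1 name=$2 value=$3
    # Escape dots so the name is matched literally by grep.
    pat=$(printf '%s' "$name" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every line except active assignments of this name.
        grep -v "^[[:space:]]*${pat}[[:space:]]*=" "$conf" > "$tmp" || :
    fi
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Example (as root):
#   persist_sysctl /etc/sysctl.conf kern.sched.steal_thresh 1
#   persist_sysctl /etc/sysctl.conf machdep.idle mwait
```

Entries in /etc/sysctl.conf are applied at boot; to the best of my knowledge FreeBSD's sysctl(8) can also load the file right away with `sysctl -f /etc/sysctl.conf`.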

mer · Jun 15, 2022

I'm not sure I've seen anything about freezing on Intel; I've been running with steal_thresh=1 for a while. Occasionally audio would get choppy (for example, YouTube playing in Firefox during a pkg upgrade, with brief audio glitches while a big package like llvm was being extracted), but that would clear up quickly. That could be similar to your freezes.

I just added the idle=mwait an hour or so ago and I'll keep an eye on it.

If just the idle=mwait made no difference to you, then I would guess whatever is happening in the scheduler around steal_thresh is really the key and may be something that is desirable for interactive/desktop use vs server.

ETA:
I often see messages about "client bug: event processing lagging behind by XXms your system is too slow" in Xorg.0.log (others have reported this too). I don't know exactly what the problem is, or even whether it's real, but I'm going to see whether idle=mwait changes the frequency of the messages or eliminates them. I've seen 3ms <= XX <= 15ms in the log, which is well below your 0.1-0.2 s (100-200 ms).
ETA:
machdep.idle=mwait seems to have a positive impact on the "system is too slow" messages in X.
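
To compare the frequency of those messages before and after a change, something like the following sketch could be used. It only assumes that the log lines contain the text "lagging behind by NNms" as quoted above; the helper name `lag_summary` is made up and the log path /var/log/Xorg.0.log is the usual default, which may differ on a given setup.

```shell
#!/bin/sh
# Sketch: summarise "lagging behind by NNms" messages from an Xorg log.
# Assumption: the message text matches the wording quoted in this thread.

# Reads a log on stdin; prints "count=N min=Ams max=Bms" or "count=0".
lag_summary() {
    grep -o 'lagging behind by [0-9][0-9]*ms' |
    sed 's/[^0-9]//g' |
    awk 'NR == 1 { min = max = $1 + 0 }
         { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
         END { if (NR) { printf "count=%d min=%dms max=%dms\n", NR, min, max }
               else { print "count=0" } }'
}

# Summarise the default log if it is readable.
log=/var/log/Xorg.0.log
if [ -r "$log" ]; then
    lag_summary < "$log"
fi
```

Running it once with the old setting and once with the new one, over sessions of similar length, gives a rough before/after comparison.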

WCSN · Jun 17, 2022

Good afternoon.

After 5 days of using the current settings (those discussed above), I can say the following:
All of the following run at the same time: two virtual machines (VirtualBox) with FreeBSD and Windows 7; Firefox playing a 1080i movie from YouTube; Eclipse compiling a project; video encoding; and output of processed audio recordings to a DAT recorder (article in Russian).
(32GB RAM).

There are no microfreezes; you can even run openarena on top of that (I don't play, but I installed it for the test).

I'm as happy as an elephant.

mer · Jun 17, 2022

Thanks for the update.

WCSN · Jul 21, 2022

Sorry...
I forgot to say that I filed a bug report: Bug 265337

markj · Jul 21, 2022

If anyone's able to reproduce this with a kernel built from the main development branch, with steal_idlethresh set to the default, please let me know. Just test a kernel with something like:

$ git checkout main
$ make buildkernel -j$(sysctl -n hw.ncpu)
$ sudo make installkernel INSTKERNNAME=kernel.test
$ sudo nextboot -k kernel.test
$ sudo shutdown -r now

Then the system will boot from /boot/kernel.test once. After the next reboot, it'll go back to /boot/kernel.

In particular, there have been some bug fixes in the scheduler and idle loop which might help.
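
After rebooting, it is worth confirming that the test kernel is really the one running before judging the result. The sketch below assumes FreeBSD, where `sysctl -n kern.bootfile` reports the path of the booted kernel; the helper name `booted_from` is made up, and kernel.test is the install name used in the commands above.

```shell
#!/bin/sh
# Sketch: check which kernel directory the system booted from.
# Assumption: kern.bootfile holds a path like /boot/kernel.test/kernel.

# Succeeds if path $1 lies in /boot/<dir>/, where <dir> defaults to kernel.test.
booted_from() {
    case "$1" in
        "/boot/${2:-kernel.test}/"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query the kernel when actually running on FreeBSD.
if [ "$(uname -s)" = "FreeBSD" ]; then
    bootfile=$(sysctl -n kern.bootfile)
    uname -v
    if booted_from "$bootfile"; then
        echo "running the test kernel: $bootfile"
    else
        echo "running $bootfile, not the test kernel"
    fi
fi
```

Since nextboot applies to a single boot only, a second reboot should report /boot/kernel/kernel again.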

WCSN · Jul 21, 2022

markj said:
If anyone's able to reproduce this with a kernel built from the main development branch, with steal_idlethresh set to the default, please let me know. Just test a kernel with something like:

$ git checkout main
$ make buildkernel -j$(sysctl -n hw.ncpu)
$ sudo make installkernel INSTKERNNAME=kernel.test
$ sudo nextboot -k kernel.test
$ sudo shutdown -r now

Then the system will boot from /boot/kernel.test once. After the next reboot, it'll go back to /boot/kernel.

In particular, there have been some bug fixes in the scheduler and idle loop which might help.

Ok. Yes, I can, thanks for the instructions. I will do it.
I need to remember to remove these from sysctl.conf:
#kern.sched.steal_thresh=1
#machdep.idle=mwait