What is sysctl kern.sched.preempt_thresh?

larshenrikoern · Jun 23, 2022

To sum up the first day of discussions in this thread.

The value suggested in most guides, 223, is too high. My suspicion is that it was suggested quite early in the development of the ULE scheduler. On my system, lowering the value has actually lowered the CPU temperature quite a bit, so resources are used more efficiently and CPU load is reduced (which improves interactivity more than a higher value does). We are currently considering the following values for an interactive desktop system:

kern.sched.preempt_thresh=151: time-share threads in the interactive category (the top of that range)

kern.sched.preempt_thresh=171: additionally, time-share threads in the batch category with nice priority between -1 and -20

kern.sched.preempt_thresh=162: a middle ground between the two

These values are not recommended for servers; they are mostly meant for desktops.

I would suggest adding one or more of these values to the Handbook as part of tuning with sysctl.conf.

Comments much appreciated.
Have a nice day

thindil · Jun 23, 2022

While I agree that 162 looks like the best recommendation for a typical desktop, I would hold off on calling 223 too high.
I have done some tests with values above 255; if I understand correctly, that should almost disable preemption. Daily desktop tasks didn't benefit from it, but with extremely heavy tasks, like playing modern AAA Windows games that keep resource usage almost constantly at full, I got somewhat better interactivity. But to be honest, I need to do more tests.

larshenrikoern · Jun 23, 2022

Values over 256 have no effect; the maximum value is 256. To raise a single program's priority above the others, use rtprio.

thindil · Jun 23, 2022

larshenrikoern said:
Values over 256 have no effect; the maximum value is 256. To raise a single program's priority above the others, use rtprio.

Yes, now I know that they have no effect. But what happens when you enter a value that is too high? Is it reduced to 0, or is the maximum allowed value used?

larshenrikoern · Jun 23, 2022

thindil said:
While I agree that 162 looks like the best recommendation for a typical desktop, I would hold off on calling 223 too high.
I have done some tests with values above 255; if I understand correctly, that should almost disable preemption. Daily desktop tasks didn't benefit from it, but with extremely heavy tasks, like playing modern AAA Windows games that keep resource usage almost constantly at full, I got somewhat better interactivity. But to be honest, I need to do more tests.

What we are doing when moving these values around is moving the bottlenecks on a system. If you have sufficient memory, the CPU, disk, or GPU might be your limiting factor. So the optimal tuning will differ from system to system and depend on how it is used. If you are using just one program, that might be the only thing that matters.

To me, interactivity means that the system is not made to do one single task, but a lot of different tasks as I need them, and that it responds quickly and efficiently. My current value of kern.sched.preempt_thresh=151 is doing this perfectly on my machine right now. Of course I have other optimizations as well, and right now they play together very well.

PMc · Jun 23, 2022

thindil said:
Yes, now I know that they have no effect. But what happens when you enter a value that is too high? Is it reduced to 0, or is the maximum allowed value used?

Now that you know what to look for and what it is about, I believe you can start reading the source.

It's not so difficult:

Code:

$ find /usr/src/sys -type f | xargs grep -l preempt_thresh
/usr/src/sys/kern/sched_ule.c
/usr/src/sys/amd64/compile/C6R12V1/kernel.full
/usr/src/sys/amd64/compile/C6R12V1/sched_ule.o
/usr/src/sys/amd64/compile/C6R12V1/kernel.debug

$ grep preempt_thresh /usr/src/sys/kern/sched_ule.c
 * preempt_thresh:      Priority threshold for preemption and remote IPIs.
static int __read_mostly preempt_thresh = PRI_MAX_IDLE;
static int __read_mostly preempt_thresh = PRI_MIN_KERN;
static int __read_mostly preempt_thresh = 0;
        if (preempt_thresh == 0)
        if (pri <= preempt_thresh)
SYSCTL_INT(_kern_sched, OID_AUTO, preempt_thresh, CTLFLAG_RW,
    &preempt_thresh, 0,

Besides variable initialization, there isn't much there. Have a look at it:

sched_ule.c « kern « sys - src - FreeBSD source tree (cgit.freebsd.org)

That isn't too difficult to read, or is it?

mer · Jun 23, 2022

PMc said:
That isn't too difficult to read, or is it?

Reading source code is often not very difficult, once you understand the language.
Understanding the behavior is usually the difficult part.

thindil · Jun 23, 2022

PMc said:
That isn't too difficult to read, or is it?

It isn't too difficult; thank you for pointing me there.

Alain De Vos · Jun 24, 2022

I've put:

Code:

kern.sched.preempt_thresh=120

larshenrikoern · Jun 24, 2022

Alain De Vos said:
I've put:

Code:

kern.sched.preempt_thresh=120

Is that for a desktop, and for general use? I am sure it is running well for you.

My pick was the other end of the same range: 151.

larshenrikoern · Jun 24, 2022

Hi

I am still a bit unsure what some of the terms mean for desktop use. For instance, I am unsure about "Time-share threads in batch category". Some practical examples would be nice, because I want to set the number as low as possible to spare system resources.

I do know about the kernel threads and, at the other end, the idle threads, and giving the latter real-time priority would be silly most of the time. But what about the ones in between? Where should the line for kern.sched.preempt_thresh be drawn? Let's argue a bit over that, just to make everyone a bit more knowledgeable:

120
151
152
162
171
172
203
204
223

mer · Jun 24, 2022

Batch category: maybe compilation? When doing a buildworld, for example, would those threads wind up in batch?
Does the part about "nice priority" imply the process/thread was subject to the "nice" command?
Don't threads start off with a default priority? Running top and looking at things running under my username, they are all at the same priority (with minor changes as they are used). A user can nice something to lower its priority (root can also raise it), but otherwise the priority is assigned when the thread is created.

I could be wrong about how a thread gets its default priority, but that's my understanding.

EDIT:
It has been stated that priorities in top are "value from the kernel - 100", so most of the threads owned by my user are running at 20: typical desktop use, firefox, terminal windows, ssh sessions. Firefox has a few different threads running (even with one tab open), and sometimes the priority of a firefox thread goes to 21-27. This implies my user threads normally run at about 120, maybe reaching 127. That winds up crossing categories a little. I'm not nicing anything, so I'd guess these are default priorities for user processes.

PMc · Jun 24, 2022

Just run an endless loop and watch the priority change:

Code:

$ while true
do
:
done &
$ ps axl | head -1 ; ps axl | grep sh
  UID  PID PPID  C PRI NI    VSZ    RSS MWCHAN   STAT TT        TIME COMMAND
    2 4981 3650  0  20  0  12176   3244 wait     S     3     0:00.01 sh
    2 4987 4981  2  97  0  12176   3244 -        R     3     0:17.58 sh

larshenrikoern · Jun 24, 2022

I do not understand why I should try that. And it stops at 103 when running as a normal user. So what does it prove?

lanin · Jun 24, 2022

The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS

larshenrikoern · Jun 24, 2022

Looks quite interesting. And again, FreeBSD's approach is simpler to understand than the Linux one (I used deadline as my scheduler while running Linux, but that is another matter). I am beginning to understand how FreeBSD classifies interactive threads vs. batch processes. What I am still not fully aware of is what those interactive batch processes are, and to what degree I will have to think about it. Compiling ports, for instance, is to me not interactive, but the computer might consider it differently. So in the following scheme, as I see it, everything above 151 will be of lesser importance to desktop users. But what about me, as an audio editor? Correct me if I am wrong.

Priority range (symbolic min / max): description
0-47 (PRI_MIN, PRI_MIN_ITHD / PRI_MAX_ITHD): Interrupt thread real-time priorities. Distinguished values: PI_REALTIME, PI_AV, PI_NET, PI_DISK, PI_TTY, PI_DULL, PI_SOFT, PI_SWI
48-79 (PRI_MIN_REALTIME / PRI_MAX_REALTIME): rtprio real-time priorities
80-119 (PRI_MIN_KERN / PRI_MAX_KERN): Kernel thread real-time priorities. Distinguished values: PSWP, PVM, PINOD, PRIBIO, PVFS, PZERO, PSOCK, PWAIT, PLOCK, PPAUSE
120-151 (PRI_MIN_TIMESHARE, PRI_MIN_INTERACT / PRI_MAX_INTERACT): Time-share threads in interactive category. ULE puts threads with these priorities onto a real-time queue. Distinguished value: PUSER (120, PRI_MIN_TIMESHARE)
152-171 (PRI_MIN_BATCH / no name): Time-share threads in batch category with nice priority between -1 and -20
172-203 (SCHED_PRI_MIN / SCHED_PRI_MAX): Time-share threads in batch category
204-223 (no name / PRI_MAX_TIMESHARE, PRI_MAX_BATCH): Time-share threads in batch category with nice priority between 1 and 20
224-255 (PRI_MIN_IDLE / PRI_MAX_IDLE, PRI_MAX): idprio idle threads and the system idle threads

The scheme is from here: https://wiki.freebsd.org/AndriyGapon/AvgThreadPriorityRanges

Deleted member 70435 · Jun 24, 2022

Good that you posted this. But really, I am asking: what is the behavior of kern.sched, and how does it behave in FreeBSD? Which implementation would you be able to use it with? That's all.

Erichans · Jun 24, 2022

larshenrikoern said:
[...] What I am still not fully aware of is what those interactive batch processes are, and to what degree I will have to think about it. Compiling ports, for instance, is to me not interactive, but the computer might consider it differently.

"interactive batch processes" sounds like a contradictio in terminis. Where did you get that term from? A link and a precise quote would be helpful. Perhaps you are referring to a time-share thread (a "normal" user thread) that started as an interactive one, meaning that at that time it was listed on the real-time queue, and that over time landed in the batch queue. For each processor core, a thread can be listed in one of three run queues: the real-time queue, the batch queue or the idle queue. The scheduler decides which thread runs next (i.e. gets CPU time) and whether it stays in the same queue.

The decision on whether a thread is considered interactive or not is based on the relation between sleep time and run time; the interactivity score is calculated from that. From the ULE article (#4, just after equation 2):

The scaling factor is the maximum interactivity score divided by two. Threads that score below the interactivity threshold are considered to be interactive; all others are noninteractive.

The part referenced earlier in this thread (4.4 Thread Scheduling, in Chapter 4, Process Management) is from the first edition of The Design and Implementation of the FreeBSD Operating System. The same chapter (#6 in the list below) is also freely available for the second edition. Perhaps also helpful:

1. An Overview of Scheduling in the FreeBSD Kernel by Marshall Kirk McKusick - EuroBSDcon 2021; video & slides
   Kirk McKusick briefly discusses the article The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS in this presentation.
2. An Overview of Scheduling in the FreeBSD Kernel by Marshall Kirk McKusick - BSDCan 2020; video & slides
   This is basically the same talk as #1, but there are interesting differences.*
3. ULE: A Modern Scheduler For FreeBSD by Jeff Roberson - BSDCon 2003
4. The FreeBSD ULE scheduler by Marshall Kirk McKusick and Jeff Roberson - Science, Systems and FreeBSD, September/October 2014 - The FreeBSD Journal, Vol. 1, Issue No. 5, pages 20-26 (see notes ** and ***)
   The FreeBSD ULE scheduler - just the article mentioned above, from Jeff Roberson's web page.
5. The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS by Justinien Bouron, Sebastien Chevalley, Baptiste Lepers, and Willy Zwaenepoel - USENIX ATC 2018; pdf - slides - audio of presentation
6. Chapter 4: Process Management or pdf of The Design and Implementation of the FreeBSD Operating System, 2nd Edition

While both #3 and #4 describe the ULE scheduler, and #3 might look like a more extensive (academic-like) article, I personally find it less cramped (for lack of a better word) than the article in the FreeBSD Journal.

___
* The 4BSD scheduler was the default scheduler of FreeBSD for a long time. As of FreeBSD 8, the newer ULE scheduler took its place as the default. The 4BSD scheduler is still part of FreeBSD, and you can use it instead of the ULE scheduler! You might consider using it on a small (embedded) system, like the Raspberry Pi; see the relevant part of the Q&A session following the presentation in #1.
** The ULE article is available from Jeff Roberson's web page. However, for unknown reasons, on July 16, 2026, I cannot access the complete journal issue, nor the ULE article in it (Science, Systems and FreeBSD, September/October 2014).
*** In the ULE article by Marshall Kirk McKusick and Jeff Roberson, there seems to be a / missing in the denominator of Eq. 1.
For Eq. 1, the correct denominator should be:
sleep / run

larshenrikoern · Jun 25, 2022

Erichans said:
"interactive batch processes" sounds like a contradictio in terminis. Where did you get that term from? A link and a precise quote would be helpful.

Thank you for a detailed reply. I will look into the few sources I did not already know.

I got the term "Time-share threads in batch category" from this link: https://wiki.freebsd.org/AndriyGapon/AvgThreadPriorityRanges, as referred to above. And yes, it sounds confusing, which is why I am confused. But on my system there is an important difference (to me) between kern.sched.preempt_thresh=151 and 152: at 151, sound in multi-track recordings and sequencing stutters; at 152, not at all.

All this might come down to native vs. non-native English (like me). Other languages have different terms than English, and translating those back to English may give confusing results.

larshenrikoern · Jun 25, 2022

The values in the equation on this page from

Chapter 4: Process Management or pdf of The Design and Implementation of the FreeBSD Operating System, 2nd Edition

might tell what he means.

larshenrikoern · Jun 25, 2022

Erichans said:
An Overview of Scheduling in the FreeBSD Kernel by Marshall Kirk McKusick - EuroBSDcon 2021; video & slides

I can now see that McKusick also uses these terms in the slides for An Overview of Scheduling in the FreeBSD Kernel. So the terms are used by the developers.

PMc · Jun 25, 2022

larshenrikoern said:
Thank you for a detailed reply. I will look into the few sources I did not already know.

I got the term "Time-share threads in batch category" from this link

Ah, yes. I didn't understand that at first either. Then I figured that "time-share" is the older term for "multitasking". When I started with Unix, it was "multitasking/multiuser".
So all processes on the machine are time-sharing processes, and the discussion is about batch processes (which run in the background and use CPU until they have completed their task) and interactive processes (which wait for user input, react to it, and then wait again).

It is commonly said that an interactive action should get a response within 50 ms to give the user a smooth experience.
But this does not apply to things like audio/video processing, where 50 ms is far too slow. In such cases rtprio may help.

There is another issue that is not considered in the docs: there is a difference between batch processes that regularly access the disk and those that do not.
Under certain conditions I observed two otherwise equal processes run with 95% and 5% of CPU usage: one would almost always be computing, and the other would not make progress.

I found that unacceptable, so I fixed it. You can try the fix to see whether it makes any difference for your workload (for better or worse).

Code:

diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
index 1f859b28684..c5ef566c998 100644
--- a/sys/kern/sched_ule.c
+++ b/sys/kern/sched_ule.c
@@ -38,7 +38,7 @@
  */
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
+__FBSDID("$FreeBSD: releng/12.2/sys/kern/sched_ule.c 355610 2019-12-11 15:15:21Z mav $");
 
 #include "opt_hwpmc_hooks.h"
 #include "opt_sched.h"
@@ -224,6 +224,7 @@ static int __read_mostly preempt_thresh = 0;
 static int __read_mostly static_boost = PRI_MIN_BATCH;
 static int __read_mostly sched_idlespins = 10000;
 static int __read_mostly sched_idlespinthresh = -1;
+static int __read_mostly resume_preempted = 1;
 
 /*
  * tdq - per processor runqs and statistics.  All fields are protected by the
@@ -485,7 +486,10 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                 * This queue contains only priorities between MIN and MAX
                 * realtime.  Use the whole queue to represent these values.
                 */
-               if ((flags & (SRQ_BORROWING|SRQ_PREEMPTED)) == 0) {
+               if (((flags & SRQ_PREEMPTED) && resume_preempted) ||
+                               (flags & SRQ_BORROWING))
+                       pri = tdq->tdq_ridx;
+               else {
                        pri = RQ_NQS * (pri - PRI_MIN_BATCH) / PRI_BATCH_RANGE;
                        pri = (pri + tdq->tdq_idx) % RQ_NQS;
                        /*
@@ -496,8 +500,7 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                        if (tdq->tdq_ridx != tdq->tdq_idx &&
                            pri == tdq->tdq_ridx)
                                pri = (unsigned char)(pri - 1) % RQ_NQS;
-               } else
-                       pri = tdq->tdq_ridx;
+               }
                runq_add_pri(ts->ts_runq, td, pri, flags);
                return;
        } else
@@ -3193,6 +3196,9 @@ SYSCTL_UINT(_kern_sched, OID_AUTO, interact, CTLFLAG_RW, &sched_interact, 0,
 SYSCTL_INT(_kern_sched, OID_AUTO, preempt_thresh, CTLFLAG_RW,
     &preempt_thresh, 0,
     "Maximal (lowest) priority for preemption");
+SYSCTL_INT(_kern_sched, OID_AUTO, resume_preempted, CTLFLAG_RW,
+    &resume_preempted, 0,
+    "Reinsert preempted threads at queue-head");
 SYSCTL_INT(_kern_sched, OID_AUTO, static_boost, CTLFLAG_RW, &static_boost, 0,
     "Assign static kernel priorities to sleeping threads");
 SYSCTL_INT(_kern_sched, OID_AUTO, idlespins, CTLFLAG_RW, &sched_idlespins, 0,

After installing and booting the new kernel, the fix is disabled by default: kern.sched.resume_preempted=1 keeps the stock behavior of reinserting preempted threads at the queue head. To activate the fix (at any time), set sysctl kern.sched.resume_preempted=0. Switch it off again with sysctl kern.sched.resume_preempted=1 and see whether this makes a difference.

mer · Jun 25, 2022

PMc, have you tried to push that upstream to a committer?

larshenrikoern · Jun 25, 2022

PMc said:
I found that unacceptable, so I fixed it. You can try that fix if it makes any difference for your workload (to the better or worse).

To me it comes down to how the machine reacts to my workload, and currently it is OK.

About video and audio: yes, FreeBSD is not optimal, but it works for my modest audio demands. I have been writing with Hans Petter Selasky (who works on parts of the FreeBSD audio system). He might look into a patch like this one, get some ideas from it, and come up with a practical solution.

PMc · Jun 25, 2022

mer said:
PMc have you tried to push that upstream to a committer?

I don't have a sponsor.
