What is sysctl kern.sched.preempt_thresh

To sum up the first day of discussions in this thread.

The value suggested in most guides, 223, is too high. My suspicion is that it was suggested quite early in the development of the ULE scheduler. On my system, lowering the value has actually lowered the CPU temperature quite a bit, so the system uses its resources more efficiently and CPU load drops (which improves interactivity more than a higher value does). We are currently considering the following values for an interactive desktop system:

kern.sched.preempt_thresh=151 Time-share threads in interactive category

kern.sched.preempt_thresh=171 Time-share threads in batch category with nice priority between -1 and -20

kern.sched.preempt_thresh=162 as a middle ground between the two

These values are not recommended on servers; they are meant mostly for desktops.

I suggest that one or more of these values be placed in the Handbook as part of tuning with sysctl.conf.

Comments much appreciated.
Have a nice day
 
While I agree that 162 looks like the best recommendation for a typical desktop, I would wait before calling 223 too high.
I have done some tests with values above 255; if I understand correctly, that should almost disable preemption. Daily desktop tasks got no benefit from it, but with extremely heavy tasks, like playing modern AAA Windows games with almost constant full resource usage, I had somewhat better interactivity. To be honest, I need more tests. :)
 
Values over 255 have no additional effect, since 255 (PRI_MAX) is the highest priority number. To raise a single program's priority above the others, use rtprio.
 
Yes, now I know that they have no effect. But what happens when you enter too high a value? Is it reduced to 0, or is the maximum allowed value used?
 
What we are doing when adjusting these values is moving the bottlenecks around on a system. If you have sufficient memory, the CPU, disk, or GPU might be your limiting factor. So the optimal settings will differ from system to system and depend on its usage. If you are using just one program, that might be the only thing that matters.

To me, interactivity means that the system is made not for one single task but for many different tasks as I need them, and that it responds quickly and efficiently. My current value of kern.sched.preempt_thresh=151 is doing this perfectly right now on my machine. Of course I have other optimizations as well, and right now they are playing together very well.
 
Now that you know what to look for and what it is about, I believe you can start reading the source. ;) It's not so difficult:

Code:
$ find /usr/src/sys -type f | xargs grep -l preempt_thresh
/usr/src/sys/kern/sched_ule.c
/usr/src/sys/amd64/compile/C6R12V1/kernel.full
/usr/src/sys/amd64/compile/C6R12V1/sched_ule.o
/usr/src/sys/amd64/compile/C6R12V1/kernel.debug

$ grep preempt_thresh /usr/src/sys/kern/sched_ule.c
 * preempt_thresh:      Priority threshold for preemption and remote IPIs.
static int __read_mostly preempt_thresh = PRI_MAX_IDLE;
static int __read_mostly preempt_thresh = PRI_MIN_KERN;
static int __read_mostly preempt_thresh = 0;
        if (preempt_thresh == 0)
        if (pri <= preempt_thresh)
SYSCTL_INT(_kern_sched, OID_AUTO, preempt_thresh, CTLFLAG_RW,
    &preempt_thresh, 0,

Besides variable initialization, there isn't much there. Have a look at it:

That isn't too difficult to read, or is it?
 
It isn't too difficult, thank you for pointing me there. :)
 
Hi

I am still a bit unsure what some of the terms mean for desktop use. For instance, I am unsure about "Time-share threads in batch category". Some practical examples would be nice, because I want to set the number as low as possible to spare system resources.

I do know about the kernel threads and, at the other end, the idle threads. Giving the latter real-time priority will be silly most of the time. But what about the ones in between? Where should the line for kern.sched.preempt_thresh be drawn? Let's argue a bit over that, just to make everyone a bit more knowledgeable :)
120
151
152
162
171
172
203
204
223
 
Batch category: maybe compilation? When doing a buildworld, would those threads wind up in batch?
Does the part about "nice priority" imply that the process/thread was subject to the "nice" command?
Don't threads start off with a default priority? Running top and looking at things running under my username, they are all at the same priority (with minor changes as they are used). A user can nice something to lower its priority (and root can increase the priority), but otherwise the priority is assigned when the thread is created.

I could be wrong about how a thread gets its default priority, but that's my understanding.

EDIT:
It has been reported that priorities in top are "value from the kernel - 100", so most of the threads owned by my user are running at 20: typical desktop use, Firefox, terminal windows, ssh sessions. Firefox has a few different threads running (even with one tab open), and sometimes the priority of a Firefox thread goes to 21-27. This implies my user threads normally run at about 120 and may get to 127. That winds up crossing categories a little. I'm not nicing anything, so I'd guess these are the default priorities for user processes.
 
Just run an endless loop and watch the priority change:
Code:
$ while true
do
:
done &
$ ps axl | head -1 ; ps axl | grep sh
  UID  PID PPID  C PRI NI    VSZ    RSS MWCHAN   STAT TT        TIME COMMAND
    2 4981 3650  0  20  0  12176   3244 wait     S     3     0:00.01 sh
    2 4987 4981  2  97  0  12176   3244 -        R     3     0:17.58 sh
 
I do not understand why I should try that. And it stops at 103 when running as a normal user. So what does it prove?
 
Looks quite interesting. And again FreeBSD's approach is simpler to understand than the Linux one (I used deadline as my scheduler while running Linux, but that is another matter). I am beginning to understand how FreeBSD tells interactive threads from batch processes. What I am still not fully aware of is what those interactive batch processes are, and to what degree I will have to think about it. Compiling ports for instance is to me not interactive, but the computer might consider differently. So in the following scheme, everything above 151 will, as I see it, be of lesser importance to desktop users. But for me as an audio editor? Correct me if I am wrong :)


Min | Max | Description
0 (PRI_MIN, PRI_MIN_ITHD) | 47 (PRI_MAX_ITHD) | Interrupt thread real-time priorities. Distinguished values: PI_REALTIME, PI_AV, PI_NET, PI_DISK, PI_TTY, PI_DULL, PI_SOFT, PI_SWI
48 (PRI_MIN_REALTIME) | 79 (PRI_MAX_REALTIME) | rtprio real-time priorities
80 (PRI_MIN_KERN) | 119 (PRI_MAX_KERN) | Kernel thread real-time priorities. Distinguished values: PSWP, PVM, PINOD, PRIBIO, PVFS, PZERO, PSOCK, PWAIT, PLOCK, PPAUSE
120 (PRI_MIN_TIMESHARE, PRI_MIN_INTERACT) | 151 (PRI_MAX_INTERACT) | Time-share threads in interactive category. ULE puts threads with these priorities onto a real-time queue. Distinguished value: PUSER (120, PRI_MIN_TIMESHARE)
152 (PRI_MIN_BATCH) | 171 | Time-share threads in batch category with nice priority between -1 and -20
172 (SCHED_PRI_MIN) | 203 (SCHED_PRI_MAX) | Time-share threads in batch category
204 | 223 (PRI_MAX_TIMESHARE, PRI_MAX_BATCH) | Time-share threads in batch category with nice priority between 1 and 20
224 (PRI_MIN_IDLE) | 255 (PRI_MAX_IDLE, PRI_MAX) | idprio idle threads and the system idle threads

The scheme is from here: https://wiki.freebsd.org/AndriyGapon/AvgThreadPriorityRanges
 
That is all well and good, but what I am really asking is: how does kern.sched actually behave in FreeBSD, and what can you do with it in practice? That's all.
 
[...] What I am still not fully aware of is what those interactive batch processes are, and to what degree I will have to think about it. Compiling ports for instance is to me not interactive, but the computer might consider differently.
"interactive batch processes" sounds like a contradictio in terminis. Where did you get that term from? A link and a precise quote would be helpful. Perhaps you are referring to a timeshare thread (a "normal" user thread) that started as an interactive one, meaning that at that time it was listed on the real-time queue, and that over time has landed in the batch queue. For each processor core, a thread can be listed in one of three run queues: the real-time queue, the batch queue or the idle queue. The scheduler decides which thread is to run next (= gets CPU time) and whether it stays in the same queue.

The decision on whether a thread is considered interactive or not is based on the relation between sleep time and run time. The interactivity score is based on that and calculated accordingly. The ULE article (e.g. #4, just after equation 2):
"The scaling factor is the maximum interactivity score divided by two. Threads that score below the interactivity threshold are considered to be interactive; all others are noninteractive."

The earlier referenced part (4.4. Thread Scheduling) of Chapter 4 Process Management in this thread is of the first edition of The Design and Implementation of the FreeBSD Operating System. The same chapter—#6 in the list below—is also freely available for the second edition. Perhaps also helpful:
  1. An Overview of Scheduling in the FreeBSD Kernel by Marshall Kirk McKusick - EuroBSDcon 2021; video & slides
    Kirk McKusick briefly discusses the article The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS in this presentation.
  2. An Overview of Scheduling in the FreeBSD Kernel by Marshall Kirk McKusick - BSD Can 2020; video & slides
    This is basically the same talk as in #1, but there are interesting differences*
  3. ULE: A Modern Scheduler For FreeBSD by Jeff Roberson - BSDCon 2003
  4. The FreeBSD ULE scheduler by Marshall Kirk McKusick and Jeff Roberson - Science, Systems and FreeBSD, September / October 2014 - The FreeBSD Journal, Vol 1, Issue No. 5, page 20-26
    The FreeBSD ULE scheduler - just the article as mentioned above, from Jeff Roberson's web page.
  5. The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS by Justinien Bouron, Sebastien Chevalley, Baptiste Lepers, and Willy Zwaenepoel - USENIX ATC 2018; pdf - slides - audio of presentation
  6. Chapter 4: Process Management or pdf of The Design and Implementation of the FreeBSD Operating System, 2nd Edition
While both articles #3 & #4 describe the ULE scheduler and #3 might look more of an extensive (academic-like) article, I personally find it less cramped (for the lack of a better word) than the article in the FreeBSD Journal.

___
* The 4BSD scheduler has long been the default scheduler of FreeBSD. As of FreeBSD 8, the new ULE scheduler took its place as the default scheduler. The 4BSD scheduler is still part of FreeBSD and you can use it instead of the ULE scheduler! You might consider using it on a small (embedded) system, like the Raspberry Pi; see the (part of the) Q&A session following the presentation in #1.
 
Thank you for the detailed reply. I will look into the few sources that I did not already know.

I got the term "Time-share threads in batch category" from this link: https://wiki.freebsd.org/AndriyGapon/AvgThreadPriorityRanges as referred to above. And yes, it sounds confusing, which is why I am confused. But on my system there is, to me, an important difference between kern.sched.preempt_thresh=151 and 152. At 151, sound in multi-track recordings and sequencing stutters; at 152, not at all.

All this might come down to native vs. non-native English speakers (like myself). Other languages have different terms than English, and translating those back to English may give confusing results.
 
The values in the equation on this page from
might tell what he means.
 

Attachments

  • The Design and Implementation of the FreeBSD® Operating System - 9780321968975.pdf
I can now see that in the slides for An Overview of Scheduling in the FreeBSD Kernel, McKusick also uses these terms. So the term is used by the developers.
 
[...] I got the term "Time-share threads in batch category" from this link [...]
Ah, yes. I didn't understand that at first either. Then I figured that "time-share" is the older term for "multitasking". When I started with Unix, it was "multitasking/multiuser".
So, all processes on the machine are time-sharing processes, and the discussion is about batch processes (which run in the background and use CPU until they have completed their task) and interactive processes (which wait for user input, react to it, and then wait again).

It is commonly said that an interactive action should not need more than 50 ms to react, to give the user a smooth experience.
But this does not apply to things like audio/video processing, where 50 ms is far too slow. In such cases rtprio may help.

There is another issue that is not considered in the docs: there is a difference between batch processes that regularly access the disk and those that do not.
Under certain conditions I observed two otherwise equal processes run with 95% and 5% of CPU usage: one would almost always compute, and the other would make hardly any progress.

I found that unacceptable, so I fixed it. You can try the fix to see whether it makes any difference for your workload (for better or worse).

Code:
diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
index 1f859b28684..c5ef566c998 100644
--- a/sys/kern/sched_ule.c
+++ b/sys/kern/sched_ule.c
@@ -38,7 +38,7 @@
  */
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
+__FBSDID("$FreeBSD: releng/12.2/sys/kern/sched_ule.c 355610 2019-12-11 15:15:21Z mav $");
 
 #include "opt_hwpmc_hooks.h"
 #include "opt_sched.h"
@@ -224,6 +224,7 @@ static int __read_mostly preempt_thresh = 0;
 static int __read_mostly static_boost = PRI_MIN_BATCH;
 static int __read_mostly sched_idlespins = 10000;
 static int __read_mostly sched_idlespinthresh = -1;
+static int __read_mostly resume_preempted = 1;
 
 /*
  * tdq - per processor runqs and statistics.  All fields are protected by the
@@ -485,7 +486,10 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                 * This queue contains only priorities between MIN and MAX
                 * realtime.  Use the whole queue to represent these values.
                 */
-               if ((flags & (SRQ_BORROWING|SRQ_PREEMPTED)) == 0) {
+               if (((flags & SRQ_PREEMPTED) && resume_preempted) ||
+                               (flags & SRQ_BORROWING))
+                       pri = tdq->tdq_ridx;
+               else {
                        pri = RQ_NQS * (pri - PRI_MIN_BATCH) / PRI_BATCH_RANGE;
                        pri = (pri + tdq->tdq_idx) % RQ_NQS;
                        /*
@@ -496,8 +500,7 @@ tdq_runq_add(struct tdq *tdq, struct thread *td, int flags)
                        if (tdq->tdq_ridx != tdq->tdq_idx &&
                            pri == tdq->tdq_ridx)
                                pri = (unsigned char)(pri - 1) % RQ_NQS;
-               } else
-                       pri = tdq->tdq_ridx;
+               }
                runq_add_pri(ts->ts_runq, td, pri, flags);
                return;
        } else
@@ -3193,6 +3196,9 @@ SYSCTL_UINT(_kern_sched, OID_AUTO, interact, CTLFLAG_RW, &sched_interact, 0,
 SYSCTL_INT(_kern_sched, OID_AUTO, preempt_thresh, CTLFLAG_RW,
     &preempt_thresh, 0,
     "Maximal (lowest) priority for preemption");
+SYSCTL_INT(_kern_sched, OID_AUTO, resume_preempted, CTLFLAG_RW,
+    &resume_preempted, 0,
+    "Reinsert preemted threads at queue-head");
 SYSCTL_INT(_kern_sched, OID_AUTO, static_boost, CTLFLAG_RW, &static_boost, 0,
     "Assign static kernel priorities to sleeping threads");
 SYSCTL_INT(_kern_sched, OID_AUTO, idlespins, CTLFLAG_RW, &sched_idlespins, 0,

After installing and booting the new kernel, the fix is disabled by default. To activate it (at any time), run sysctl kern.sched.resume_preempted=0. Switch it off again with sysctl kern.sched.resume_preempted=1 and see whether this makes a difference.
 
To me it comes down to how the machine reacts to my workload, and currently it is OK.

About video and audio: yes, FreeBSD is not optimal, but it works for my not-too-big demands with audio. I have corresponded with Hans Petter Selasky (who programs parts of the FreeBSD audio system). He might look into a patch like this one, get some ideas from it, and come up with a practical solution.
 