To make sense of these values, you first need to understand how timesharing was originally designed. There was only one core, and a HZ value (the timer interrupt frequency).
When the kernel starts a process, that process takes control and owns the CPU. There is no way to take the CPU away from the process until either
- an interrupt occurs from somewhere, or
- the process makes a system call and thereby gives control back to the kernel.
When an interrupt occurs, the kernel puts the process on hold and services the interrupt. That's why the timer interrupt is important: otherwise, in the absence of device I/O, processes could run forever.
Occasionally the scheduler would look at the activity patterns of the processes and calculate a priority for each of them (visible in ps axlH). This priority is then used to decide on the next process to run, so that smooth interactive processing can go along with compute-intensive tasks.
The mechanism was not so well-suited for multiprocessor systems, and therefore preemption was introduced. Preemption allows the scheduler to interrupt and switch processes much more often, without having to wait for the (rather expensive) timer interrupt, and even while they execute in kernel mode.
Then there are three special cases: the idprio and rtprio processes and the kernel processes. All of these have a fixed priority. Together, the user processes with their ad-hoc calculated priority, plus these three kinds, form a contiguous scale of priorities from somewhere around -99 to +155 (lower number means higher priority), or normalized from 0 to 255 (you never know which one you're currently dealing with), as follows (give or take one or two - if you need the exact numbers, figure them out yourself):
Code:
-100 .. -52    interrupt processing
 -51 .. -21    rtprio tasks
 -20 .. +19    kernel tasks
 +20 .. +120   user processes (dynamically assigned)
+124 .. +154   idprio tasks
+155           the idle process
The kern.sched.preempt_thresh is then the cutoff value in that sequence beyond which you do not allow preemption (but in the internal 0..255 scale).
So, 224 is basically the starting point of the idprio scale. The code itself has three default values: normally 80, with FULL_PREEMPTION 255 (which afaik does not work well), and without PREEMPTION 0 (which didn't work at all for me). Somewhere it came out that 162 might be a good value, so I am running with this (but forgot the detailed reason).
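To see where these threshold values land on the table above, here is a small sketch. It assumes the internal 0..255 scale is simply the table's scale shifted by 100 (which is what "224 is the start of the idprio scale" suggests), and it takes the band boundaries from the table as given, approximations included. The names OFFSET, BANDS, to_external and band_of are made up for this illustration and are not from the kernel.

```python
# Assumption: internal (0..255) = external (-100..+155) + 100.
OFFSET = 100

# Bands copied from the table in the post (approximate, "give or take one or two").
BANDS = [
    (-100, -52, "interrupt processing"),
    (-51, -21, "rtprio tasks"),
    (-20, 19, "kernel tasks"),
    (20, 120, "user processes"),
    (124, 154, "idprio tasks"),
    (155, 155, "idle process"),
]


def to_internal(external):
    """Convert a value from the -100..+155 scale to the 0..255 scale."""
    return external + OFFSET


def to_external(internal):
    """Convert a value from the 0..255 scale to the -100..+155 scale."""
    return internal - OFFSET


def band_of(external):
    """Return the name of the table band containing this priority."""
    for low, high, name in BANDS:
        if low <= external <= high:
            return name
    # The table leaves small gaps (e.g. +121..+123), so report that honestly.
    return "not listed in the table"


if __name__ == "__main__":
    # The threshold values mentioned in the post, in the internal scale.
    for thresh in (0, 80, 162, 224, 255):
        ext = to_external(thresh)
        print(f"{thresh:3d} -> {ext:+4d}  {band_of(ext)}")
```

Read this way, 80 sits at the start of the kernel tasks, 162 in the middle of the user processes, and 224 at the start of idprio.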
Now for the logic (for the non-native inglisch speakers): to preempt somebody means to queue-jump them.
So, if our own priority number is smaller than this threshold (meaning we have more priority), then we are allowed to preempt others.
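As a rough sketch of that rule, and only of that rule: the function below compares a thread's priority number against the threshold, both in the internal 0..255 scale. The name may_preempt and the sample priorities are invented for the example; the real scheduler checks more conditions than this, and whether the comparison is strict or not is something I have not verified here.

```python
def may_preempt(own_priority, preempt_thresh):
    """Simplified rule from the post: a smaller number than the threshold
    (i.e. more priority) means we are allowed to preempt others.

    Both arguments are in the internal 0..255 scale. With a threshold of 0
    no priority can be smaller, so nothing is ever allowed to preempt.
    """
    return own_priority < preempt_thresh


if __name__ == "__main__":
    # Hypothetical sample threads, internal scale (external value + 100).
    samples = {
        "rtprio thread": 60,
        "user thread": 150,
        "idprio thread": 230,
    }
    # Threshold values mentioned in the post.
    for thresh in (0, 80, 162, 224, 255):
        allowed = [name for name, pri in samples.items()
                   if may_preempt(pri, thresh)]
        print(f"preempt_thresh={thresh:3d}: {allowed}")
```

With the default of 80 only the rtprio (and interrupt) range gets to queue-jump; raising the threshold to 162 lets a good part of the user processes do it as well.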
If this then does any good - well, that depends....
And this is exactly the problem with the scheduler. From what I figured, the ULE scheduler was written because the old one was just unsatisfying for SMP. It was written by a guy who had this great design idea (which it is), and it was then put into the system, it solved the immediate problems, and end of story. It was never really honed to the system, some of the tunables are rather lab-testing instrumentation, and there are corner cases where it behaves just badly (which are solvable).
I once talked to the author, and when he figured that I really wanted to go into the depths of the stuff, he immediately went U-boat and dived out of sight. Certainly he has found other interesting things to put his attention to.