Scheduler: Interactive apps too slow under heavy load

Hello,

Simple development situation: I'm compiling something with cargo (Rust) on my 16-thread CPU at 100% load. I tend to run cargo as nice cargo <whatever>.
In the meantime, I'd like to browse in Firefox. I know that load is load and one cannot ignore that, but Firefox, as scheduled under such heavy load, feels a bit less interactive than I'd like.

Would greatly appreciate any suggestions. I don't care about compilation being e.g. 10% slower. (13.1-RELEASE, AMD Ryzen CPU, 64 GB RAM, 2 GB swap, not swapping at all)

Here's my current kern.sched dump:
Code:
kern.sched.always_steal: 0
kern.sched.trysteal_limit: 2
kern.sched.steal_thresh: 1
kern.sched.steal_idle: 1
kern.sched.balance_interval: 127
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 157
kern.sched.idlespins: 10000
kern.sched.static_boost: 152
kern.sched.preempt_thresh: 224
kern.sched.interact: 30
kern.sched.slice: 12
kern.sched.quantum: 94488
kern.sched.name: ULE
kern.sched.preemption: 1
kern.sched.cpusetsize: 32
 
Check if the machine is swapping.
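Something like this should show it quickly:
Code:
# non-zero "Used" while the build runs means the machine is dipping into swap
swapinfo -h
# watch the "po" (pages paged out) column; sustained non-zero values mean active paging
vmstat 1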

I have also observed long pauses with things like git pull or make installworld. There may be a bug, but I haven't investigated.
 
Don't use all the cores/threads when compiling, so there are a couple left for other applications?

My VM host has 8 cores/16 threads and I typically run a buildworld with -j8, so there are still a couple of cores/threads left for the VMs to run on.
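For a cargo build the equivalent would be its -j/--jobs flag (or the CARGO_BUILD_JOBS environment variable); 12 below is just an example value:
Code:
# leave a few threads free for the desktop
nice cargo build -j 12
# or set it once via the environment
CARGO_BUILD_JOBS=12 nice cargo build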
 
Thanks SirDice, but still: would there be any trick to achieve better interactivity even under -j16? I mean, something regarding priorities, interactivity thresholds etc. Of course I'm not interested in this naive Firefox example specifically, but in the general case. All I use right now is the nice prefix.
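Just to illustrate the kind of knob I mean (I have no idea whether touching this particular one is sensible; the value below is picked arbitrarily):
Code:
# current value
sysctl kern.sched.preempt_thresh
# hypothetical change, purely for illustration
sysctl kern.sched.preempt_thresh=200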
 
You might be bottlenecking on your I/O; nice(1) only affects process (CPU) scheduling. If your I/O is 100% saturated, you're still going to have a slow-responding application.
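One way to check is to watch the disk's %busy in gstat(8) or the extended iostat(8) output while the build runs, for example:
Code:
# per-provider view; %busy pegged near 100 means the device is saturated
gstat
# extended per-device statistics at a 1-second interval
iostat -x 1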
 
There's also idprio(1), which puts the task(s) into a whole different priority class. That worked pretty well for me: running poudriere on ALL cores/threads while still making sure ANY other task in the system gets CPU time first.
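For reference, it's just a command prefix like nice; a minimal sketch (the poudriere jail name and list file are placeholders):
Code:
# run the whole bulk build at idle priority (31 is the lowest)
idprio 31 poudriere bulk -j mybuilder -f /usr/local/etc/poudriere.d/pkglist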

But yes, as SirDice pointed out, it won't help once I/O becomes the bottleneck. That typically happens here when ccache has a very good hit rate, shifting the build from CPU-bound to I/O-bound.
 
I/O doesn't seem to be an issue on my machine in particular, or is it? This is iostat at a 1-second interval during the compilation. I guess it's more CPU/scheduler related.

Code:
       tty            nvd0             cpu
 tin  tout KB/t  tps  MB/s  us ni sy in id
   4   628 30.8  125   3.8  30  0  1  0 68
   1   129  0.0    0   0.0   0  0  0  0 100
  17   566  0.0    0   0.0   1  0  0  0 99
  10  2708  6.6  108   0.7  37  0  7  0 56
   0  4651  2.0    4   0.0  87  0 12  0  0
   0  1590  0.0    0   0.0  94  0  6  0  0
   0  3273  0.0    0   0.0  92  0  8  0  0
   0  2220  0.0    0   0.0  93  0  7  0  0
   0  2321 36.0 2016  70.9  91  0  9  0  0
   0  2096  0.0    0   0.0  95  0  5  0  0
   0  2230  0.0    0   0.0  93  0  7  0  0
   0  1737  0.0    0   0.0  94  0  6  0  0
   0  1131  0.0    0   0.0  96  0  3  0  0
   0  2276 34.1 1559  51.9  92  0  8  0  0
   0  1745  0.0    0   0.0  96  0  4  0  0
   0  1381  3.3    6   0.0  96  0  4  0  0
   0  2687  0.0    0   0.0  92  0  8  0  0
   0  2402  0.0    0   0.0  92  0  8  0  0
   0   672 35.8 1795  62.8  93  0  7  0  0
   0   540  0.0    0   0.0  97  0  3  0  0
   0   218  0.0    0   0.0  99  0  1  0  0
   1  1443  0.0    0   0.0  97  0  3  0  0
   0  1752  0.0    0   0.0  94  0  6  0  0
   0  1873 41.3 1310  52.9  92  0  8  0  0
 
Maybe, but I have seen Firefox profile slowdowns like that.

You can easily test this theory by temporarily copying the profile to a memory-backed filesystem and seeing whether things improve.
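A rough sketch of that test (the profile name and paths below are placeholders; mounting tmpfs needs root):
Code:
# create and mount a memory-backed filesystem
mkdir -p /tmp/ff-profile
mount -t tmpfs tmpfs /tmp/ff-profile
# copy the existing profile; "xxxxxxxx.default" is a placeholder
cp -a ~/.mozilla/firefox/xxxxxxxx.default/ /tmp/ff-profile/
# start Firefox against the copy
firefox --profile /tmp/ff-profile &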
 