Load average question

Probably a fairly common question, but how is load average calculated? I have a dual Xeon Silver 4214R setup (12 cores each) with HT enabled.
It has a 12Gbps backplane connected to 6 Intel SSDs in 3 x ZFS mirrors.

This server acts as an NFS repository and I'm trying to gauge how busy it is. I'm a bit confused because the load sometimes shoots up to 3-4, but top still shows 95%+ idle.
gstat is also always green

Code:
last pid: 13026;  load averages:  1.10,  1.02,  1.05                                                           up 2+13:28:11  09:45:08
68 threads:    2 running, 66 sleeping
CPU:  0.0% user,  0.0% nice,  3.3% system,  0.1% interrupt, 96.6% idle
Mem: 5012K Active, 169M Inact, 244K Laundry, 81G Wired, 43G Free
ARC: 61G Total, 25G MFU, 26G MRU, 24M Anon, 778M Header, 9101M Other
     38G Compressed, 111G Uncompressed, 2.93:1 Ratio
Swap: 12G Total, 12G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 2583 root         21    0    12M  2828K rpcsvc  46  11:41   6.70% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  24   1:07   6.58% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  29  11:04   5.85% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  28  46:50   5.02% nfsd{nfsd: master}
 2583 root         22    0    12M  2828K rpcsvc  44  22:14   3.71% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  23   6:41   3.64% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  30  22:55   3.55% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  38  22:22   3.51% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K CPU38   38  26:59   3.09% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  44   9:23   2.90% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   7   1:08   2.84% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc   9  23:21   2.70% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  37   9:15   2.55% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  15  20:30   2.52% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  13  18:21   2.23% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  43  22:16   1.60% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  33  10:40   0.44% nfsd{nfsd: service}
11529 root         20    0    20M  6956K CPU33   33   0:57   0.14% top
 2569 root         20    0    75M    17M select  29   0:11   0.01% mountd
 2543 ntpd         20    0    21M  5520K select  40   0:07   0.01% ntpd{ntpd}
10607 root         20    0    21M    10M select  22   0:03   0.00% sshd
 2583 root         21    0    12M  2828K rpcsvc  35  24:57   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   1  24:54   0.00% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  11  24:42   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  41  23:10   0.00% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc   3  23:01   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  14  22:27   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  43  20:39   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  47  20:28   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  17  20:16   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  33  20:14   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  26  18:15   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  17  16:39   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  23  14:22   0.00% nfsd{nfsd: service}

Another top sample with the load even higher; I see no difference.

Code:
last pid: 13054;  load averages:  1.73,  1.29,  1.09                                                           up 2+13:43:25  10:00:22
68 threads:    2 running, 66 sleeping
CPU:  0.0% user,  0.0% nice,  1.3% system,  0.0% interrupt, 98.6% idle
Mem: 5740K Active, 168M Inact, 244K Laundry, 81G Wired, 43G Free
ARC: 61G Total, 25G MFU, 26G MRU, 101M Anon, 781M Header, 9252M Other
     38G Compressed, 111G Uncompressed, 2.93:1 Ratio
Swap: 12G Total, 12G Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 2583 root         21    0    12M  2828K rpcsvc  40  22:31   6.11% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc   6  23:15   5.48% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K CPU18   18  18:37   4.74% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  27  47:17   3.91% nfsd{nfsd: master}
 2583 root         22    0    12M  2828K rpcsvc   1  14:27   3.90% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  29   9:46   3.88% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  43   6:54   3.27% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  26  20:32   2.94% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  37  27:14   2.91% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  24  23:45   2.56% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  31  10:53   2.44% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  34  11:17   2.40% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc   0  18:41   2.18% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  33   1:28   2.13% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  40  16:57   1.56% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   6  23:33   1.09% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  15  13:09   0.64% nfsd{nfsd: service}
13054 root         20    0    26M  6784K CPU6     6   0:00   0.13% top
 2583 root         22    0    12M  2828K rpcsvc   8   3:04   0.12% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  16  24:55   0.11% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   0   2:33   0.03% nfsd{nfsd: service}
 2569 root         20    0    75M    17M select  45   0:11   0.01% mountd
 2543 ntpd         20    0    21M  5520K select  40   0:07   0.00% ntpd{ntpd}
10607 root         20    0    21M    10M select  12   0:03   0.00% sshd
 2583 root         21    0    12M  2828K rpcsvc  47  25:04   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   1  24:54   0.00% nfsd{nfsd: service}
 2583 root         20    0    12M  2828K rpcsvc  38  23:01   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  17  22:32   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  39  22:27   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc   2  22:26   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc   2  20:50   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  43  20:43   0.00% nfsd{nfsd: service}
 2583 root         21    0    12M  2828K rpcsvc  41  20:35   0.00% nfsd{nfsd: service}
 2583 root         22    0    12M  2828K rpcsvc  17  20:16   0.00% nfsd{nfsd: service}

Code:
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    775      2      8    0.1    386   2743    0.0    2.7| ada0
    0    777      4     16    0.1    386   2743    0.0    2.8| ada1
    0    760      3     40    0.2    379   2668    0.0    2.7| ada2
    0    763      6    140    0.3    379   2668    0.0    3.0| ada3
    0    785      3     12    0.1    391   2236    0.0    2.7| ada4
    0    785      3     12    0.1    391   2236    0.0    2.8| ada5
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada0p2
    0    775      2      8    0.1    386   2743    0.0    2.8| ada0p3
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot0
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada1p2
    0    777      4     16    0.1    386   2743    0.0    2.9| ada1p3
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada2p2
    0    760      3     40    0.2    379   2668    0.0    2.8| ada2p3
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada3p2
    0    763      6    140    0.3    379   2668    0.1    3.0| ada3p3
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada4p2
    0    785      3     12    0.1    391   2236    0.0    2.8| ada4p3
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p1
    0      0      0      0    0.0      0      0    0.0    0.0| ada5p2
    0    785      3     12    0.1    391   2236    0.0    2.8| ada5p3
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot1
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot2
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot3
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot4
    0      0      0      0    0.0      0      0    0.0    0.0| gpt/gptboot5
 
but how is load average calculated?
Some dark magic. It's calculated differently compared to Linux, I know that much. In all seriousness, I rarely look at the 'load' numbers because they don't tell you anything. Want to see how busy it is? Look at the CPU usage. Or the I/O usage. Put some monitoring in place that makes nice long term graphs of those figures.
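If you want the rough idea: as far as I know, FreeBSD's load average is an exponentially decaying average of the number of runnable/running threads, sampled roughly every 5 seconds; threads sleeping on I/O aren't counted, which is one of the ways it differs from Linux. Here's a toy model of the 1-minute figure (a simplified sketch, not the actual kernel code, which uses fixed-point math):

Code:
# Every ~5 seconds, fold the current number of runnable threads (n)
# into the running average; here we assume a constant n=2.
awk 'BEGIN {
    decay = exp(-5/60)        # per-sample decay for the 1-minute window
    load = 0; n = 2
    for (t = 5; t <= 120; t += 5) {
        load = load * decay + n * (1 - decay)
        printf "t=%3ds  load=%.2f\n", t, load
    }
}'

So a load of 1-2 only means that one or two threads, on average, happened to be runnable at those sampling points; it says nothing about how much CPU they actually burned.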
 

So at 95%+ idle I should be fine. The drives are each rated at 400,000 IOPS and I'm doing about 1,000 each, so it seems I'm massively under what the system can do, but that load number keeps going up and confusing me :|
 
systat -> :vmstat -> :help
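Or jump straight to that display with a 5-second refresh:

Code:
systat -vmstat 5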

the problem is interpreting that output :)

Code:
vmstat -h -n 6 -c 20
 procs    memory    page                      disks                     faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ad0 ad1 ad2 ad3 ad4 ad5   in   sy   cs us sy id
 0  0  0 650M  42G   61   0   0   0  232    7   0   0   0   0   0   0 9756 9.1K  49K  0  1 99
 1  0  0 650M  42G    1   0   0   0    0    2 949 965 922 896 796 787 32745  295 168K  0  5 95
 1  0  0 650M  42G    0   0   0   0    0    2 674 674 640 638 579 584 33213  536 131K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    1 440 441 504 503 544 542 27845  186 107K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    2 541 543 675 674 688 689 32303  249 124K  0  2 98
 3  0  0 650M  42G    0   0   0   0    0    1 836 827 933 958 901 922 29361  344 150K  0  4 96
 0  0  0 650M  42G    0   0   0   0    0    1 791 790 702 705 759 762 32350  348 127K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    2 534 533 605 606 646 647 27619  257 106K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    1 633 638 641 644 659 657 27317  400 105K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    2 513 513 753 752 822 819 28720  291 112K  0  2 98
 0  0  0 650M  42G    0   0   0   0    0    1 1004 1007 1366 1362 1120 1134 34599  238 184K  0  4 96
 1  0  0 650M  42G    0   0   0   0    0    2 615 614 379 380 683 684 31370  557 120K  0  2 98
 1  0  0 650M  42G    0   0   0   0    0    2 657 657 391 390 602 604 31206  257 120K  0  2 98
 0  0  0 650M  42G    0   0   0   0    0    1 649 650 506 508 520 518 29224  334 113K  0  2 98
 0  0  0 650M  42G    0   0   0   0    0    2 712 710 817 817 831 845 27772  254 144K  0  4 96
 1  0  0 650M  42G    0   0   0   0    0    1 710 709 507 507 405 408 28876  336 110K  0  2 98
 0  0  0 650M  42G    0   0   0   0    0    1 724 724 950 948  87  89 27849  311 106K  0  1 99
 0  0  0 650M  42G    0   0   0   0    0    2 919 918 523 522  70  71 26741  252 101K  0  2 98
 2  0  0 650M  42G    0   0   0   0    2    1 811 811 934 936  36  39 26737  209 103K  0  1 99
 3  0  0 650M  42G    0   0   0   0    0    2 1350 1332 1177 1180 964 938 33608  313 172K  0  4 96
 
So at 95%+ idle I should be fine
You're not CPU bound, that's for sure. Looking at the %busy of each drive shows they're mostly idling too.
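Tip: gstat -p limits the display to the physical providers, which makes that output a lot easier to scan:

Code:
# physical providers only, skips the partition and gpt/ rows
gstat -p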

it seems I'm massively under what the system can do
Yes, I would expect this system to easily handle a much higher load than it's handling now.
but that load number keeps on going up and confusing me :|
I think it also has to do with the number of threads you have. The number of nfsd threads can fluctuate, so after a particularly high usage peak you may still have a lot of them around, but they'll be killed off after a while.

Code:
     --maxthreads threads
             Specifies the maximum servers that will be kept around to service
             requests.

     --minthreads threads
             Specifies the minimum servers that will be kept around to service
             requests.
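
If you want to pin or cap that, the flags go in rc.conf; something like this (illustrative values only, tune them to your client count):

Code:
# /etc/rc.conf -- example values only
nfs_server_enable="YES"
nfs_server_flags="-u -t --minthreads 8 --maxthreads 64"

Then restart the service with service nfsd restart.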
 