Solved: Load average calculation

As far as I know, FreeBSD calculates it the same way BSD systems did more than 30 years ago. It is the number of “runnable” jobs (i.e. processes that want to have CPU time) averaged over one minute, five minutes, and fifteen minutes, respectively. The details of measurement may have changed slightly (e.g. the granularity of the time window for statistics gathering), but basically that's what the numbers mean. Currently, the sampling interval in FreeBSD is 5 seconds, with a random “jitter” to avoid synchronisation with processes that run at regular intervals.

You can read all the gory details in the source code, of course, in particular in /sys/kern/kern_synch.c if you have the sources installed locally, otherwise online here (for HEAD a.k.a. current):
https://svnweb.freebsd.org/base/head/sys/kern/kern_synch.c?view=markup
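To get a feel for what that averaging does, here is a minimal user-space sketch of the same idea. It uses floating point for clarity, whereas the kernel's loadav() does the equivalent in fixed-point arithmetic; the only assumptions carried over from above are the 5-second sampling interval and the 1/5/15-minute windows:
Code:
/*
 * Minimal user-space sketch of the load average decay, not the kernel
 * code: every 5 seconds the current number of runnable threads is
 * folded into three exponentially decaying averages (1, 5 and 15
 * minutes). The real loadav() in sys/kern/kern_synch.c does the same
 * thing in fixed-point arithmetic. Compile with: cc loadavg.c -lm
 */
#include <math.h>
#include <stdio.h>

#define SAMPLE_INTERVAL 5.0                 /* seconds, as in FreeBSD */

static const double periods[3] = { 60.0, 300.0, 900.0 };
static double loadavg[3];                   /* 1-, 5- and 15-minute averages */

static void
update_loadavg(int nrunnable)
{
    for (int i = 0; i < 3; i++) {
        double decay = exp(-SAMPLE_INTERVAL / periods[i]);
        loadavg[i] = loadavg[i] * decay + nrunnable * (1.0 - decay);
    }
}

int
main(void)
{
    /* Simulate 2 minutes with 4 runnable threads, then 2 idle minutes. */
    for (int t = 0; t < 24; t++)
        update_loadavg(4);
    for (int t = 0; t < 24; t++)
        update_loadavg(0);
    printf("load averages: %.2f, %.2f, %.2f\n",
        loadavg[0], loadavg[1], loadavg[2]);
    return 0;
}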
 
I started running ~100 small Redis instances on a dedicated server, all of which are idle at the moment, and the load average went from 0.50 up to 4.00, then dropped, then rose again. These Redis instances run inside a FreeBSD jail and start automatically when the jail starts. As all the instances start at the same time, I believe they all do something CPU-related simultaneously, which makes the load average go up and down.
 
That doesn't sound unusual. Rule of thumb: If you don't suffer from any problems, don't look at the numbers. That just makes you feel uneasy for nothing. ;)

The load average numbers are not that important, actually. Even a high number doesn't necessarily mean that the machine becomes overloaded or unresponsive. The CPU states (user, system, idle) are usually more helpful. If you experience real problems, a good start is to run vmstat 5 in a terminal for a minute or two. This gives you a lot of useful numbers. See the vmstat(8) manual page for more options, and also refer to the tuning(7) manual page for a lot of valuable hints.
 
Yes, the system performance was the same before and after the load average went up.

I changed the Redis setting lua-time-limit from 5 seconds to 0 to disable it (and also removed the commands related to Lua scripting, since I will use Redis only as an LRU cache), and the load average doesn't go up any more.

It looks like lua-time-limit, when enabled, makes a lot of time-related system calls, which causes the higher load average.
 
It looks like lua-time-limit, when enabled, makes a lot of time-related system calls, which causes the higher load average.
You can verify that by looking at the “sy” column in the output of vmstat 5. That's the average number of system calls per second.

Another way is to find out the PID (process ID) with ps(1), then type truss -cp PID (replace “PID” with the actual ID number), then wait half a minute or so and press Ctrl-C. It will display a nice statistic about which system calls have been called how often, and how much time (cumulative) was spent inside those system calls. Note that you probably need to run the truss(1) command as root.

Here's an example:
Code:
# truss -cp 54378
^C
syscall                     seconds   calls  errors
write                   0.000161744       6       0
sendmsg                 0.000277396       6       0
recvmsg                 0.000276278      12       6
read                    0.000179902      12       6
kevent                  8.751450549      23       0
gettimeofday            0.000187723       6       0
_umtx_op                8.747898601      11       0
                      ------------- ------- -------
                       17.500432193      76      12

Typical system calls to retrieve the time are gettimeofday(2) and clock_gettime(2).
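Just to illustrate the pattern (this is not Redis source code, only a hypothetical sketch): an interpreter that enforces a script time limit typically checks the clock on every iteration, which is where a flood of gettimeofday(2)/clock_gettime(2) calls comes from. Whether each check is a real kernel trap or is answered from the shared page depends on the clock ID and the libc version, so truss may show more or fewer of them.
Code:
/*
 * Hypothetical sketch (not Redis source code) of how a script time limit
 * turns into a stream of time-related system calls: the interpreter
 * checks the clock on every iteration to see whether the limit has been
 * exceeded.
 */
#include <stdio.h>
#include <time.h>

int
main(void)
{
    struct timespec start, now;
    const double limit_seconds = 5.0;   /* analogous to lua-time-limit */
    long checks = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        /* ... do a small piece of script work here ... */
        clock_gettime(CLOCK_MONOTONIC, &now);   /* one time check per iteration */
        checks++;
        double elapsed = (now.tv_sec - start.tv_sec) +
            (now.tv_nsec - start.tv_nsec) / 1e9;
        if (elapsed >= limit_seconds)
            break;
    }
    printf("%ld clock checks in %.1f seconds\n", checks, limit_seconds);
    return 0;
}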
 
When thinking or worrying about those load average numbers (specifically: load averages: 0.54, 0.43, 0.37), keep in mind:

The number of CPU cores you have. If you have 1 core, then a load of 1.00 means your single core was, on average, busy 100% of the time. If you have 10 cores, it means your 10 cores were each busy 10% of the time. If you have a load of 10 on a 10-core machine, then your 10 cores were each busy, on average, 100% of the time. However, "busy" does not mean your CPU is under a consistent, steady strain of processing work, and that is why they are called load averages. Watching those load numbers throughout the day will give you a sense of how 'loaded' your machine is.

All that said, keep your eye on the CPU idle field: 97.4% idle. I find running top -C -s 5 a great overall command.
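If you want those per-core numbers printed directly, a minimal sketch using getloadavg(3) and sysconf(3) looks like this (nothing here goes beyond what those manual pages document):
Code:
/*
 * Minimal sketch: read the 1/5/15-minute load averages with
 * getloadavg(3), get the number of online CPUs from sysconf(3), and
 * print the per-core load discussed above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
    double load[3];
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (getloadavg(load, 3) == -1 || ncpu <= 0) {
        fprintf(stderr, "could not read load averages or CPU count\n");
        return 1;
    }
    printf("cores:          %ld\n", ncpu);
    printf("load averages:  %.2f, %.2f, %.2f\n", load[0], load[1], load[2]);
    printf("load per core:  %.2f, %.2f, %.2f\n",
        load[0] / ncpu, load[1] / ncpu, load[2] / ncpu);
    return 0;
}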

vmstat -w 5 is another great command, and I have read that a 5-second interval is the best choice. That command tells you a lot of good stuff. Watch the 'b' column. That is the "blocked" column, and it means a process was blocked from running due to lack of resources, if I remember right. For a mission-critical machine I would argue that you should always see a 0 there; if you see a non-zero number occasionally, often, or consistently, you should investigate what the bottleneck is and consider upgrades. Your "pi" and "po" columns are your paging stats, and the last column, "id", is your CPU idle.

Set up a graphing system, graph these and other key values, and you will have a wealth of technical info. Note I said "technical info". Then there is the 'business info': are there any signs or symptoms of overload problems, such as people with realistic complaints of sluggishness, jobs not running, or FreeBSD killing processes due to out-of-memory/swap situations? If not, then do you really have a problem? Maybe not. So use those numbers in a forecasting way as well. Your machine might be 50% busy today, but if it was 30% two years ago and 40% last year, then you know you will need to plan for a hardware upgrade in a couple of years' time, for example.
 
vmstat -w 5 is another great command, and I have read that a 5-second interval is the best choice. That command tells you a lot of good stuff. Watch the 'b' column. That is the "blocked" column, and it means a process was blocked from running due to lack of resources, if I remember right. For a mission-critical machine I would argue that you should always see a 0 there; if you see a non-zero number occasionally, often, or consistently, you should investigate what the bottleneck is and consider upgrades. Your "pi" and "po" columns are your paging stats, and the last column, "id", is your CPU idle.
The po column is the more important one of the two, because it specifies the amount of page-out activity towards the swap partitions. If you get non-zero numbers here over longer periods of time, it might mean that you have a problem with the amount of RAM (or an application with a memory leak, or similar problem). On the other hand, the pi column refers to all page-in activity, which includes loading code pages from executables and libraries. This is perfectly normal and does not indicate a problem.

Another important column for detecting memory-related problems is the sr column (scan rate). This is an indication of the “pressure” on the virtual memory system. The higher the number, the harder the VM system has to work to provide free pages of memory for applications that need them.
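If you would rather watch the raw counters behind those columns than eyeball vmstat output, they are exported via sysctl. A small sketch follows; the OID names (vm.stats.vm.v_swappgsin and friends) are an assumption based on what sysctl vm.stats.vm shows, so verify them on your own FreeBSD version, and note that the counters are cumulative since boot:
Code:
/*
 * Sketch: read the cumulative counters behind vmstat's pi/po columns.
 * The OID names are an assumption based on `sysctl vm.stats.vm` output;
 * verify them on your FreeBSD version. The counters count pages since
 * boot, so sample twice and subtract to get a rate.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t
read_counter(const char *name)
{
    uint64_t v64 = 0;
    unsigned int v32 = 0;
    size_t len = sizeof(v64);

    /* Newer releases export these as 64-bit counters, older ones as u_int. */
    if (sysctlbyname(name, &v64, &len, NULL, 0) == 0 && len == sizeof(v64))
        return v64;
    len = sizeof(v32);
    if (sysctlbyname(name, &v32, &len, NULL, 0) == 0)
        return v32;
    perror(name);
    return 0;
}

int
main(void)
{
    printf("pages paged in from swap:    %ju\n",
        (uintmax_t)read_counter("vm.stats.vm.v_swappgsin"));
    printf("pages paged out to swap:     %ju\n",
        (uintmax_t)read_counter("vm.stats.vm.v_swappgsout"));
    printf("pages paged in from files:   %ju\n",
        (uintmax_t)read_counter("vm.stats.vm.v_vnodepgsin"));
    printf("pages paged out to files:    %ju\n",
        (uintmax_t)read_counter("vm.stats.vm.v_vnodepgsout"));
    return 0;
}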
 
The po column is the more important one of the two, because it specifies the amount of page-out activity towards the swap partitions. If you get non-zero numbers here over longer periods of time, it might mean that you have a problem with the amount of RAM (or an application with a memory leak, or similar problem). On the other hand, the pi column refers to all page-in activity, which includes loading code pages from executables and libraries. This is perfectly normal and does not indicate a problem.
It's the other way around. Page-outs are not a problem, lots of page-ins means you need more memory.
 
It's the other way around. Page-outs are not a problem, lots of page-ins means you need more memory.
As I explained, page-ins regularly happen from executables and libraries, when programs are started etc., and this is perfectly normal, no swap involved at all. As opposed to that, page-outs always go to the swap (the only exception is memory-mapped writable files, but not many programs do this). That doesn't have to mean a problem, especially if it happens only occasionally. But if it happens all the time in large quantities, it certainly indicates a memory shortage.

If you still think it's the other way around, would you elaborate on that a little bit, please?
 
As I explained, page-ins regularly happen from executables and libraries, when programs are started etc., and this is perfectly normal, no swap involved at all.
As I understood it, page-out or in always involves swap. Page-out is memory to swap, page-in is swap to memory. Given that definition, something can only be paged in if it was previously paged out.
 
As I understood it, page-out or in always involves swap. Page-out is memory to swap, page-in is swap to memory. Given that definition, something can only be paged in if it was previously paged out.
That is not correct, I'm afraid. When a program is exec'ed, the executable and the libraries are mapped into the process image. As soon as pages of that image are used, they're paged in from their respective files (provided that they're not already in RAM, of course). This counts as page-ins for the pi column of the vmstat command.

Also, some programs map files into memory. A good example is the cp(1) command that uses mmap(2) to read files if they're 8 MB or less (otherwise it uses read(2)). As soon as the program accesses the mapped data, it is paged in from the corresponding file. You can easily test that by running vmstat 5 in one window, and copy a bunch of files <= 8 MB (for example photos) with cp in another window. Make sure they haven't been accessed recently, otherwise they might already be cached in RAM.

I just did that on an otherwise idle machine, and this is the result:
Code:
procs  memory       page                    disks     faults         cpu
r b w  avm   fre   flt  re  pi  po    fr   sr ad0 ad1   in    sy    cs us sy id
[...]
0 0 4 3.1G  550M     6   0   0   0     0   40   0   0   16   147  2161  0  1 99
0 0 4 3.1G  550M    29   0   0   0    17   41   1   1   16   180  2176  0  1 99
0 0 4 3.1G  303M  6371   0 784   0   227   40 802 763 1567   332 17468  0 10 90
0 0 4 3.1G   90M  7298 239 915   0  4102 4266 866 948 1817   188 19853  0  9 91
0 0 4 3.1G   99M  1103   0  87   0  2224 1830 110  62  219  1153  3841  1  2 97
0 0 4 3.1G   99M    38   0   0   0     1   50   0   0   37   637  2302  0  1 99
0 0 4 3.1G   99M    46   0   0   0    31   50   0   0   19   230  2197  0  1 99
As you can see, the copying of the JPEG files was counted as page-ins.
Of course, no swap was involved whatsoever. Swap usage before and after was exactly the same.
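If you want to reproduce that effect without cp(1), a minimal sketch that maps a file and touches each page looks like this (pick a file that hasn't been accessed recently, so it isn't already cached in RAM):
Code:
/*
 * Sketch of the effect described above: map a file with mmap(2) and
 * touch each page, which shows up as page-ins (pi) in vmstat without
 * any swap activity. Use a file that hasn't been accessed recently.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd == -1 || fstat(fd, &st) == -1 || st.st_size == 0) {
        perror(argv[1]);
        return 1;
    }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /*
     * Touch one byte per page; the first touch of an uncached page
     * faults it in from the file -- a page-in, no swap involved.
     */
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned long sum = 0;
    for (off_t off = 0; off < st.st_size; off += pagesize)
        sum += (unsigned char)p[off];

    printf("touched %lld bytes, checksum %lu\n", (long long)st.st_size, sum);
    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}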
 
... (the only exception is memory-mapped writable files, but not many programs do this).
You have to be careful there. In some applications, memory-mapping files for I/O can be significantly more efficient than read() and write() calls. That's why some I/O libraries use mmap() extensively. So even if a program doesn't look like it is doing it, it might do it indirectly through libraries. The examples I know of are some HPC (supercomputing) I/O libraries and some databases. Furthermore, mmap() can be used as a mechanism for allocating memory, and I don't know whether those page faults will be counted towards the pi/po totals. That said, I don't know how commonly those libraries are used on FreeBSD, in particular in a desktop user scenario.
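For completeness, this is roughly what mmap()-as-allocator looks like; it is only a sketch of the pattern, not taken from any particular library:
Code:
/*
 * Sketch of mmap(2) used as a plain memory allocator (MAP_ANON), the
 * other usage mentioned above: touching the pages causes page faults,
 * but they are backed by anonymous memory, not by a file.
 */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    size_t size = 16UL * 1024 * 1024;   /* 16 MB of anonymous memory */

    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memset(p, 0xA5, size);              /* fault in every page */
    printf("allocated and touched %zu bytes via mmap\n", size);
    munmap(p, size);
    return 0;
}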
 
That is not correct, I'm afraid.
Thanks for shattering my worldview. But I'm happy you did. It made me realize I need to schedule my reading material. I have the "Design and implementation ..." collecting dust on my bookshelf and haven't taken the time to actually read it.
 