1941MB (94%) swap in use, but there's 5861MB free

Hello!

Code:
CPU: 14.1% user,  0.0% nice,  0.9% system,  0.1% interrupt, 84.8% idle
Mem: 2611M Active, 978M Inact, 1096M Laundry, 52G Wired, 109M Buf, 5861M Free
ARC: 42G Total, 11G MFU, 27G MRU, 1808M Anon, 322M Header, 1137M Other
     36G Compressed, 78G Uncompressed, 2.16:1 Ratio
Swap: 2048M Total, 1941M Used, 107M Free, 94% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 1736 mysql        77  20    0  5177M  4683M select   3 2462.4 177.25% mysqld
....

Machine started logging kernel: swap_pager_getswapspace(12): failed around 24 hours ago, which is a surprise, considering it has 64GB physical RAM, and it's used for one main thing (MySQL) which only consumes about 10% of that RAM. A small amount of the remaining RAM is used for transient processes, but it's mostly consumed by ZFS ARC.

I'm wondering why, with 5861MB free and 42GB of ARC, swap is at 94% (now) and has apparently been at 100% several times. Shouldn't ZFS be releasing some of that ARC when free RAM gets low?

I tried swapoff -a && swapon -a to at least (temporarily) get past the high swap, but it complains it cannot allocate memory. Now THAT is really confusing: 5.8GB free RAM, plus many GB of expendable ARC... why is a release of 2GB of swap failing?
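(For reference, this is roughly the sequence I mean - swapinfo is only there to check the device state before and after, nothing clever:)

Code:
swapinfo -h
swapoff -a && swapon -a
swapinfo -h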

The only thing I can think of is that some process is suddenly requesting a large allocation, not giving ZFS enough time to shed some ARC and increase free memory... but I cannot find anything which would do this.

The server has been running in this configuration for several months, and top typically displays swap at around 15% or less. This is the first time the logs show the swap error.

Any ideas?

FreeBSD 12.1-RELEASE
Xeon E5-2630L
64GB ECC RAM
Uptime: 64 days

Thanks.
 
Any idea what MySQL is doing?

What are you asking, exactly? Whether MySQL could be the process that is nabbing memory? It is possible.

What I don't understand is why swap stays at such a precarious level; sitting at 94% for a long period suggests to me that free RAM and/or expendable ARC should be released to bring it down. I seem to remember that excess ARC won't be released quickly enough to satisfy a large allocation request, so with swap sitting at 94%, anything which requests a little more than available free RAM will fail.

I've looked up some of the swap-related sysctls, but they seem to be more about controlling how aggressively processes are swapped out in low-memory situations. (I guess, technically, this could be considered such a situation, but in this case the machine is not really exhausting all memory.)
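These are the kind of knobs I mean - just browsing names and descriptions, nothing changed (names as I see them on 12.x; they may differ between releases):

Code:
# list all swap-related sysctls with their descriptions instead of values
sysctl -ad | grep -i swap
# the swap-out tuning knobs referred to above
sysctl -d vm.swap_idle_enabled vm.swap_idle_threshold1 vm.swap_idle_threshold2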

Previous thread on MySQL, ZFS, and memory (not quite the same, but close): https://forums.freebsd.org/threads/my-servers-are-using-way-more-memory-then-it-should.73806

Yes, I had the issue with massive memory leaks on MySQL 5.7 (1GB pool -> 24GB+ RAM, OOM killed), solved by installing tcmalloc and using the malloc-lib= directive in my.cnf. The current config has been stable for at least 5 months, until now.
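For anyone running into the same leak, the relevant part of the config looks roughly like this (the library path is wherever your tcmalloc package installs it, so treat it as an example):

Code:
[mysqld_safe]
# preload tcmalloc instead of the default allocator (path is an example)
malloc-lib = /usr/local/lib/libtcmalloc.so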
 
Yes, I had the issue with massive memory leaks on MySQL 5.7 (1GB pool -> 24GB+ RAM, OOM killed), solved by installing tcmalloc and using the malloc-lib= directive in my.cnf. The current config has been stable for at least 5 months, until now.
Looks like you've helped me more than I've helped you! That's why I was asking if it was something MySQL was doing, but sounds like you've got mitigations in place for that already.
 
Machine started logging kernel: swap_pager_getswapspace(12): failed around 24 hours ago, which is a surprise, considering it has 64GB physical RAM, and it's used for one main thing (MySQL) which only consumes about 10% of that RAM. A small amount of the remaining RAM is used for transient processes, but it's mostly consumed by ZFS ARC.

I'm wondering why, with 5861MB free and 42GB of ARC, swap is at 94% (now) and has apparently been at 100% several times. Shouldn't ZFS be releasing some of that ARC when free RAM gets low?

Yes, it should. But then, your swap is not at 94%, it is at 3% (of installed memory, and that is what counts). ZFS has a notion of free memory, but it does not care about free swap.
There is no check anywhere on the system for the swap space running full; swap space is considered a last resort of memory, expected to accommodate the demand, and only when it fails to do so do you get the failure message shown above.

Therefore, the task is to configure the system so that either everything (almost) fits into memory and swap is only marginally used (if you want a performant system), or memory plus swap can accommodate the demand (if you accept a less performant system).

In your case, given the memory size, I think swap is only marginally used (but it would need some performance observation to figure that out in detail).

Then, you have 52G of wired memory. These pages are locked away and cannot be used by other programs (nor could they be moved to swap if there were more of it). So, if programs need larger amounts of RAM, they will start to push things to swap (but there is no free swap anymore -> error).
With a 42G ARC this may or may not be a normal amount of wired memory, and the ARC will start to shrink only when your free memory goes below a certain threshold.

So the remaining question is what your applications do with memory. I do not use MySQL and don't know how it behaves. But what can be said is: if you have applications that may occasionally need chunks of memory of some 3 GB or more, then you may need either a bigger swap, a limit on the size of the ARC, or some fine-tuning of the sysctls for memory management.
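For example, a permanent cap on the ARC would go into /boot/loader.conf (the value is only an example and takes effect at the next boot):

Code:
# /boot/loader.conf - example ARC cap of 32 GB (value in bytes)
vfs.zfs.arc_max="34359738368"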

Some things to consider:
  • All memory pages in a Unix system are naturally mapped to some place on the filesystem. For application data pages this means they are mapped to the swap: there is a promise that the swap can accommodate them if memory gets low. Under normal conditions (enough memory available) that promise is rarely called in, so the swap can be smaller than memory. But it should always be sized in relation to real memory.
  • ARC memory is controlled by messages: on low memory, ZFS gets a message that it should consider shrinking, and then it may do so on its own behalf. There is no way for an application to wait until ZFS has freed the memory.
  • Generally you cannot fill memory to the brim. There are things to consider, like reaction times, fragmentation issues, etc., which make it necessary to keep a more or less large safety margin.
So much for the general things, now for the interesting stuff:

I tried swapoff -a && swapon -a to at least (temporarily) get past the high swap, but it complains it cannot allocate memory. Now THAT is really confusing: 5.8GB free RAM, plus many GB of expendable ARC... why is a release of 2GB of swap failing?

I think this should not happen. Something appears to be wrong with the free memory count. But it's hard to say from here what it is - it might fluctuate temporarily, it might be some reservation I do not know of, it might be some kind of fragmentation issue, it might be anything.
What you can do, as an experiment, is either squeeze down the ARC manually in steps (vfs.zfs.arc_max should be configurable at runtime) and see at which point the swapoff succeeds, or give it an ample amount of swap and see what it wants to do with it - the latter will be slow, but it may become visible which processes are using it.
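Something along these lines, for instance (all the numbers are examples only; the swap file variant is the usual md-backed approach, and is better placed on UFS than on ZFS):

Code:
# shrink the ARC cap at runtime, e.g. to 32 GB (bytes), then retry the swapoff
sysctl vfs.zfs.arc_max=34359738368
swapoff -a && swapon -a
# or add a temporary 4 GB md-backed swap file to get an "ample amount"
dd if=/dev/zero of=/root/swapfile bs=1m count=4096
chmod 0600 /root/swapfile
mdconfig -a -t vnode -f /root/swapfile -u 1
swapon /dev/md1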
 
Thanks for the replies so far. I'm still here. I thought it would be better to reboot and gather some data to better understand the problem. Unfortunately, there is still no clear reason why swap usage gradually climbs to near 100%, and, more confusingly, what has changed to make it suddenly start happening now. Although the logs are filled with swap_pager_getswapspace(22): failed errors, no applications (that I can see) appear to have been affected.

Here's the graph showing free memory, ARC size, and swap used.


I can think of a couple of ways of "avoiding" this swap exhaustion, but they're pretty hacky...

- A cron script which lowers maximum ARC size if swap used is >x% (a rough sketch follows below)
- A cron script which "resets" swap (swapoff -a && swapon -a) if swap used is >x% (could cause problems as swapoff will momentarily consume free memory)
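A rough sketch of the first one, just to illustrate (threshold, step and floor are arbitrary, and it assumes vfs.zfs.arc_max currently reports a real byte value rather than 0/auto):

Code:
#!/bin/sh
# If swap usage exceeds THRESHOLD %, lower vfs.zfs.arc_max by STEP bytes,
# but never below FLOOR. Meant to be run from cron every few minutes.
THRESHOLD=75
STEP=$((4 * 1024 * 1024 * 1024))    # shrink by 4 GB per run
FLOOR=$((16 * 1024 * 1024 * 1024))  # never go below 16 GB
# last line of swapinfo is the single device or the Total line; column 5 is Capacity (%)
used=$(swapinfo | awk 'END { sub("%", "", $5); print $5 }')
[ "${used:-0}" -lt "$THRESHOLD" ] && exit 0
cur=$(sysctl -n vfs.zfs.arc_max)
new=$((cur - STEP))
[ "$new" -lt "$FLOOR" ] && new=$FLOOR
logger "swap at ${used}%, lowering vfs.zfs.arc_max from ${cur} to ${new}"
sysctl vfs.zfs.arc_max="$new"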

BTW pyret, that sysctl appears to have been removed as of FreeBSD 11.

Thanks for any further suggestions.
 
Further paging data collected from sysctl


The purple line (vm.stats.vm.v_swappgsout) roughly follows swap used in the first graph, although in this one it's a cumulative count (the fact that they share a similar curve, despite one being a current value and the other cumulative, suggests that swap used is mostly just increasing?)

The sudden spike at the end of the green line (vm.stats.vm.v_swappgsin) would be when I did swapoff -a && swapon -a.

When observing the server with 'top', I did notice a lot of small amounts being swapped (in some cases just 4096 bytes). That's the other unusual thing: I'm used to seeing some swap used on my servers, but I very rarely see it actively being paged in or out. With this server, the swap I/O was obvious.
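In case anyone wants to collect the same counters, a trivial loop like this is enough (a sketch; the interval and output file are arbitrary):

Code:
#!/bin/sh
# log "epoch swappgsin swappgsout" once a minute (both are cumulative counters)
while :; do
    printf '%s %s %s\n' "$(date +%s)" \
        "$(sysctl -n vm.stats.vm.v_swappgsin)" \
        "$(sysctl -n vm.stats.vm.v_swappgsout)"
    sleep 60
done >> /var/tmp/swap_paging.log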
 