Why so much swap?

I have a new server with four times the RAM of any server I've run before, and far too much swap is being used. Half the RAM was allocated to MariaDB, but this is what I see in top output:

Code:
last pid: 53784;  load averages:  1.96,  2.07,  2.12                                                                                                                                                         up 13+23:27:59  20:42:23
93 processes:  2 running, 91 sleeping
CPU:  2.3% user,  0.0% nice,  1.0% system,  0.0% interrupt, 96.7% idle
Mem: 54G Active, 49G Inact, 75G Laundry, 186G Wired, 9431M Free
ARC: 171G Total, 44G MFU, 126G MRU, 42M Anon, 762M Header, 173M Other
     158G Compressed, 204G Uncompressed, 1.29:1 Ratio
Swap: 128G Total, 104G Used, 24G Free, 80% Inuse, 480K In

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
27046 mysql       420  21    0   206G   176G CPU3     3 261.8H 118.96% mysqld
27578 redis         4  29    0  2225M  2172M kqread  32  26.8H  15.48% redis-server
53784 root          1  20    0    13M  2984K CPU44   44   0:00   0.08% top
52987 www           1  20    0    22M    11M select  13   0:11   0.00% lighttpd
52662 root          1  20    0    20M  6428K select  25   0:00   0.00% sshd
3954 root          1  20    0    14M   344K select  45   4:10   0.00% screen
3673 root          1  20    0    11M   840K select  25   0:15   0.00% syslogd
3880 root          1  20    0    17M  1204K select  17   0:12   0.00% sendmail
53033 www           1  24    0   304M    33M accept  47   0:08   0.00% php-cgi
53035 www           1  24    0   277M    32M accept  33   0:07   0.00% php-cgi
53037 www           1  20    0   277M    32M accept  43   0:05   0.00% php-cgi
53039 www           1  20    0   277M    32M accept   5   0:05   0.00% php-cgi
3887 root          1  20    0    11M   476K nanslp  16   0:04   0.00% cron
52992 www           1  21    0   274M    32M accept  43   0:03   0.00% php-cgi
52994 www           1  21    0   277M    32M accept  23   0:03   0.00% php-cgi
53047 www           1  24    0   274M    31M accept  46   0:03   0.00% php-cgi
53043 www           1  23    0   277M    32M accept  44   0:02   0.00% php-cgi
53041 www           1  20    0   277M    32M accept  44   0:02   0.00% php-cgi
53492 www           1  20    0   277M    32M accept  40   0:02   0.00% php-cgi
52993 www           1  20    0   277M    32M accept  47   0:02   0.00% php-cgi
52995 www           1  21    0   274M    32M accept  39   0:02   0.00% php-cgi
53498 www           1  39    0   277M    32M accept  42   0:02   0.00% php-cgi
53045 www           1  20    0   277M    32M accept  38   0:02   0.00% php-cgi
53485 www           1  23    0   274M    32M accept  25   0:02   0.00% php-cgi
3955 root          1  20    0    13M      0 pause   41   0:02   0.00% <csh>
53482 www           1  30    0   277M    33M accept  12   0:01   0.00% php-cgi
52996 www           1  20    0   274M    31M accept  31   0:01   0.00% php-cgi
53496 www           1  39    0   277M    32M accept  27   0:01   0.00% php-cgi
52997 www           1  21    0   274M    31M accept  47   0:01   0.00% php-cgi
53495 www           1  37    0   277M    33M accept  41   0:01   0.00% php-cgi
53497 www           1  47    0   277M    32M accept  15   0:01   0.00% php-cgi
52998 www           1  20    0   277M    32M accept  39   0:00   0.00% php-cgi
3596 root          1  20    0    10M   996K select   9   0:00   0.00% devd
The two main processes on this server are mysqld and redis. But that doesn't explain why 104G of the 128G of swap is in use.
 
Almost forgot: the reason I ask is that I see this in /var/log/messages:
Code:
# tail /var/log/messages
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(9): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(5): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(13): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(10): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(8): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(5): failed
Aug 22 20:36:33 x syslogd: last message repeated 2 times
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(9): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(5): failed
Aug 22 20:36:33 x kernel: swap_pager_getswapspace(8): failed
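For completeness, here is how swap consumption can be broken down per process on FreeBSD (a sketch; the -w flag adds a SWAP column to top on recent FreeBSD versions):

```shell
# Summary of each swap device
swapinfo -h

# Per-process swap usage: -w adds a SWAP column, -o swap sorts by it,
# -b gives batch (non-interactive) output, 20 limits the process count
top -b -w -o swap 20
```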
 
The system writes memory that has not changed for some time out to swap when it has idle time to do so. That way, when memory is needed, those pages can simply be reclaimed without having to write them out first. This is faster. So, no worries, all is well.
 
Try creating a swap file.
You could try one of these if you don't want to add one manually:


I have used swapd in the past in addition to manually creating a swap file. And they both helped.
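For the archives, the manual swap-file route on FreeBSD 12 looks roughly like this (size and paths are examples; see the Handbook chapter on adding swap space):

```shell
# Create a 64 GB file to back the swap (adjust count to taste)
dd if=/dev/zero of=/usr/swap0 bs=1m count=65536
chmod 0600 /usr/swap0

# fstab entry so it is activated as an md(4) device at boot
echo 'md99 none swap sw,file=/usr/swap0,late 0 0' >> /etc/fstab

# Activate it now, including "late" entries
swapon -aL
```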

Now Poudriere kills one of my desktop PCs when building tonnes of packages; it's effectively a BSOD.
I will try a swap file and swapmon now.
 
MySQL looks to be using a lot of your memory. ARC is pretty big too. Limit ARC and tune MySQL.
 
Why? There is no problem. The swap usage simply means that the DB caches a lot of data and delays writes. The OS writes that changed memory to swap so it can steal some GB fast if need be.
 
Swap usage is definitely too high for that kind of workload. Both ARC and MySQL use large amounts of memory; you should probably tune both of them, as SirDice suggested. Also, check the sysctls vm.overcommit and vm.swap_idle_enabled; both should be set to 0.
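To check and persist those sysctls (a sketch; appending to /etc/sysctl.conf makes them survive a reboot):

```shell
# Inspect current values
sysctl vm.overcommit vm.swap_idle_enabled

# Set them for the running system
sysctl vm.overcommit=0
sysctl vm.swap_idle_enabled=0

# Persist across reboots
printf 'vm.overcommit=0\nvm.swap_idle_enabled=0\n' >> /etc/sysctl.conf
```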

It's also worth mentioning that your swap space (128 GB) is rather small relative to the amount of RAM (384 GB, apparently). There's an old rule of thumb that swap should be at least twice the size of RAM. Nowadays that rule isn't as hard and fast as it used to be, but the VM system still handles that case better than the opposite (swap being only one third of RAM, in your case).
 
Why do you all want to waste performance by moving the write-out from a time when the system is idle to the moment the memory is absolutely needed, where it stalls the allocation?
 
We don't. But something is trying to use too much, as indicated by the swap_pager_getswapspace messages. Personal experience tells me it's the combination of MySQL/MariaDB and ARC, plus everything else running on the machine. After tweaking MySQL/MariaDB and ARC, some memory would still get swapped out, but ideally not more than about 50% of the available swap space.
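For reference, capping the database side is done in the MariaDB server config. A minimal sketch, assuming the config lives at /usr/local/etc/mysql/my.cnf (path and values are illustrative, not a recommendation):

```ini
[mysqld]
# The single biggest memory knob: the InnoDB buffer pool.
# Size it so ARC + buffer pool + everything else leaves headroom.
innodb_buffer_pool_size = 128G
innodb_buffer_pool_instances = 16
```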
 
Just to update this thread: I set vfs.zfs.arc_max to 128 GB and this error hasn't shown up anymore. I thought recent versions of ZFS were smarter about handling the ARC and would release it to the system when needed, but I guess not.
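For anyone finding this later, the limit is a loader tunable (value in bytes; 128 GB shown to match the above). On recent FreeBSD it can usually also be changed at runtime, though shrinking a live ARC takes a while:

```shell
# Persistent: add this line to /boot/loader.conf (applied at boot)
#   vfs.zfs.arc_max="137438953472"    # 128 * 1024^3 bytes

# Runtime change on a live system
sysctl vfs.zfs.arc_max=137438953472
```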

For those who say one should have 2x as much swap as RAM: I think that would be a waste of this NVMe storage. In fact, for MySQL you want to prevent swapping from occurring, as MySQL/MariaDB can manage its memory better than the OS can for its indexes; the rest of the RAM should be left to the filesystem to cache frequently accessed data that is likely not part of the indexes.

While I have many processes running that may take up RAM, they are all idle, kept ready only for a failover event where we might need frontend capacity (a rare event in practice). The only important things on this server are MariaDB and Redis, each with a set maximum amount of RAM set aside. The rest of the RAM (almost half) should be used by the filesystem to cache otherwise frequently accessed data.

In a database-server situation, we don't want our database data going to swap; swap should only be there for the event that some unusual process needs to claim more memory than the system has. Hopefully, in my case, @Crivens is right that the system is only doing this so it can quickly free some RAM when it really needs it.
 
My experience has been that a server running ZFS and anything that might quickly allocate a lot of RAM must have the ARC limited. I'm not running db/web applications at the scale you are, but our bhyve system is far from stable if I let ARC run wild. Once limited, it's very stable and a joy to use. It's kind of annoying that one cannot simply specify how much 'free' RAM the system should keep around and let ARC dynamically take up the rest, but instead have to limit ARC to some static value.

Edit: I might have spoken too soon. As I was looking through some of my old sysctl workarounds, I noticed that kern/187594 is finally 'closed', but may require 12-STABLE to take advantage of it. I'll have to experiment with it before I can say for sure it lets you do what I want, but it might be a much better way to handle ARC problems. I'm curious what your uname -a says though.
 
It's a new server and new install...

FreeBSD x 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC amd64

Though I am surprised there isn't a -pX on it. Maybe a default install from network doesn't install the latest patched version? :mad:
 
Though I am surprised there isn't a -pX on it. Maybe a default install from network doesn't install the latest patched version? :mad:
That's by design. It's relatively easy to update via either source or binaries, though.
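For the record, bringing a fresh 12.0-RELEASE up to the latest patch level with binary updates:

```shell
# Fetch and apply the latest security/errata patches
freebsd-update fetch
freebsd-update install

# Reboot if the kernel was patched, then verify patch levels
freebsd-version -ku   # installed kernel and userland versions
```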

If you do mess around with the vfs.zfs.arc_free_target tunable, let us know how it goes (and if you needed to go to -STABLE to make it work).
 