Solved Processes killed with "out of swap space" message

wdick · Jan 29, 2021

Hello,

I updated our servers (20+) to FreeBSD 12.2-RELEASE-p1 GENERIC amd64 and now see processes being killed with an "out of swap space" message more often.

I found this thread https://forums.freebsd.org/threads/...-out-of-swap-space-message.77975/#post-489668 and will look into my ZFS daily snapshots.

But I have one server that does nothing at all. No services other than sshd, rkhunter and node_exporter. No active users, nothing. I installed it, but it has no purpose right now.

Still, the daily security run output mail contains hundreds of these messages:

Code:

kernel log messages:
+swap_pager_getswapspace(32): failed
+swap_pager_getswapspace(3): failed
+swap_pager_getswapspace(20): failed
+swap_pager_getswapspace(18): failed
+swap_pager_getswapspace(9): failed
+swap_pager_getswapspace(10): failed
+pid 23381 (lsof), jid 0, uid 0, was killed: out of swap space
...

This machine has 8GB of RAM / 8GB of SWAP.

It never had these problems before the upgrade. Before that, it was running FreeBSD 12.0.

Does anyone have an idea what could cause this and how to look into it?

Edit 1: The frequency of the messages could point to node_exporter, which is called about every 15 seconds. But it can't, or at least shouldn't, take more than a couple of MB of RAM.

Edit 2: This is actually a real problem:

Code:

+pid 22275 (postgres), jid 0, uid 770, was killed: out of swap space
+swap_pager_getswapspace(32): failed

PostgreSQL ran fine and stable on its machine for at least one year. Now, after the upgrade (freebsd-update -r 12.2-RELEASE upgrade, install and reboot), it has already crashed twice in a week.

Regards,
Waldemar

SirDice · Jan 29, 2021

wdick said:
This machine has 8GB of RAM / 8GB of SWAP.

Limit your vfs.zfs.arc_max to about 4GB or less. That should leave ample room for everything else.
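As a rough illustration of the suggestion above: the limit is commonly set as a byte count in /boot/loader.conf. The helper below is a hypothetical sketch (the function name is made up for this example) that only does the GiB-to-bytes arithmetic and prints a line in loader.conf syntax; it does not touch the system.

```python
def arc_max_loader_line(gib: int) -> str:
    """Return a loader.conf-style line capping the ZFS ARC at `gib` GiB.

    Hypothetical helper: it only formats the value and changes nothing.
    """
    if gib <= 0:
        raise ValueError("ARC limit must be a positive number of GiB")
    # 1 GiB = 1024**3 bytes
    return f'vfs.zfs.arc_max="{gib * 1024**3}"'


# 4 GiB, the upper bound suggested for this 8 GB machine
print(arc_max_loader_line(4))  # prints: vfs.zfs.arc_max="4294967296"
```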

PMc · Jan 29, 2021

lsof is a known problem: under certain conditions it will grow to multiple GB. It is not clear why this happens or when it first appeared. Best to avoid the tool for now.
kern.maxvnodes is also a known problem: the system may ignore arc_max in order to accommodate the inode cache. In this case the kernel will not run out of swap space, it will run out of kernel heap space - but the error message may not make that clear. In that case, kern.maxvnodes needs to be decreased. Anyway, that was already the same in release 11.

So, the first task is to monitor your mem+swap usage and find out whether your swap actually runs full. If it does, see which process is doing that - and fix it or get rid of it. If it does not, one has to look more closely.
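One possible starting point for that triage is to tally the kernel log messages per process. This is a minimal sketch, assuming the log lines look exactly like the ones quoted in the first post; the function name and the sample text are made up for illustration. Note that a killed process is only the kernel's chosen victim, which may or may not be the process that actually filled the swap.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this thread.
KILLED = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid (\d+), was killed: out of swap space"
)
SWAP_FAIL = re.compile(r"swap_pager_getswapspace\((\d+)\): failed")


def summarize(log_text):
    """Count killed processes by name and swap allocation failures."""
    killed = Counter()
    failures = 0
    for line in log_text.splitlines():
        match = KILLED.search(line)
        if match:
            killed[match.group(2)] += 1  # group 2 is the process name
        elif SWAP_FAIL.search(line):
            failures += 1
    return killed, failures


# Sample input assembled from the messages posted above.
sample = """\
+swap_pager_getswapspace(32): failed
+swap_pager_getswapspace(3): failed
+pid 23381 (lsof), jid 0, uid 0, was killed: out of swap space
+pid 22275 (postgres), jid 0, uid 770, was killed: out of swap space
"""

killed, failures = summarize(sample)
for name, count in killed.most_common():
    print(f"{name}: killed {count} time(s)")
print(f"swap_pager_getswapspace failures: {failures}")
```

Feeding it the daily security mail or the kernel log would show whether the same process name keeps turning up around the 03:00 to 04:00 window.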

wdick · Feb 1, 2021

Thanks for the tips!

I assume it is rkhunter, which uses lsof?
At least it is a process starting between 03:00 and 04:00 on all machines.

The ZFS ARC max is already limited to 1GB. In addition it happens on one UFS machine too.

I will try to monitor it more closely. In Prometheus / node_exporter I can see when the problem occurs, but not why. I will have to look into the cron jobs.

SirDice · Feb 1, 2021

wdick said:
At least it is a process starting between 03:00 and 04:00 on all machines.

periodic(8) typically runs at that time. Some periodic jobs are quite I/O intensive; they run find(1) a lot. It's possible rkhunter has a periodic job running too.

Day_JJ · Feb 12, 2021

rkhunter causes the problem for me.
It ran stably every day on 12.1p10 but has used all available memory and swap space since upgrading to 12.2.
The problem is listed as Bug 250929.

wdick · Feb 12, 2021

Thank you Day_JJ for confirming my suspicions!

Day_JJ · Apr 21, 2021

FWIW - updating lsof to 4.94.0 (by pkg upgrade) fixed the problem. rkhunter is running properly again.
