Solved Processes killed with "out of swap space" message

Hello,

I updated our servers (20+) to FreeBSD 12.2-RELEASE-p1 GENERIC amd64 and am now seeing processes get killed with an "out of swap space" message more often.

I found this thread https://forums.freebsd.org/threads/...-out-of-swap-space-message.77975/#post-489668 and will look into my ZFS daily snapshots.

But I have one server that does nothing at all. No services other than sshd, rkhunter and node_exporter. No active users, nothing. I installed it, but it has no purpose right now.

Still, the daily security run output mail contains hundreds of these messages:
Code:
kernel log messages:
+swap_pager_getswapspace(32): failed
+swap_pager_getswapspace(3): failed
+swap_pager_getswapspace(20): failed
+swap_pager_getswapspace(18): failed
+swap_pager_getswapspace(9): failed
+swap_pager_getswapspace(10): failed
+pid 23381 (lsof), jid 0, uid 0, was killed: out of swap space
...

This machine has 8 GB of RAM and 8 GB of swap.

It never had these problems before the upgrade. Before that, it was running FreeBSD 12.0.

Does anyone have an idea what could cause this and how to look into it?

Edit 1: Judging by the frequency of the messages, it could point to node_exporter, which is called about every 15 seconds. But it can't, or at least shouldn't, take more than a couple of MB of RAM.
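
To see which processes get killed most often, the kill lines in the kernel log can be tallied. This is only a minimal sketch: the function name count_oom_kills is made up for illustration, and the regular expression assumes the message format shown in the log excerpt in this post. Note that the killed process is the victim and not necessarily the one that used up the memory.

Code:
```python
import re
from collections import Counter

# Matches lines such as:
#   +pid 23381 (lsof), jid 0, uid 0, was killed: out of swap space
# The pattern is an assumption based on the log excerpt in this thread.
KILL_RE = re.compile(
    r"pid (?P<pid>\d+) \((?P<comm>[^)]+)\), jid \d+, uid (?P<uid>\d+), "
    r"was killed: out of swap space"
)


def count_oom_kills(log_text):
    """Return a Counter mapping process name to number of kill messages."""
    kills = Counter()
    for line in log_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            kills[match.group("comm")] += 1
    return kills


# Lines copied from the messages quoted in this thread.
sample = """\
+swap_pager_getswapspace(32): failed
+pid 23381 (lsof), jid 0, uid 0, was killed: out of swap space
+pid 22275 (postgres), jid 0, uid 770, was killed: out of swap space
"""

for comm, n in count_oom_kills(sample).most_common():
    print(f"{comm}: {n}")
```

Feeding it the text of the daily security mail, or the kernel log itself, should give a quick overview of which commands are hit.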

Edit 2: This is actually a real problem:
Code:
+pid 22275 (postgres), jid 0, uid 770, was killed: out of swap space
+swap_pager_getswapspace(32): failed

PostgreSQL ran fine and stable on its machine for at least one year. Now, after the upgrade (freebsd-update -r 12.2-RELEASE upgrade, install and reboot), it has already crashed twice in a week.


Regards,
Waldemar
 
lsof is a known problem: under certain conditions it will grow to multiple GB. It is not clear why this happens or when it first appeared. For now, it is best to avoid the tool.
kern.maxvnodes is also a known problem: the system may ignore arc_max in order to accommodate the inode cache. In this case the kernel will not run out of swap space, it will run out of kernel heap space - but the error message may not make that clear. In that case, kern.maxvnodes needs to be decreased. Anyway, this behaviour was already the same in release 11.

So, the first task is to monitor your mem+swap usage and find out whether your swap actually fills up. If it does, see which process is doing that - and fix it or get rid of it. If it does not, one has to look more closely.
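
As a rough way to find the process that is growing, one can sample the process list regularly and rank it by size. The following is only a sketch under a few assumptions: it parses the output of `ps -axo pid,rss,vsz,comm`, treats the sizes as KiB, and uses virtual size as a crude indicator. The function names and the sample numbers are made up for illustration. Overall swap usage can be checked separately with swapinfo.

Code:
```python
import subprocess


def parse_ps(ps_text):
    """Parse `ps -axo pid,rss,vsz,comm` output into a list of dicts."""
    rows = []
    for line in ps_text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore blank or truncated lines
        pid, rss, vsz, comm = parts
        rows.append({
            "pid": int(pid),
            "rss_kib": int(rss),   # resident size, assumed to be in KiB
            "vsz_kib": int(vsz),   # virtual size, assumed to be in KiB
            "comm": comm.strip(),
        })
    return rows


def top_by_vsz(rows, n=5):
    """Return the n largest processes by virtual size."""
    return sorted(rows, key=lambda r: r["vsz_kib"], reverse=True)[:n]


def read_ps():
    """Run ps on the live system (not called in this example)."""
    result = subprocess.run(
        ["ps", "-axo", "pid,rss,vsz,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample output, for illustration only.
sample = """\
  PID    RSS     VSZ COMMAND
  812   5120   21000 sshd
23381 900000 3500000 lsof
 1044  12000   40000 node_exporter
"""

for row in top_by_vsz(parse_ps(sample), n=2):
    print(f'{row["pid"]:>6} {row["vsz_kib"]:>9} KiB  {row["comm"]}')
```

On a real machine, top_by_vsz(parse_ps(read_ps())) could be run from cron every minute or so and logged, so the culprit is visible in the samples taken shortly before the next kill.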
 
Thanks for the tips!

I assume it is rkhunter, which uses lsof?
At least it is a process that starts between 03:00 and 04:00 on all machines.

The ZFS ARC max is already limited to 1 GB. In addition, it also happens on one UFS machine.

I will try to monitor it more closely. In Prometheus / node_exporter I can see when the problem occurs, but not why. I have to look into the cron jobs.
 