Process limits and Out-Of-Memory killer - can it be tuned?

I'm having a problem with a Python script that chokes on a particular input file, and causes excessive memory usage. Allocated size gets up to around 96GB, and with only 32GB of physical RAM, the system quickly grinds to a halt.

After several hours (literally: the system stalled for around 23 hours the last time this happened), the OOM killer finally kicks in, but instead of just killing the process hogging the exhausted resource, it kills off other processes too, including unrelated but important ones like ntpd or sendmail. At that point the server has to be restarted, since half the background daemons have been killed off.

Is there a way to tune the OOM killer and/or limits to ensure that:

1. The system cannot get to the point where everything stops because a single process (normal user, not a background daemon, not root-initiated) has exhausted a resource
and
2. The OOM killer targets only the obvious culprit

I know that limits(1) could possibly achieve this, but that seems to be more about hard per-user or per-process limits.

I'd rather have some way to tell FreeBSD that the process I'm about to run is the first candidate for OOM killing if it exhausts memory, and that other processes should be left alone. Or, more broadly, kill the current user's processes before randomly killing background daemons. Is that possible?

FreeBSD 12.0

Thanks.
 
Resource limits are the best way to achieve this IMO. Denying a request for resources is more sensible than a shoot-first, ask-questions-later policy. Typically, by the time an OOM killer kicks in the system has been thrashing for an extended period, unless you disable swap and forgo the benefits of paging. And any scheme to determine "the obvious culprit" will still pick off an innocent bystander in a significant proportion of cases.
 
I agree that denying a resource allocation request would be far preferable to an abrupt kill, but limit works with absolute values rather than proportions. That means guessing in advance exactly how much to allocate to each process or user, bearing in mind that the resource usage of some processes varies drastically over their runtime (both allocating and releasing memory). If the values are too conservative, a minor spike in (say) memory allocation fails even when the system has enough memory to satisfy the request; if they are too permissive, I end up back in the situation where my server freezes for 23 hours. ZFS ARC and other caching complicate things further, because a system that technically has little memory free at a given moment could still satisfy a request by evicting objects from the ARC and releasing that memory.

I guess I'm more after something that can prioritize processes when a particular resource becomes scarce and requests have to be denied, without my having to specify exact limits up front.

I can think of one way that limits(1) could help in this specific case: set memoryuse (and/or vmemoryuse) to the amount of physical RAM. Unless you have an unusually large amount of swap, it is unlikely that a single application legitimately needs more than physical RAM. In this particular instance, the Python script blatantly attempting to allocate 96GB on a 32GB system should fail gracefully instead of dragging the whole machine down.

Something like:

Code:
# Cap memoryuse at the size of physical RAM for the current shell
# and everything started from it.
physmem=`sysctl -n hw.physmem`
eval `limits -m ${physmem} -e`
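
If you'd rather not change the limits of the interactive shell itself, limits(1) can also apply the cap to a single command and everything it spawns. A minimal sketch, assuming the script is started as "python3 script.py" (placeholder name) and capping vmemoryuse (-v) as well as memoryuse (-m):

Code:
# Hypothetical invocation: only the Python job and its children are capped,
# the shell's own limits stay untouched.
physmem=`sysctl -n hw.physmem`
limits -m ${physmem} -v ${physmem} python3 script.py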
 
Size suffixes are only used for input values to the limits command. For output, the -e option lets your shell control the implied units, which for sh and csh appear to default to kBytes.

Without the -e option, limits will explicitly show the units.
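
If I'm reading the man page right, the difference is easy to see by running limits without a command (32g below is just an illustrative value, not a recommendation):

Code:
limits -m 32g        # no command: prints the resulting limits with explicit units
limits -m 32g -e     # prints eval-able commands in the shell's own units (kbytes for sh/csh)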
 
Size suffixes are only used for input values to the limits command. For output, the -e option lets your shell control the implied units, which for sh and csh appear to default to kBytes.

Yes, my mistake. I did understand that it was probably converting to the shell's expected units, but for some reason I missed the "k" in "kbytes" when inspecting the result of limit. I'll update my post to remove the erroneous information.

I can confirm that the rogue program is kept in its place when setting limit/ulimit to physical RAM; this time it ended cleanly with "MemoryError" (which appears to be Python reporting the failed malloc) rather than freezing up the server.
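
In case it helps anyone else, a quick way to double-check that the cap is actually in place before rerunning the script (under sh; csh has its own limit built-in):

Code:
limits          # full listing, with explicit units
ulimit -m       # memoryuse as sh reports it, in kbytes
ulimit -v       # vmemoryuse, if you also capped that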
 