"Killed" but no core on very large program

I've got a piece of data-processing software that's using almost 16GB of core. It runs for an hour and then stops dead with a simple "Killed" message. I'm stumped. I suspect I'm hitting a kernel limit somewhere, but I can't figure out what.

The software is written in C++, compiled with gcc and linked with -lcompat -lstdc++. It's nothing fancy - it just reads about 100GB of XML and builds up a binary file.

This is NOT happening during a memory-allocation phase. It was bombing during a quicksort, so I added a sanity check on the array of pointers to make sure there were no nulls. The sanity check then bombed itself while dereferencing one of them, seemingly at random (there are about 400M of them). It's probably an uninitialised pointer rather than a null, but that isn't the question.
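The check was along these lines (a simplified sketch; the names are illustrative, not the real code):
Code:
/* Simplified sketch of the pre-sort sanity check: scan the pointer
 * array for nulls, then touch each pointee. 'items' and 'nitems'
 * are illustrative names, not from the real program. */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

static void sanity_check(char **items, size_t nitems)
{
    for (size_t i = 0; i < nitems; i++) {
        if (items[i] == NULL) {
            fprintf(stderr, "null pointer at index %zu\n", i);
            abort();                    /* SIGABRT should dump core */
        }
        volatile char c = *items[i];    /* the dereference that bombed */
        (void)c;
    }
}

int main(void)
{
    char *demo[] = { "a", "b", NULL };  /* the NULL trips the check */
    sanity_check(demo, 3);
    return 0;
}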

The question is: Why no core dump?

The login class has the maximum dump size set to infinity. Is there a hard limit hidden somewhere? It's a default 8.2-RELEASE installation on amd64.
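For what it's worth, the limit the process actually inherited can be read from inside it with getrlimit(2); a minimal check:
Code:
/* Print the core-file size limit this process actually inherited. */
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return 1;
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("core limit: unlimited\n");
    else
        printf("core limit: %ju bytes\n", (uintmax_t)rl.rlim_cur);
    return 0;
}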

Is it possible to get a segmentation protection error without getting a core dump? And where is the "Killed" coming from?

Because I'm pushing at the limits (for me, at least) I'm starting to suspect a system bug. If anyone can shed any light on it, please help. I rather hope this isn't documented somewhere I've missed, but I've looked everywhere I know about.
 
A segfault (SIGSEGV) is not the same thing as "Killed" (SIGKILL).
A segfault produces a core dump; a SIGKILL does not.
When there is no memory or swap left to use, the kernel can kill the fattest process so the server won't hang. The time-to-kill can also vary (it depends on buffers, process state and so on). I think that's exactly the problem you've hit. Maybe you can modify your program so it processes the file in smaller chunks.
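Something like this, for example (a rough sketch only; the buffer size and handler are placeholders, not from your program):
Code:
/* Rough sketch of chunked processing: a bounded working buffer
 * instead of holding everything in memory at once. CHUNK and
 * handle_chunk() are placeholders. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024 * 1024)    /* fixed 64MB working buffer */

static void handle_chunk(const char *buf, size_t len)
{
    (void)buf; (void)len;           /* parsing would go here */
}

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    FILE *fp = fopen(argv[1], "rb");
    char *buf = malloc(CHUNK);
    if (fp == NULL || buf == NULL)
        return 1;

    size_t n;
    while ((n = fread(buf, 1, CHUNK, fp)) > 0)
        handle_chunk(buf, n);       /* memory use stays bounded */

    free(buf);
    fclose(fp);
    return 0;
}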
 
It will not dump core if it was killed rather than aborted (as with a normal segfault). The kernel kills a process when the system runs out of memory and swap.
 
expl said:
It will not dump core if it was killed rather than aborted (as with a normal segfault). The kernel kills a process when the system runs out of memory and swap.

Thanks, but I'm familiar with kernel kills. However, there's still 40GB of memory free to allocate. I've also increased the stack size to 1GB. Note that on a smaller data set this code runs to completion.

AIUI, if a stack overrun is detected the kernel sends a SIGSEGV, and that should dump core. It'd be the same for a bus error or anything else where the program has gone off the rails. So why am I not getting a core dump? What is sending a KILL to this process, and why?
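The difference between the two signals is easy to demonstrate in miniature (a self-contained test, nothing to do with the real program):
Code:
/* Demonstrate that SIGSEGV produces a core dump while SIGKILL does
 * not (assuming the core size limit allows dumps at all). */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static void try_signal(int sig)
{
    pid_t pid = fork();

    if (pid == 0) {
        raise(sig);             /* child terminates itself */
        _exit(0);               /* not reached */
    }

    int status;
    waitpid(pid, &status, 0);
    printf("signal %d: %s core dump\n", sig,
        WCOREDUMP(status) ? "produced a" : "no");
}

int main(void)
{
    try_signal(SIGSEGV);        /* expect a core */
    try_signal(SIGKILL);        /* expect no core */
    return 0;
}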

Can anyone say definitively what's causing the "Killed" message? It's presumably a SIGKILL - but who or what is sending it, why is it arriving, and can it appear in place of a core dump if something inhibits dumps for processes with large heaps?
 
Solved - turned out to be a FreeBSD "feature" (bug)

Thanks to both who have replied to this and anyone who's read it and thought of replying.

It has turned out to be a "feature" in the swapper - an internal table was running into trouble, and the response was to send a SIGKILL to the process. Bug? I'd say so, because it produced no diagnostic as far as I can tell. I'll take a closer look at the swapper code and post an update when I'm sure of what I'm saying, but increasing the table size in the kernel has stopped the problem.

I found it by manually killing the process to see if I could generate a core: writing the core file overloaded the kernel, which locked up and so had no choice but to complain. In those circumstances, killing the user process without a segfault is reasonable.

Anyway, full report when I've figured it all out, but in the meantime no further suggestions are needed. Incidentally, I posted this in the FreeBSD development forum because I suspected it related to an internal kernel issue rather than to my knowledge of C (which is better than my knowledge of the FreeBSD kernel, though it's more System V than BSD these days).
 
Well I've got to the bottom of it.

Within the kernel there is a structure called swblock in the John Dyson/Matthew Dillon VM handler, and a pointer called "swap" points to a chain of these. Its size is limited by kern.maxswzone, which you can tweak in /boot/loader.conf. The default (amd64, 8.2-RELEASE) allows for about 14GB of swap space, but because it's a radix tree the precise calculation gives me a headache. However, increase the swap space beyond this and it'll report as being there - but when you try to use it, crunch!
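The tweak itself is a one-liner in /boot/loader.conf. The value below is only an example, and as far as I can tell from the code it's in bytes of kernel memory for the zone, not bytes of swap:
Code:
# /boot/loader.conf - example value only; size it to your swap
kern.maxswzone="67108864"    # 64MB for the swblock zone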

Although this variable is tunable, it's also hard-limited in include/param.h to 32M entries; each entry can manage 16 pages (if I've understood the code correctly). If you want to see exactly what's happening, look at vm/swap_pager.c.
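As a back-of-envelope check using the figures above (4KB pages on amd64, 16 pages per swblock entry), an entry count of about 224K reproduces the ~14GB default. The count here is a hypothetical figure chosen to match, not something read out of the kernel:
Code:
/* Back-of-envelope: swap manageable by N swblock entries.
 * Assumes 4KB pages and 16 pages per entry, per the text above;
 * 'nentries' is a hypothetical figure, not read from the kernel. */
#include <stdio.h>

int main(void)
{
    unsigned long long page_size = 4096;            /* amd64 page size */
    unsigned long long pages_per_entry = 16;        /* per swblock */
    unsigned long long nentries = 224ULL * 1024;    /* hypothetical */

    printf("manageable swap: %llu GB\n",
        (nentries * pages_per_entry * page_size) >> 30);  /* prints 14 */
    return 0;
}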

The hard limit on the number of swblock entries is set as VM_SWZONE_SIZE_MAX in include/param.h.

So, what was happening to my process? It was being sent a SIGKILL by vm_pageout_oom() in vm/vm_pageout.c. This gets called when swap space OR swblock space is exhausted, from either vm_pageout.c or swap_pager.c. In some circumstances it prints
Code:
swap zone exhausted, increase kern.maxswzone\n
beforehand, but not always. Its effect is to find the largest running non-system process on the system and shoot it using killproc().
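In outline, the victim selection looks something like this. This is my own paraphrase over a toy process table, NOT the actual kernel code - for the real thing, read vm_pageout_oom() in vm/vm_pageout.c:
Code:
/* Illustrative paraphrase of the 'shoot the biggest process' policy.
 * Toy data structures only; not the kernel implementation. */
#include <stddef.h>
#include <stdio.h>

struct proc_info {
    int pid;
    int is_system;              /* system processes are exempt */
    unsigned long rss_pages;    /* resident footprint in pages */
};

static struct proc_info *pick_victim(struct proc_info *p, size_t n)
{
    struct proc_info *big = NULL;

    for (size_t i = 0; i < n; i++) {
        if (p[i].is_system)
            continue;           /* never shoot system processes */
        if (big == NULL || p[i].rss_pages > big->rss_pages)
            big = &p[i];        /* remember the fattest so far */
    }
    return big;                 /* the kernel then killproc()s this one */
}

int main(void)
{
    struct proc_info table[] = {
        { 1, 1, 100 }, { 812, 0, 4000000 }, { 990, 0, 2000 },
    };
    struct proc_info *v = pick_victim(table, 3);

    printf("victim pid: %d\n", v ? v->pid : -1);
    return 0;
}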

If you want the full story, see my rather long blog post on the subject.

Thanks again to everyone looking at this, and I hope the above is useful to anyone having the same issue.
 
Alt said:
A segfault (SIGSEGV) is not the same thing as "Killed" (SIGKILL).
A segfault produces a core dump; a SIGKILL does not.
When there is no memory or swap left to use, the kernel can kill the fattest process so the server won't hang. <snip> I think that's exactly the problem you've hit. Maybe you can modify your program so it processes the file in smaller chunks.

You were both correct, but I've tracked down the issue in the kernel and fixed it. My software is, of course, perfect and it's up to the kernel to run it correctly :-) My detailed post is awaiting moderation because it's got a link to a long blog post and I might be a spammer :\

Incidentally, it's not the case that this feature prevents the server from hanging - send the process an abort while its swapper control-block tree is full and it hangs in spectacular style!
 
Btw, you can just run your program under gdb and find out when and where you received the SIGKILL - you don't need a core file for that.
 
From what you've written so far, I suspect your app has already eaten up too much swap space and is trying to allocate more. The swap process tries to resolve that situation by aborting the process that has consumed the most RAM. It's not being killed via a SIGSEGV because there is no segment violation; there's simply no more RAM left to eat and no more swap space. The kernel tries really hard to resolve that, and as a last resort it kills your app.

If the kernel is going to kill a process, it is really full of sorrow, but it tells you about it on the console screen. There you should find a hint as to why your process was killed.

fjl said:
My software is, of course, perfect and it's up to the kernel to run it correctly :-)

Nobody was questioning that. Code that consumes 14GB of RAM must be some clever and highly optimized piece of code. ;)
 
Possible need for better diagnostic

vwe@ said:
If the kernel is going to kill a process, it is really full of sorrow, but it tells you about it on the console screen. There you should find a hint as to why your process was killed.

Thanks for the follow-up, but see my earlier post - I found the place in vm/vm_pageout.c. The point is that it *sometimes* prints a message to the console, but not always. I'd been assuming some diagnostic would appear.

vm_pageout_oom() does a FOD on the largest non-system process it can find, ultimately:
Code:
        if (bigproc != NULL) {
                killproc(bigproc, "out of swap space");
                sched_nice(bigproc, PRIO_MIN);
                PROC_UNLOCK(bigproc);
                wakeup(&cnt.v_free_count);
When it's called in swap_pager.c it does a
Code:
printf("swap zone exhausted, increase kern.maxswzone\n");
first. When it's called from vm_pageout_scan() in vm_pageout.c, I can't see that it prints any diagnostic to the console. I could be wrong, but I can't see where.

As a refinement, the above message is not as helpful as it could be. IMHO it'd be useful to change it to something like
Code:
"swap zone exhausted. Increase kern.maxswzone. Killing largest process %d to free space.\n"
(Or does this printf() do a malloc()? IIRC there's a safe one that won't.)

Incidentally, the 16GB of working storage is used for assembling binary cartographic data. The efficient Germans have 50,000,000+ items on the map of their country (OSM + NASA radar survey + ...) - it adds up!
 
What vwe meant was that if you're using swap in such large amounts, you're doing something wrong: it isn't very smart, scalable, or even particularly portable, and you're dragging the whole system's performance down. Instead, use some sort of binary tree or hash table that can be unloaded to a file-based cache, using a fixed amount of RAM rather than letting the memory buffer grow forever.
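Something along these lines (a rough sketch of the idea only; the names are illustrative and error handling is omitted):
Code:
/* Sketch: a fixed-size in-memory table that spills evicted records
 * to a file instead of growing without bound. Illustrative only. */
#include <stdio.h>

#define SLOTS 1024                  /* fixed memory budget */

struct rec {
    long key;                       /* key 0 means 'slot empty' here */
    char data[56];
};

static struct rec table[SLOTS];
static FILE *spill;                 /* file-based overflow cache */

static void put(const struct rec *r)
{
    struct rec *slot = &table[(unsigned long)r->key % SLOTS];

    if (slot->key != 0 && slot->key != r->key)
        fwrite(slot, sizeof(*slot), 1, spill);  /* evict to disk */
    *slot = *r;                     /* RAM use never exceeds SLOTS */
}

int main(void)
{
    spill = tmpfile();
    struct rec r = { 42, "example" };

    put(&r);
    fclose(spill);
    return 0;
}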
 