I upgraded one of my nodes from 12.3 to stable/13.
Then I started to get random program crashes from the backup tool sysutils/bareos-client, always after some 10-20 hours of runtime.
I hacked the backup tool to actually produce a coredump (they had the great idea to catch SIGSEGV & friends, for whatever crap that might work on Linux, but not here), and set kern.sugid_coredump=1.
Now the crashes are either SIGSEGV or SIGBUS, and they come from arbitrary places in the code. Oh crap.
I looked into the images, and it appears that locations in memory (e.g. ones holding addresses) get overwritten with the string "oSoC" (it seems to always be that string, but at different locations).
I had another look:
Code:
# strings bareos-fd.0.0.core
FreeBSD
bareos-fd
/usr/local/sbin/bareos-fd -u root -g wheel -v -c /usr/local/etc/bareos
FreeBSD
CoSo
CoSo
+oSo
FreeBSD
Now, is that as it should be, or something else? (not sure if it's a stable/13 issue or an issue with the program)
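strings only shows that the marker is present, not where it sits. A minimal sketch for listing the byte offsets follows; the function name is made up, and it searches for both spellings on the assumption that "oSoC" and "CoSo" are the same four bytes, read once as a little-endian word and once as a plain string.

```sh
# Sketch: list byte offsets of the suspicious marker in a core file.
# Assumptions: the two spellings are the ones seen in this post; the function
# name is hypothetical; grep must support -a, -b and -o.
marker_offsets() {
    # -a: treat binary data as text, -b: print byte offset, -o: print only the match
    # LC_ALL=C keeps grep from tripping over bytes that are not valid UTF-8
    LC_ALL=C grep -abo -e 'CoSo' -e 'oSoC' "$1"
}
```

Called as `marker_offsets bareos-fd.0.0.core`, this prints one `offset:match` line per hit, which makes it possible to compare the positions across several cores.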
I tried to get some sane coredumps to compare, from 12.3 and stable/13, in the usual way (kill -SEGV). And I found, I cannot! Instead, I get this:
Code:
kernel: pid 53977 (sleep), uid (0): Path `/var/tmp/sleep.0.0.core' failed on initial open test, error = 94
kernel: pid 53977 (sleep), jid 0, uid 0: exited on signal 4
Doesn't work in either 12.3 or stable/13, doesn't work in single user, doesn't work in GENERIC kernel.
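For anyone who wants to repeat the test, it boils down to roughly the following. This is a sketch: the function name and the 600-second sleep are arbitrary, and the one-second pause is only there so that sleep has really been exec'd before the signal arrives.

```sh
# Sketch: start a throwaway process, send it SIGSEGV, report how it died.
# Assumptions: function name and sleep durations are arbitrary choices.
crash_sleep() {
    sleep 600 &
    pid=$!
    sleep 1                 # give the child time to exec sleep
    kill -SEGV "$pid"
    wait "$pid"
    status=$?
    # the shell reports death by signal N as exit status 128+N
    echo "pid $pid ended with status $status (signal $((status - 128)))"
}
```

After calling `crash_sleep`, a core file should show up wherever kern.corefile points (/var/tmp in my case); if it does not, the kernel log has the reason.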
Error 94 says "Not permitted in capability mode", and this leads to capsicum(4), which doesn't give much of a clue.
Now I am wondering: why is the sleep command in single-user mode running in capsicum?
I finally found a switch kern.capmode_coredump in sysctl that helps with the issue, but doesn't seem to be documented anywhere (not even the commit log gives much insight).
Finally, those oSoC (or CoSo) seem to not belong there, so this might be a rogue pointer spamming the memory space. Love it.
Posting this in case anybody else gets error 94.
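For reference, setting the two knobs mentioned above could look like the sketch below. It assumes root on FreeBSD; DRY_RUN and the function names are made up for illustration, so that the commands can be printed first instead of being executed.

```sh
# Sketch: allow core dumps from capability-mode and setuid processes.
# Assumptions: run as root on FreeBSD; the knob names are the ones from this
# post; DRY_RUN and the function names are hypothetical.
run() {
    # with DRY_RUN set, print the command instead of executing it
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

allow_all_cores() {
    run sysctl kern.capmode_coredump=1
    run sysctl kern.sugid_coredump=1
}
```

To keep the settings across reboots, the same name=value pairs can go into /etc/sysctl.conf.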