random program crashes, no coredumps, and error 94

libc does not know about /dev/null it knows about file descriptor 1 which may go anywhere
It goes to /dev/null, see lsof output above.

When running in foreground, it looks like this, and there are no errors:
Code:
 # lsof -p 24185
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
bareos-fd 24185 root    0u  VCHR               0,91 0t922928   91 /dev/pts/1
bareos-fd 24185 root    1u  VCHR               0,91 0t922928   91 /dev/pts/1
bareos-fd 24185 root    2u  VCHR               0,91 0t922928   91 /dev/pts/1
bareos-fd 24185 root    3u  IPv4 0xfffffe008612d950      0t0  TCP moon

When I remove the fputs/fflush from the code, there are no failures.
When I step through the fputs/fflush, they execute code in libc and libthr.

EBADF says that stdout->_file is an invalid handle

And how might this result in libc taking the first two bytes of the output string and writing it to some bogus memory location?
 
I don't use lldb, I use gdb. I'm assuming they are similar enough. I asked to set breakpoint on the line of code you shared (fflush), not via "/dev/null".
You can do it in two ways. Either run the debugger, set the breakpoint and continue, or run the program, attach the debugger (gdb -p $pid), set the breakpoint and let it continue. I'd prefer the latter option.
If you have compiled it via ports you could enable debug mode (CFLAGS+=-g) and set it on the line of code, or if you're debugging a binary without debug symbols you need to find which instruction is calling that fflush and set the breakpoint on that.

The FILE* structure has an fd assigned to it, or -1 if none is used. It would be interesting to see if that FILE* structure has sane values. As covacat mentioned, it may be that the FILE* structure is already corrupted rather than causing the corruption (in other words, it's a victim of a bug, not the bug).

FILE* (and hence printf,scanf & friends) does use buffers (allocated on heap). It could be that this part of the code rubs the bug the correct way and gets triggered.

That EBADF could be that stdout is set to -1, i.e. not used. This is not a bug necessarily. Consider this example of code:
Code:
close(1);
..
..
fprintf(stdout, "hello world\n");
Technically there's nothing wrong with this code. But printing to stdout in the terminal will most likely (stdout can be redirected elsewhere, that's why I said most likely) end up in error EBADF. There may be logic in the code where 0,1,2 is either redirected to socket or closed completely. Hence the error.
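
To make the close(1) case above concrete, here is a minimal sketch. The helper name errno_after_closing_stdout is made up for illustration and is not from bareos or libc. It closes descriptor 1, writes through stdio, records the errno of whichever call fails first, and then restores the original descriptor so the process can keep printing.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: close fd 1, write through stdio, return the errno seen. */
int errno_after_closing_stdout(void)
{
    int err = 0;
    int saved = dup(STDOUT_FILENO);    /* keep a copy so stdout can be restored */
    assert(saved >= 0);                /* sketch only: abort if dup() fails */

    fflush(stdout);                    /* start from an empty stdio buffer */
    close(STDOUT_FILENO);

    errno = 0;
    /* Depending on buffering, either fputs() or fflush() reaches write(2) first. */
    if (fputs("hello world\n", stdout) == EOF || fflush(stdout) == EOF)
        err = errno;

    dup2(saved, STDOUT_FILENO);        /* put the original descriptor back */
    close(saved);
    clearerr(stdout);                  /* drop the sticky error flag on the stream */
    return err;
}
```

Calling it should return EBADF, because the write(2) behind stdio lands on a closed descriptor. Which of the two stdio calls reports the error depends on how stdout is buffered at that moment.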

But I keep asking myself -- what has changed in FreeBSD that it's being triggered now.
 
I don't use lldb, I use gdb. I'm assuming they are similar enough. I asked to set breakpoint on the line of code you shared (fflush), not via "/dev/null".
Yes, but somehow you must "ask to set breakpoint" - usually by typing the respective debugger command into a terminal via stdin/out. And you cannot do this through /dev/null, and the error only happens when stdin/out is /dev/null.

But I keep asking myself -- what has changed in FreeBSD that it's being triggered now.

Just figured that one out. Hold on...
 
We don't understand each other. You run the bareos-client or whatever that binary is as usual. Then, from other terminal, you attach to that running command with the gdb -p $pid where $pid is the pid of that bareos client. Then you can set the breakpoint. And you continue to debug it "live".
 
EBADF says that stdout->_file is an invalid handle
Apparently it is an invalid handle only for fflush(), not for fputs(), because fputs() writes the first 4350 bytes successfully, and only fflush() fails from the beginning.

So I think this cannot be. But apparently it has to do something with buffering.
So I try, what happens when I do
fclose(stdout);
and then try to write something out -> that fails right away with EBADF.

And what happens when I do
close(1);
and then try to write something to stdout? Now there are differences:

Release  stdout     fputs()                                  fflush()
12.3     /dev/tty   fails EBADF                              works
12.3     /dev/null  works                                    fails EBADF
13       /dev/tty   fails EBADF                              fails EBADF
13       /dev/null  works for ~4300 bytes, then fails EBADF  always fails EBADF
 
We don't understand each other. You run the bareos-client or whatever that binary is as usual. Then, from other terminal, you attach to that running command with the gdb -p $pid where $pid is the pid of that bareos client. Then you can set the breakpoint. And you continue to debug it "live".
Sorry, didn't know one can do that.
 
It seems this problem is solved. (If you look carefully enough, you could see the nature of the problem already in the quotes above.)


Now ain't this gorgeous????

Here we close all the stdio handles, and then we open them again - and we open all of them as O_RDONLY. (And one can see this from the lsof output above: it shows stdout and stderr handles as 1r and 2r).

This is precisely why I love this code so much. It does lots of superfluous things, and it mostly does them wrong.

But then, there is also a weakness in libc. One probably should not modify the lower level close()/dup() filehandles while also using the upper level stdio functions. But then, if one tries to write onto the upper level while at the same time having the lower level set RDONLY, this probably should not result in gross memory corruption.
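
The 1r/2r that lsof shows can also be checked from inside the process. The following is a small sketch (fd_is_writable is a hypothetical helper name, not something from the bareos source) that asks the kernel for the access mode of a descriptor via fcntl(F_GETFL):

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd was opened for writing, 0 if it is read-only. */
int fd_is_writable(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    assert(flags != -1);               /* sketch only: abort on an invalid descriptor */

    int mode = flags & O_ACCMODE;      /* isolate O_RDONLY / O_WRONLY / O_RDWR */
    return mode == O_WRONLY || mode == O_RDWR;
}
```

A call like fd_is_writable(STDOUT_FILENO) right after the daemon has reopened its stdio handles would show whether descriptor 1 ended up read-only.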
 
Release  stdout     fputs()                                  fflush()
12.3     /dev/tty   fails EBADF                              works
12.3     /dev/null  works                                    fails EBADF
13       /dev/tty   fails EBADF                              fails EBADF
13       /dev/null  works for ~4300 bytes, then fails EBADF  always fails EBADF
this shows a different initial buffering (see setvbuf)
if the FILE is line buffered or unbuffered fflush has nothing to do because the buffer is always empty (after a fputs)
if the FILE is fully buffered fputs may only write to memory and never touch the file descriptor
so the failure always occurs at _swrite
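
A rough sketch of the buffering effect described above, under some assumptions: the stream is a private FILE opened on /dev/null (not the real stdout), the names struct probe and probe_readonly_stream are invented for illustration, and a read-only descriptor is swapped underneath the stream with dup2() to imitate what the daemon does.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

struct probe { int fputs_failed; int fflush_failed; };

/* mode is one of _IOFBF, _IOLBF, _IONBF as accepted by setvbuf(). */
struct probe probe_readonly_stream(int mode)
{
    struct probe r = {0, 0};
    FILE *fp = fopen("/dev/null", "w");
    int ro = open("/dev/null", O_RDONLY);
    assert(fp != NULL && ro >= 0);     /* sketch only: abort on setup failure */

    setvbuf(fp, NULL, mode, BUFSIZ);   /* must precede any I/O on fp */
    dup2(ro, fileno(fp));              /* the stream now sits on a read-only fd */
    close(ro);

    r.fputs_failed  = (fputs("short line\n", fp) == EOF);
    r.fflush_failed = (fflush(fp) == EOF);
    fclose(fp);
    return r;
}
```

With _IOFBF the short fputs() only copies into the buffer and succeeds, and the error surfaces at fflush(). With _IONBF the fputs() itself has to write and fails immediately. That matches the pattern in the table, where the failing call moves depending on the initial buffering.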
 
this shows a different initial buffering (see setvbuf)
if the FILE is line buffered or unbuffered fflush has nothing to do because the buffer is always empty (after a fputs)
if the FILE is fully buffered fputs may only write to memory and never touch the file descriptor
so the failure always occurs at _swrite
Maybe. I tried to somehow reproduce behaviour with setbuf, but wasn't successful. Maybe I didn't try hard enough; anyway, we can say for certain that there is a difference between 12.3 and stable/13 (as of 2 weeks ago, because I don't update base while hunting a bug - so this may be related to some development work also). But, honestly, this is rather my least concern.

What is of concern to me is on one hand the fantastic coding quality as shown above, where I don't know what else is lingering there and ready to stab me in the back. And on the other hand these memory overwrites, which are probably not well explainable by different buffering behaviour (and probably also not by ongoing development work, but then, one should check the commit logs).
And the third issue is the original question of error 94, i.e. what kind of things happen in relation to capsicum, and where that might be documented.
 
Please try reproducing:

Code:
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    char buf[] = "12345678901234567890123456789012345678901234567890";
    int fd = open("/dev/null", O_RDONLY);
    int i = 0;

    close(1);
    dup2(fd, 1);
    close(fd);

    while(1) {
      fputs(buf, stdout);
      fflush(stdout);
      i++;
      fprintf(stderr, "%d\n", i);
    }
}

Here it crashes after 135962 iterations. (stable/13 @ 22ba2970766 )
 
Damn, I had too many beers to concentrate now. 13.0-RELEASE (releng/13.0-n244733-ea31abc261f) ok, 14-current - crash after 44781.

Quick check in gdb:
Code:
root@fbsdc:(/tmp/forums)$ gdb test test.core
(gdb)
..
#0  0x000000082232d82d in memcpy () from /lib/libc.so.7
=> 0x000000082232d82d <memcpy+173>:    48 89 17    mov    QWORD PTR [rdi],rdx
(gdb)

(gdb) i r $rdi
rdi            0x824f07ffa         34979479546
(gdb)

(gdb) x/3i $pc
=> 0x82232d82d <memcpy+173>:    mov    QWORD PTR [rdi],rdx
   0x82232d830 <memcpy+176>:    mov    QWORD PTR [rdi+rcx*1-0x8],r8
   0x82232d835 <memcpy+181>:    ret
(gdb)

(gdb) i r $rdi
rdi            0x824f07ffa         34979479546
(gdb)

(gdb) ip
         0x824d08000        0x824f08000   0x200000        0x0  rw- ----
(gdb)

(gdb) bt
#0  0x000000082232d82d in memcpy () from /lib/libc.so.7
#1  0x00000008222ed03f in ?? () from /lib/libc.so.7
#2  0x00000008222eb5cd in fputs () from /lib/libc.so.7
#3  0x0000000000201b54 in main () at test.c:15
(gdb)
So the SIGSEGV is due to memcpy overstepping into unmapped memory: 0x824f07ffa + 8 = 0x824f08002, which is past the end of the mapping at 0x824f08000. We should check /usr/src/lib/libc and see what changed there since 13.0-RELEASE (my 13.0 is working just fine).
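
The address arithmetic can be written down as a tiny check. This is only a sketch with a made-up helper name (bytes_past_mapping); the numbers in the usage note are the ones from the gdb session above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many bytes of a len-byte store at addr land at or beyond map_end. */
size_t bytes_past_mapping(uint64_t addr, size_t len, uint64_t map_end)
{
    assert(len > 0);
    if (addr >= map_end)
        return len;                        /* the whole store is outside the mapping */

    uint64_t end = addr + len;             /* one past the last byte written */
    return end > map_end ? (size_t)(end - map_end) : 0;
}
```

For the 8-byte store at rdi = 0x824f07ffa and the mapping ending at 0x824f08000, bytes_past_mapping(0x824f07ffa, 8, 0x824f08000) gives 2: the first six bytes still fit and the last two fall into the unmapped page.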

edit: btw fflush() is not needed.
 
Damn, I had too many beers to concentrate now.
Hey, didn't want to disturb Your weekend!

13.0-RELEASE (releng/13.0-n244733-ea31abc261f) ok, 14-current - crash after 44781.
Thanks, that's valuable (so it's not one of my local patches or such).

Your crash location is mostly the same as here.

I have a stable/13 at fa3cc60e6dc without crash, but that's a generic build for default CPU,
so I wasn't sure yet. But if this is not the CPUTYPE, and only libc, then there is now not
much delta:

fa3cc60e6dc Jan 8 10:22:08 2022
22ba2970766 Feb 10 16:11:22 2022

What do You think about these:
ec2db06d0db22ae11c1b5414446e3aecd71a93e3
afa9a1f5ec9974793a8744c55036ef5c4d08903d
 
It's ok, I do like these types of problems. I may not be able to help tonight much though.
Both seem to be about fflush; we can reproduce it without it (so I didn't check further).

I did spin up 13.0-RELEASE-p7 VM and I can't reproduce it there either.
 
It's ok, I do like these types of problems. I may not be able to help tonight much though.
Both seem to be about fflush; we can reproduce it without it (so I didn't check further).
I suppose the fputs() will internally call fflush() (or equivalent) when the buffer is full.

I did spin up 13.0-RELEASE-p7 VM and I can't reproduce it there either.
It came with afa9a1f5ec9974793a8744c55036ef5c4d08903d into stable/13.
 
this is causing it
i built a 13.0-R libc with that file replaced and it bombs when internally fflush is called
if you disable buffering with setvbuf it works
it always bombs at size of vbuf / size of string output so when the buffer fills it bombs
a vbuf of 16k and the string of 50 causes it to bomb at 328
the explicit call to fflush is not needed (like _martin said)
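
The 328 figure follows from a ceiling division. A small sketch, assuming "16k" means 16384 bytes and using a made-up helper name (write_that_fills_buffer):

```c
#include <assert.h>
#include <stddef.h>

/* Index (1-based) of the first write that no longer fits into the buffer. */
size_t write_that_fills_buffer(size_t bufsize, size_t len)
{
    assert(len > 0);
    return (bufsize + len - 1) / len;      /* ceiling of bufsize / len */
}
```

With a 16384-byte buffer and 50-byte strings, 327 writes occupy 16350 bytes, so write number 328 is the one that overflows the buffer and forces the internal flush. When the string length divides the buffer size exactly, whether the flush happens on that write or the next one depends on the libc implementation, so treat the result as approximate in that case.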
 
I can't find a PR related to PMc's message #43. Going by the commit to -CURRENT (2022-03-06 15:29:51) in
and comparing with:
stable/13/lib/libc/stdio/fvwrite.c L135-L142
stable/13/lib/libc/stdio/fvwrite.c L176-L182
releng/13.1/lib/libc/stdio/fvwrite.c L135-L142
releng/13.1/lib/libc/stdio/fvwrite.c L176-L182

At the moment it is not in 13-STABLE (the precursor of 13.1-RELEASE) and not in the just-branched releng/13.1 (per commit 13.1: create releng/13.1 branch as of 2022-03-10 00:10:32).
 