Small C code for programs that crash on purpose.

I want to write some small C programs that crash on purpose, in order to learn to work with crash dumps, a debugger, and a call tracer.
Feel free to post if you have ideas for small C programs that will crash for different common reasons.
 
Well, there is PLENTY of stuff that can be shown/listed here. You might look for corresponding literature.

I was tempted to dump a few, but it's late, so you'll just get this one instead:
Code:
int main(int argc, char* argv[])
{
    int* p = 0;  /* null pointer */
    *p = 0;      /* write through it: typically raises SIGSEGV */
}
 
An over-zealous use of assert(3) will do the trick:
C:
#include <assert.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
        int a = 1, b = 2;

        assert(a == b);  /* fails, calls abort() and raises SIGABRT */
        exit(1);         /* never reached */
}

Other ways, of course, are use-after-free or use without allocation (as jbodenmann points out).

Assert can be useful because you can build a loop, perhaps do something inside it depending on a random number (passed via the command line, for example), let it crash, and then analyse the core to see what happened and how.
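
Here is a minimal sketch of that idea; the seed taken from argv, the modulus, and the "unlucky" value 7 are arbitrary choices for illustration:
C:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* seed comes from the command line so each run can behave differently */
    srand48(argc > 1 ? atol(argv[1]) : 1);

    for (int i = 0; ; i++) {
        long r = lrand48() % 100;
        printf("iteration %d: r = %ld\n", i, r);
        assert(r != 7);   /* sooner or later this fails, abort()s and dumps core */
    }
}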
 
Let me try to enumerate all the ways a program can crash (that I can think of). Many of them may be identical under the hood.

  • Return with a non-zero return code; easy using "return 1" in main().
  • Call exit(1).
  • Call abort(), which is essentially the same as sending SIGABRT to yourself.
  • Asserts (which call abort() internally when they fail).
  • Integer divide by zero.
  • Floating point divide by zero.
  • I don't remember what happens when you try to take the square root of a negative number ... if that gets turned into a floating point instruction, it might give a different error from it being caught in a run-time library (see the sketch after this list).
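
To answer the sqrt question experimentally, here is a small sketch. It assumes feenableexcept(), a non-standard extension found in <fenv.h> on FreeBSD (glibc has it too, behind _GNU_SOURCE); link with -lm. The volatile is only there so the compiler can't fold the call away.
C:
#include <fenv.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    volatile double minus_one = -1.0;   /* volatile: force a runtime sqrt() call */

    /* By default nothing crashes: sqrt(-1.0) just returns NaN
       and raises the FE_INVALID flag. */
    printf("sqrt(-1.0) = %f\n", sqrt(minus_one));

    /* Unmask the invalid-operation exception (non-standard feenableexcept());
       now the same call should be delivered as SIGFPE. */
    feenableexcept(FE_INVALID);
    printf("not reached if the trap fired: %f\n", sqrt(minus_one));
    return 0;
}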

The interface for system calls actually doesn't use system call names, it typically uses a "syscall" instruction. That instruction has an integer argument which is the system call number; those are defined in some header file. Find an illegal system call number and execute syscall (probably requires assembly programming).
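You don't necessarily need assembly for that experiment: the syscall(2) libc wrapper takes the raw number. Whether an unknown number actually kills the process (SIGSYS on classic BSDs) or just returns an error (ENOSYS) depends on the OS; the number 100000 below is simply assumed to be unused:
C:
#include <sys/syscall.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long r = syscall(100000);    /* assumed to be an unused syscall number */

    /* If the kernel sends SIGSYS for unknown numbers, this is never reached;
       otherwise the call just fails with errno set to ENOSYS. */
    printf("syscall returned %ld, errno %d\n", r, errno);
    return 0;
}
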

Speaking of signals: You can send about two dozen different signals to yourself using kill(). The list of signals is in "man signal". There are a few interesting ones in there: The technique jbodenmann suggested above (dereferencing the null pointer) is probably identical to SIGSEGV, but there is also another memory access signal (SIGBUS).
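
The simplest way to play with those is raise(3) (or kill(getpid(), ...)) with any signal whose default action is to dump core; SIGSEGV below is just an arbitrary pick:
C:
#include <signal.h>
#include <stdio.h>

int main(void)
{
    puts("about to signal myself");
    raise(SIGSEGV);   /* SIGBUS, SIGILL, SIGABRT, ... work just as well;
                         the default action terminates and dumps core */
    puts("not reached");
    return 0;
}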

You can try various variations of that technique: Write to address zero (which should not be mapped), read or write to an address that is mapped but not writeable or not readable (like try to write to the instruction stream). A particularly fun one would be to try to execute an illegal instruction; doing this would be a little bit of work: you'd have to look at the architecture manual of the CPU to find a binary instruction code that is invalid, then convince the compiler to actually emit that instruction in the stream (probably using assembly), and then jump to it. Many of these perverse things are probably identical to various signals.
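
On x86/x86_64 it's less work than that, because the architecture reserves ud2 as an officially undefined opcode; with GCC/Clang inline assembly (an extension, not standard C) this is enough:
C:
int main(void)
{
    asm("ud2");   /* guaranteed-invalid x86 opcode: raises #UD,
                     delivered to the process as SIGILL */
    return 0;
}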
 
I was thinking about the same thing quite a while ago. The key issue with "crashing intentionally" is: Almost any code that would crash in practice is just "undefined behavior" according to the C standard, and a compiler could always detect it and do something (anything!) else. Also, optimizers might remove the "crash reason" accidentally ;)

Many modern compilers support __builtin_trap() for exactly that purpose; it's not standard C, but it is your best bet. While crashing accidentally is quite easy, crashing intentionally is, strictly speaking, impossible to do reliably in standard C.

Back then, I came up with the following code:
Code:
#undef HAVE_BUILTIN_TRAP
#ifdef __has_builtin
/* preferred: ask the compiler directly whether the builtin exists */
#  if __has_builtin(__builtin_trap)
#    define HAVE_BUILTIN_TRAP
#  endif
#elif defined(__GNUC__)
/* old GCC without __has_builtin: fall back to a version check */
#  define GCC_VERSION (__GNUC__ * 10000 \
    + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#  if GCC_VERSION > 40203
#    define HAVE_BUILTIN_TRAP
#  endif
#endif

#ifdef HAVE_BUILTIN_TRAP
#  define crashMe() __builtin_trap()
#else
#  include <stdio.h>
#  include <stdlib.h> /* for abort() */
#  define crashMe() do { \
    int *volatile iptr = 0; \
    int i = *iptr; \
    printf("%d", i); \
    abort(); } while (0)
#endif
This defines a macro crashMe() to force a crash. It checks for __builtin_trap() support using __has_builtin() and also checks the GCC version (for old GCC versions that support __builtin_trap() but are too old to provide __has_builtin() to detect it).

When compiled with a compiler that doesn't have the required builtin, it uses some "best effort" code instead:
  • Use volatile, so the read access for dereferencing the null pointer can't be optimized away.
  • Use printf() so there is a side effect, forcing the compiler to keep the i variable and actually execute the code.
  • Still add abort() to at least get an unclean exit in case the compiled code somehow makes it there without crashing.

edit: A little additional remark: This "best effort" code of course has undefined behavior as well, so, although it tries to ensure a crash, this is not guaranteed. It might just do an unclean exit (abort()) or might even do something completely different. Sure, the famous nasal demons are somewhat unlikely ;) – but it might just, for example, never terminate.

Nothing to worry about with a recent gcc or clang though, both support __builtin_trap().
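
For reference, the minimal program using it directly looks like this; on x86, GCC and Clang emit a ud2 instruction for it, so the process dies with SIGILL:
C:
int main(void)
{
    __builtin_trap();   /* the compiler emits a trapping instruction */
}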
 
I tried to write to something "const" but the compiler detected it and told me I cannot do that.

An out-of-bounds array access works but does not crash the program.
 
I tried to write to something "const" but the compiler detected it and told me I cannot do that.
The notion of const is a compile-time construct. It's a concept of the language; it has nothing to do with the underlying OS or hardware. const is purely a tool to help the user of the programming language, basically a rudimentary access-policy check, but the code ultimately generated by the compiler does not reflect it because it's only a language construct. It exists for the sole purpose of letting the compiler tell you that you shouldn't do that. If you were to inspect a binary, you'd have no way to tell whether a variable/parameter/value/... was marked const in the original source code.
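
One way around the compiler's check is to cast the const away; then it's decided at run time whether anything crashes, depending on where the object actually lives. A sketch (the crash is typical with gcc/clang, but not guaranteed):
C:
static const int answer = 42;   /* usually placed in a read-only segment */

int main(void)
{
    int *p = (int *)&answer;    /* casting away const: the compiler accepts it */
    *p = 7;                     /* undefined behavior; typically a write to
                                   .rodata, which raises SIGSEGV */
    return answer;
}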

An out-of-bounds array access works but does not crash the program.
I don't know enough about the FreeBSD kernel, but while an OS may be able to detect buffer overflows within a process's memory, there is no need for the OS to do anything about it as long as the process doesn't overflow into memory that doesn't belong to it.
 
The notion of const is a compile-time construct. It's a concept of the language; it has nothing to do with the underlying OS or hardware. const is purely a tool to help the user of the programming language, basically a rudimentary access-policy check, but the code ultimately generated by the compiler does not reflect it because it's only a language construct. It exists for the sole purpose of letting the compiler tell you that you shouldn't do that. If you were to inspect a binary, you'd have no way to tell whether a variable/parameter/value/... was marked const in the original source code.
Adding to that: Writing to a string literal is undefined behavior. That's because the compiler is free to place it in some read-only "data" segment. So, something like "hello"[0]='x'; might trigger a crash.

Still, the type is char [] (and not const char []), probably for historic reasons. So yes, const lives purely at the logical level.
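
A tiny sketch of that case; with gcc/clang the literal normally ends up in a read-only segment, so this usually dies with SIGSEGV, but it's still just UB, not a guarantee:
C:
int main(void)
{
    char *s = "hello";   /* type is char *, so the compiler doesn't complain */
    s[0] = 'x';          /* writing to a string literal: undefined behavior,
                            usually a write to a read-only page -> SIGSEGV */
    return 0;
}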

An out-of-bounds array access works but does not crash the program.
Try to go further out of bounds. In order to see a crash, you must hit a page that isn't mapped to your virtual address space.
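
For example (again undefined behavior, so no guarantees; the huge index is just an attempt to land on an unmapped page):
C:
int main(void)
{
    int a[4];

    /* a few elements out of bounds usually stays inside mapped stack pages;
       going this far out is much more likely to hit an unmapped page */
    a[1 << 24] = 42;
    return 0;
}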

Again, all of this is undefined behavior, all of this can crash any time, but there's never a guarantee, as I tried to explain above!
 
Inspired by ralphbsz 's answer:
C:
int main(int argc, char** argv)
{
  int i;
  for(i = 1; i >= 0; i--)
  {
    int boom = 1/i;   /* second iteration: i == 0, so this divides by zero (SIGFPE) */
  }
}

Test run:
Code:
$ clang crash.c
$ ./a.out                                              
Floating point exception (core dumped)
 
C:
#include <stdlib.h>
#include <stdio.h>
/* russian roulette */
int main()
{
    void (*p)(void) = (void*)lrand48();   /* a random address as a "function" */

    (*p)();                               /* call it and hope for the best */
    puts("YOU SURVIVED");
    return 0;
}
 
Compile with -O2. I'd bet no boom ;)
Debatable.
Optimization is not supposed to change the result of a program. So the division may change the program execution by raising an exception, which means the optimizer has to keep it active. It may, however, see that the result is not needed and that the program will divide by zero in any case. So to be correct, the compiler would need to simply emit the divide by zero and hopefully issue a helpful insu... ah, warning.

There is of course the case that multiple optimizations, each by itself, don't change the semantics, but all of them together might. There is a nice blog post about this on the llvm forum. If I could only find it...
 
Optimization is not supposed to change the result of a program.
For valid programs, that is a sensible goal. Obviously, in practice optimizers don't quite reach the goal; the stories of correct code that gets broken by less-than-optimal (pun!) optimizers are legendary.

For invalid programs, the track record is not so good. Most of the time if the output of your program changes when optimizing, that's an indication that your program is relying on undefined behavior.

Speaking of that: Many decades ago I used to program C++ on an MS-DOS and Windows 3.1 machine, using the Watcom 32-bit extender. Worked perfectly well. One day we discovered that you can dereference memory at address 0 and up just fine, without getting a segmentation fault. That was sort of amazing. Then we discovered that you can actually store a few bytes at address zero and retrieve them, and it works. The null pointer is alive! What really happens is that memory at address zero is mapped, and contains some DOS-internal jump vector tables. Quite a few entries in that table are not actually important, so you can overwrite them and the computer continues to function mostly correctly. And as long as you stay within the Watcom 32-bit environment (continue running the program), the vector tables are completely irrelevant and you can do whatever you want with them. But if you use a lot of memory there (dozens and dozens of bytes), it will not function correctly anymore. So we developed a simple test: After exiting our giant C++ program, drop back to the DOS prompt. Then try to print a file (with "print foo.txt"). If your computer reboots, or hangs, or crashes, then you have overwritten something near address zero. For some reason, the print command was particularly sensitive to the vector tables.
 
So the division may change the program execution by raising an exception, which means the optimizer has to keep it active.
Clearly, no. The program has no observable behavior -- as defined by the C standard. A CPU exception does not exist in this model. So, compiling it to an empty program is perfectly fine.

Remember, C is defined in terms of a theoretical virtual (original wording: abstract) machine ;)

edit: and I just repeat myself here, everything that might cause a crash in practice is in the area of undefined behavior when it comes to C. That's why you need an extension like __builtin_trap() to reliably produce a crash, e.g. to test some code designed to handle a crash of a child process.
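
A sketch of that kind of test harness: the child crashes reliably via __builtin_trap(), the parent checks how it died (error handling omitted):
C:
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
        __builtin_trap();        /* child: guaranteed crash */

    int status;
    waitpid(pid, &status, 0);    /* parent: wait for the child and inspect */
    if (WIFSIGNALED(status))
        printf("child died from signal %d\n", WTERMSIG(status));
    else
        printf("child exited normally (%d)\n", WEXITSTATUS(status));
    return 0;
}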

edit2:
the stories of correct code that gets broken by less-than-optimal (pun!) optimizers are legendary.
Those are just bugs in the compiler/optimizer, and unfortunately, they happen, while
For invalid programs, the track record is not so good.
THAT's by design. In contrast to many "modern" languages, C isn't fully defined. There's undefined behavior for almost anything violating language rules.
 
Oh, there are way too many. Not mentioned here: a privileged instruction in ring 3 (assuming x86/x86_64 arch):
Code:
int main() {
        asm("sti");
        return 42;
}
Also any memory access (data or text) that is not allowed, access to unmapped memory...
Another interesting one is unaligned memory access with certain SSE instructions.

When it comes to getting your bearings around gdb and crash dumps, jbodenmann's example is almost phrack-like (not gonna say textbook) in how you'd do it. I just wish he had pointed that p at 0xcafebabe or 0xdeadbeef. It would have been perfect that way. :)
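
Something like this, for instance; 0xdeadbeef is almost certainly unmapped, so the recognizable faulting address shows up nicely in the debugger (still undefined behavior, of course):
C:
#include <stdint.h>

int main(void)
{
    int *p = (int *)(uintptr_t)0xdeadbeef;   /* recognizable bogus address */
    *p = 0;                                  /* typically SIGSEGV; the core dump
                                                shows 0xdeadbeef as the fault address */
    return 0;
}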
 
the stories of correct code that gets broken by less-than-optimal (pun!) optimizers are legendary.
BTW, another remark on that subject: Although such bugs indeed happen, I'm pretty sure at least 99% of the times someone claims an optimizer broke their "correct" code, said code did have undefined behavior.

Just as an example, look at code using the BSD sockets API. The struct sockaddr used there as some abstract base type has a stupid design flaw: It isn't compatible with any of the concrete types like e.g. struct sockaddr_in. (Which could easily be fixed by removing this nonsensical sa_data member btw, but it's probably MUCH too late to touch a historic and widely used API again.)

C forbids accessing the same object through pointers of incompatible types, so you have to be extra careful here (and if you get it wrong, an optimizer relying on these aliasing rules will surely smash your code to bits and pieces). I'm pretty sure a substantial fraction of code using BSD sockets actually contains UB.
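
For illustration, the usual pattern looks roughly like this (just a sketch, error handling omitted): passing the cast pointer to bind() is the traditional idiom; the aliasing trouble starts when code dereferences the struct sockaddr * itself to access an object declared as struct sockaddr_in.
C:
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(12345);
    sin.sin_addr.s_addr = htonl(INADDR_ANY);

    /* the cast in the call is the customary idiom; reading or writing
       sin through a struct sockaddr * lvalue is where the strict
       aliasing rules bite */
    bind(fd, (struct sockaddr *)&sin, sizeof sin);
    return 0;
}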
 