muzinim said:
We are running a firewall on version 8.2 that will page fault on occasion and not reboot. I have searched the forums for a solution but am unable to find one. Attached is a screenshot of the page fault. I have the following items set in sysctl.conf.
Code:
debug.debugger_on_panic=0
debug.trace_on_panic=0
It has been exceedingly rare for any of several dozen of my FreeBSD systems to successfully perform a crash dump, since way back in the FreeBSD 5.x days. Depending on the cause of the panic, the system may or may not successfully reboot even when crash dumps are disabled. IMHO, the panic handling and crash dump code doesn't get enough attention, perhaps because it isn't a "sexy" part of the kernel to work on. Of course, it is also hard to work on that code because the only way to get there is to have a preceding software or hardware failure, which may or may not be reproducible.
ddb(4) lists a bunch of things you might want to try. If available, you should probably use a serial console instead of, or in addition to, the standard VGA console, because the useful part of the traceback will usually scroll off the screen. This rules out much of the "remote console" hardware included on servers, because all you'll see is a copy of the not-too-useful VGA console. With a serial console, you can log everything that comes out and save it for later analysis.
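As a rough sketch of the serial console setup, something like the following in /boot/loader.conf sends console output to both the first serial port and the VGA console. The 115200 speed is only an example and has to match whatever is listening on the other end; I'm also assuming your board has a usable COM1.
Code:
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"
console="comconsole,vidconsole"
On the machine at the other end of the null-modem cable, running cu(1) inside script(1) is one way to keep a log of whatever the panic prints.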
Note that you may run into issues which complicate the process of obtaining useful info. I've had many cases where there was undesired behavior in panic mode - mostly double panics like "sleeping thread owns a non-sleepable lock", but also more esoteric things. In all of those cases, the actual bug that initiated the first panic was fixed, so the problems inside panic() stopped showing up and were thus never addressed. I've collected a bunch of kernel patches, both locally-developed and from FreeBSD developers, which attempt to work around specific panic-in-panic problems. They're for 8.1 and older and may not apply cleanly to 8.2, either because the problem they address has already been fixed, or simply because of unrelated code changes. They're also for the specific problem areas I encountered, which are probably different from yours.
If your hardware supports it, you could also use the watchdog(4) facility to force a hardware reset of the server if the kernel stops responding.
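A minimal sketch of what that might look like, assuming the board has an Intel ICH watchdog timer supported by ichwdt(4) (that's a guess about your hardware; other chipsets need a different driver, and some have none). The 30 second timeout is just an illustrative value. In /boot/loader.conf:
Code:
ichwdt_load="YES"
And in /etc/rc.conf:
Code:
watchdogd_enable="YES"
watchdogd_flags="-t 30"
With this in place, watchdogd(8) keeps resetting the timer while the system is healthy. If the box wedges in the middle of a panic and the daemon stops running, the hardware should reset the machine once the timeout expires.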