No reboot after page fault

muzinim · Apr 25, 2012

We are running a firewall on version 8.2 that occasionally page faults and then does not reboot. I have searched the forums for a solution but have been unable to find one. Attached is a screenshot of the page fault. I have the following items set in sysctl.conf.

Code:

debug.debugger_on_panic=0
debug.trace_on_panic=0

Additionally, the following is set in rc.conf.

Code:

dumpdev=NO
dummynet_enable=YES

Any ideas? Obviously, I would like to stop the system from crashing, but until the problem can be resolved I would at least like it to reboot.

SirDice · Apr 25, 2012

It should reboot automatically after 15 seconds.

Most of the time this panic is caused by either bad memory or a bad hard disk (bad sectors in the swap partition).

muzinim · Apr 25, 2012

I am aware the system should reboot, but it stayed at that prompt for over 30 minutes. The system is not swapping. It is an HP DL380/G7 with 32 GB of RAM running the 64-bit OS. We have 50+ DLXXX systems in production and have yet to see any memory problems. We can pull the system out of production and run diagnostics, but I am skeptical that this is the problem.

Terry_Kennedy · Apr 26, 2012

muzinim said:
We are running a firewall on version 8.2 that will page fault on occasion and not reboot. I have searched the forums for a solution but am unable to find one. Attached is a screen shot of the page fault. I have the following items set in sysctl.conf.

Code:

debug.debugger_on_panic=0
debug.trace_on_panic=0

It has been exceedingly rare for any of several dozen of my FreeBSD systems to successfully perform a crash dump, since way back in the FreeBSD 5.x days. Depending on the cause of the panic, the system may or may not successfully reboot even when crash dumps are disabled. IMHO, the panic handling and crash dump code doesn't get enough attention, perhaps because it isn't a "sexy" part of the kernel to work on. Of course, it is also hard to work on that code because the only way to get there is to have a preceding software or hardware failure, which may or may not be reproducible.

ddb(4) lists a bunch of things you might want to try. If available, you should probably use a serial console instead of, or in addition to, the standard VGA console, because the useful part of the traceback will usually scroll off the screen. This rules out much of the "remote console" hardware included on servers, because all you'll see is a copy of the not-too-useful VGA console. With a serial console, you can log everything that comes out and save it for later analysis.
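
As a rough sketch of the serial console setup, the following /boot/loader.conf lines enable the serial and VGA consoles together. The choice of the first serial port and the 115200 speed are assumptions; adjust them to match your hardware and whatever is logging on the other end of the cable.

```shell
# /boot/loader.conf (sketch; assumes the first serial port at 115200 baud)
boot_multicons="YES"              # send boot and kernel messages to both consoles
boot_serial="YES"                 # make the serial console the primary one
comconsole_speed="115200"         # assumed speed; must match the logging terminal
console="comconsole,vidconsole"   # loader uses serial first, then VGA
```

With this in place, a terminal program on a second machine that logs the serial line to a file should capture the whole traceback, including the part that scrolls off the VGA screen.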

Note that you may run into issues which complicate the process of obtaining useful info. I've had many cases where there was undesired behavior in panic mode - mostly double panics like "sleeping thread owns a non-sleepable lock", but also more esoteric things. In all of those cases, the actual bug that initiated the first panic was fixed, so the problems inside panic() never arose and were thus not addressed. I've collected a bunch of kernel patches, both locally-developed and from FreeBSD developers, which attempt to work around specific panic-in-panic problems. They're for 8.1 and older and may not apply cleanly to 8.2, either due to the problem they're addressing already being fixed, or simply due to unrelated code changes. They're also for the specific problem areas I encountered, which are probably different from yours.

If your hardware supports it, you could also use the watchdog(4) facility to force a hardware reset of the server if the kernel stops responding.
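
A minimal sketch of that in rc.conf, assuming a watchdog driver for the hardware is already loaded (ichwd(4) and ipmi(4) are two candidates; which one, if either, fits this particular server is an open question). The 30-second timeout is only an example value.

```shell
# /etc/rc.conf (sketch)
watchdogd_enable="YES"     # start watchdogd(8) at boot
watchdogd_flags="-t 30"    # assumed timeout: reset if the timer is not patted for about 30 seconds
```

The idea is that watchdogd(8) runs in userland, so a kernel that hangs after a panic stops patting the timer and the hardware should then reset the machine.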
