Got an interesting crash just now (well, as interesting as a crash on a soon-to-be production system can be) :\
This is 8-STABLE/amd64, last cvsup'd early in the morning of May 9th.
The system didn't complete the crash dump, so it needed a manual reset to get it going again.
The crash was a "page fault while in kernel mode" with the current process being the interrupt service routine for the bce0 GigE. Things progressed reasonably until partway through the dump, when the system locked up with a "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's the same PID as reported in the main crash.
Screen capture here. Complete dmesg, etc. available on request.
As I mentioned above, the system needed a hard reset to get going again. savecore doesn't think there's a usable dump, so I don't think there's any more info to gather.
I just cvsup'd the box and built a new kernel, in case the previous cvsup was in between related commits, or to see if anything changed since. I still have the old kernel around in case any useful info can be gathered from it.
So, a couple of questions:
1) Is anything known to be funky with bce on 8-STABLE?
2) Should the part of the system that caused the panic be able to lock up the crash dump process? Obviously, if the disk driver causes a panic, all bets are off when trying to use it to write the dump, but this crash seems to have been from a network driver. Shouldn't a double panic just give up on the dump and try a reboot?
3) Is there any way to rig the system to obtain more info if this happens again? Right now I'm using an embedded remote console server, but I could switch the system to a serial port if enabling the kernel debugger might help. But I think that the sleeping thread bit would happen even at the debugger prompt, wouldn't it?