FreeBSD 10 crash on shutdown

Hi all,

I need some ideas on how to troubleshoot the following issue.

Running FreeBSD 10-STABLE or 11-CURRENT on a DL380 G5 with 20GB of RAM and 2 x Xeon 5440, the system crashes during shutdown. The crash prevents the server from powering off, and it always reboots instead.

I captured the output through the iLO2 virtual console, so apologies if the formatting is a bit off.
http://pastebin.com/8XvkhHK4

I am going to run memtest, as I read the crash could be memory-related, but I would be curious to hear people's views on this issue.

Thanks in advance.
 
I have been running memtest for 20 hours now with no errors so far.

If anybody has an idea of how to troubleshoot this crash on shutdown, that would be great.

Thanks
 
Well, the bce driver is mentioned right before the panic. I recommend searching PRs or the mailing lists for any known issues with that driver. Perhaps you can replicate the panic by bringing the interface down and back up. The other messages about the failing dump appear because you don't have enough swap space to dump the contents of RAM for a crash dump. I would recommend setting
Code:
dumpdev="NO"
in your /etc/rc.conf to quiet it down a bit.
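If you want to check both points by hand, something like the following might help. It is only a rough sketch: the helper names swap_covers_ram and toggle_if are made up for illustration, the comparison assumes a full dump (a minidump needs less space than all of RAM), and bce0 is just the interface name from this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of FreeBSD.

# swap_covers_ram RAM_BYTES SWAP_KB
# Succeeds when swap (in 1K blocks) is at least as large as RAM (in bytes).
swap_covers_ram() {
    ram_kb=$(( $1 / 1024 ))
    [ "$2" -ge "$ram_kb" ]
}

# toggle_if IFACE
# Takes the interface down and brings it back up. Defined but not called
# here; run it from the console, not over a session that depends on IFACE.
toggle_if() {
    ifconfig "$1" down && sleep 5 && ifconfig "$1" up
}

# Only query the live system on FreeBSD; elsewhere just define the helpers.
if [ "$(uname -s)" = "FreeBSD" ]; then
    ram=$(sysctl -n hw.physmem)                      # physical memory in bytes
    swap_kb=$(swapinfo -k | awk 'END { print $2 }')  # last line: sole device or Total
    case "$swap_kb" in
        ''|*[!0-9]*) swap_kb=0 ;;                    # no swap configured
    esac
    if swap_covers_ram "$ram" "$swap_kb"; then
        echo "swap is large enough for a full dump"
    else
        echo "swap is smaller than RAM; a full dump will not fit"
    fi
fi
```

To try the interface test, call toggle_if bce0 as root from the iLO console and watch for the panic.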
 
Thanks Junovitch.

The memtest is clear of errors after 3 passes, so I will assume the installed RAM is OK.
I looked for anything related to the bce driver without much luck.

I did notice the following messages from the bce adapters, but they do not seem to affect networking as such:

Code:
bce0: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xf8000000-0xf9ffffff irq 16 at device 0.0 on pci3
bce0: /usr/src/sys/dev/bce/if_bce.c(1299): Management firmware enabled but not running!
miibus0: <MII bus> on bce0
bce0: Ethernet address: 00:1f:29:e1:e3:76
bce0: ASIC (0x57081020); Rev (B2); Bus (PCI-X, 64-bit, 133MHz); B/C (1.9.6); Bufs (RX:2;TX:2;PG:8); Flags (SPLT|MSI|MFW); MFW (NOT RUNNING!)
bce1: <HP NC373i Multifunction Gigabit Server Adapter (B2)> mem 0xfa000000-0xfbffffff irq 17 at device 0.0 on pci5
bce1: /usr/src/sys/dev/bce/if_bce.c(1299): Management firmware enabled but not running!

I have a different chipset on another system, which looks more recent, and it does not complain about the management firmware not running. I am not entirely sure what that message means...

Any idea?
 
Well, that NIC is listed in the bce(4) man page as supported. I'm not sure about the bits about management firmware; does the iLO2 share the link with the host? It could be just noise, and I'm only speculating that the NIC is involved because of the message just prior to the panic. I would consider posting to the mailing list with what you are seeing if you don't hear anything back here. You may be able to get a developer's eye on it to tell you what that means.
 