FreeBSD > 10.3-RELEASE-p11 crashes at boot when HotPlug is initialized

Hello,

I just ran into a weird/annoying problem on an Intel S5000PHB + Xeon L5410 system.

A few weeks ago the system already went "belly up" after freebsd-update. As it was in the middle of the week and the system is a production gateway, I just connected to the console, reverted back to the last BE and went on with other work, planning on investigating this when I have more time available.
Yesterday I wanted to perform the upgrade to 11.0-RELEASE, silently hoping the problem wouldn't show up again, but once more the system wouldn't come up and just reset early in the boot process.
Instead of just connecting to the console and reverting back to the last BE, I started looking into the problem a bit.
The system ran/runs 10.3-RELEASE-p11 without any problems. Any later version just crashes early at boot while initialising the PCI devices.

The full dump of the boot (Verbose and with safe-mode enabled) can be found here:
http://pastebin.com/jHR9yMeB

Right after/at setting up pcib3 the system just resets:
Code:
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pcib3: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 256 to local APIC 4 vector 52
pcib3: using IRQ 256 for MSI
pcib3: [GIANT-LOCKED]
pcib3: HotPlug command: 01c0 -> 07ff
p


The only ("non-onboard") PCI device is a plain and simple PCIe Intel PRO/1000 dual-port NIC. Removing the card doesn't help; the problem persists.

The BIOS options are quite limited when it comes to deactivating onboard components - I already tried disabling anything possible (all onboard NICs, onboard VGA, IDE and SATA), but to no avail.

Any ideas/hints on how I might work around this from the kernel side or get even more verbose output to gather more information?

Thanks,
Sebastian
 
Small update:

After poking around in the source of FreeBSD's PCIe HotPlug implementation[1] to figure out what *should* happen and why it doesn't, it occurred to me yesterday morning to just look for system tunables related to PCIe hot-plugging and disable it for a start. (I guess forest, trees etc...)

As it turns out, adding hw.pci.enable_pcie_hp=0 to /boot/loader.conf solves the problem. The system is running fine now, survived multiple reboots to be sure, and the upgrade to 11.0-RELEASE finished without any problems.
In ~3 weeks I have another Xeon L5410 system available for testing to see if this issue affects Nehalem in general or just this specific board/BIOS...
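
For reference, this is roughly how the workaround looks in /boot/loader.conf. Treat it as a minimal sketch: only the tunable name and its value come from the fix described above; the comments and the quoted value (the usual loader.conf style) are my own additions.

```
# /boot/loader.conf
# Disable native PCIe HotPlug support so the PCI-PCI bridge driver
# does not touch the HotPlug registers (workaround for the reset at pcib3).
hw.pci.enable_pcie_hp="0"
```

To try it once before making it permanent, it should also be possible to set the same variable at the loader prompt (set hw.pci.enable_pcie_hp=0, then boot).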
 