FreeBSD 10.3 won't boot on RECENT DELL 14th Generation systems

PROBLEM:
We cannot boot DELL R740xd and R640 servers (14th Generation systems) manufactured from July 2019 onward using a previously working standard FreeBSD 10.3 system disk. R740xd systems built BEFORE July 2019 still boot FreeBSD 10.3 properly.
SYMPTOM:
The system goes through the DELL POST screens, then starts reading the kernel into memory. (UEFI and BIOS boot both fail with the same errors.)
The boot error shows as: “module_register: cannot register pci/em from kernel; already loaded from if_em.ko Module pci/em failed to register: 17”
The same message repeats for pci/lem, pci/igb, pci/mrsas.
The system goes through RAID recognition properly, locating the VD0 drive (only the boot drive is installed). The boot then proceeds as follows:
  • Loading /boot/defaults/loader.conf
  • System goes to FreeBSD Logo page – completes 9 second timeout
  • Loads /boot/kernel/kernel
  • Loads /boot/kernel/zfs.ko
  • Continues loading several other drivers OK
  • Loads /boot/kernel/smbus.ko
  • Loads /boot/kernel/aesni.ko (advanced Encryption)
  • Stops at BLUE “Booting…”
  • Starts to load FreeBSD 10.3-RELEASE-P29
  • Shows clang version 3.4.1
  • And then starts with the errors above
  • Then stops dead with “kernel trap 12 with interrupts disabled”
CAVEATS:
We know that FreeBSD 10.3 will not even boot from an installer USB drive (it crashes with the SAME errors). FreeBSD 10.4 also crashes.
The FreeBSD image is completely standard – no modifications to the install image.
And to emphasize, FreeBSD 10.3 was working fine until this last July build of the DELL PE R740 and R640.
FreeBSD 11.0 does NOT boot. FreeBSD 11.1, 11.2, and 11.3 DO boot and install to SSD correctly. But we are three to four months away from running our system on a newer version of FreeBSD. We are evaluating FreeBSD 12.1 as our next version; until that work is done (and FreeBSD 12.1 is officially released), we need FreeBSD 10.3 to continue to boot on these servers.
We ‘know’ that DELL does NOT support FreeBSD. That is why we are asking for help from the Community.
FACTS:
DELL documentation confirms a change on the motherboard in the DELL R740xd and DELL R640 in July of 2019, but they will not confirm what the change was.
We have tested every removable component, down to the CPUs. ALL components from the newly manufactured servers DO work in the older motherboard manufactured before June 2019. Components from the working old system will NOT boot FreeBSD 10.3 in the NEW system.
We are hoping the community can help us identify the issue and direct us to a possible solution for FBSD 10.3 until FBSD 12.1 is released.
Any help or advice would be welcome.
 
The FreeBSD image is completely standard – no modifications to the install image.
It's not standard. For starters it's 10.3-RELEASE-p29, the standard install disk would have 10.3-RELEASE (without the patches). The standard install disk also doesn't load aesni(4) or smbus(4).
 
Thanks for the prompt reply, and for correcting me on two points that I was not aware of. Our programmer told me our build was 'standard'.
But your reply also reminded me of one more test that I should run. I think I made the post overcomplicated.
1. I created a clean installer USB, fresh from the archives, for FreeBSD 10.3.
2. When I put that installer USB into the pre-June R740, it boots up to the installer routine, and allows me to create a boot SSD.
3. When I put that SAME USB into the July (updated) R740, it finds the USB, starts to boot, gets to the LOGO, runs a few lines, and crashes with "kernel trap 12 with interrupts disabled" immediately after the line "ACPI APIC Table: <DELLOE DELLOSE>".
I did not see any of those previous pci/em, and so on, errors. (attached image.png)

FreeBSD 10.3, 10.4, and 11.0 all fail this test.
FreeBSD 11.1, 11.2, 11.3 all boot properly to the install setup, and work fine on the July (updated) R740.
 

Attachments

  • R740 crash.PNG
Make sure the machine's BIOS is up to date; it wouldn't be the first machine that got shipped with an old version.

Does the new machine have any different BIOS options with regard to APIC? I remember a SuperMicro board with a new option, and enabling it would result in the machine crashing and burning during boot. But I can't remember what it was called.

Edit: I think it was x2APIC. If you have that option, turn it off. The "standard" APIC stuff should be enabled.
 
One other comment. Other than the 'build' date and motherboard update, the systems are identical with CPUs, RAM, PERC, NICs, and drives.
Many previous systems were OK; the last FIVE all fail.
So far, DELL has not given us any help with what changed on the motherboard. That's why we are trying here.
Thanks again, Dale K
 
DELL documentation confirms a change on the motherboard in the DELL R740xd and DELL R640 in July of 2019, but they will not confirm what the change was.
Well it sounds to me like an ACPI problem with the newer boards.
Have you tried some of the ACPI(4) hints to eliminate that possibility?
From the loader prompt (option #3 at the Beastie menu on startup):
set hint.acpi.0.disabled="1"
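As a rough sketch of that test, the session at the loader prompt might look like the lines below. The `OK` prefix is the loader's own prompt, not something to type, and `boot -v` is added here only to get verbose kernel messages; whether disabling ACPI gets these boards any further is exactly what the test is meant to find out.

```
OK set hint.acpi.0.disabled="1"
OK boot -v
```

A value set this way applies to that one boot only, so nothing needs to be undone if it makes no difference.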

the systems are identical with CPUs, RAM, PERC, NICs
Our ACPI manual says that the first step for ACPI problems is to update the BIOS.
So my question to you is: Are these newer boards using the same BIOS version as the old boards that work?
Maybe you could flash the older, working BIOS onto the newer boards.
Really hacky, but it might work. It might even brick the motherboard, so that would be a last-gasp effort.
ACPI is totally handled through BIOS.
 
APIC 'must' be involved here. FreeBSD 11.0 fails to boot (same crash), but FreeBSD 11.1 DOES boot. How do we find out whether there was a change in FreeBSD 11.1 that accounts for a 'possible' change in the hardware and how it handles APIC? I cannot find 'any' settings in the BIOS related to APIC.
 
Problem solved! I looked through BIOS for APIC but did not see any settings. It is in Processor settings > way down at the bottom so you will miss it if you don't scroll down. DELL has sent many of these R740, R640, and R440 systems for the last 8 months with that 'x2APIC' setting > DISABLED. The last FIVE came in with it ENABLED. I found it when I went back in looking for ACPI settings. FreeBSD 11.1 and up must turn that OFF during the boot up. FreeBSD 10.3 is up and running!

Thanks for your help!
 
Actually, it looks like FreeBSD 11.1 and newer 'support' x2APIC, so that is why those booted properly with x2APIC 'ENABLED' on the DELL R740 and R640. For now, we are running FreeBSD 10.3 with x2APIC 'DISABLED', until we transition to 11.3 in the next few months.
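For anyone managing a mix of these servers, the findings in this thread can be captured in a small shell helper. This is only a sketch: the function name `x2apic_advice` is made up for illustration, and the version cutoff comes solely from the boot tests reported above (10.3, 10.4, and 11.0 crashed with x2APIC enabled; 11.1, 11.2, and 11.3 booted). Releases that were not tested here are simply assumed to fall on the same side of that cutoff.

```shell
#!/bin/sh
# x2apic_advice RELEASE
# Hypothetical helper: prints "disable" when the given FreeBSD release is
# older than 11.1 (the releases that crashed with x2APIC enabled in this
# thread), and "ok" for 11.1 or newer.
# Assumption: untested releases follow the same 11.1 cutoff.
x2apic_advice() {
    ver=${1%%-*}        # "10.3-RELEASE-p29" -> "10.3"
    major=${ver%%.*}    # "10.3" -> "10"
    minor=${ver#*.}     # "10.3" -> "3"
    # A bare major number such as "11" has no dot; treat it as "11.0".
    [ "$minor" = "$ver" ] && minor=0
    if [ "$major" -lt 11 ] || { [ "$major" -eq 11 ] && [ "$minor" -lt 1 ]; }; then
        echo disable
    else
        echo ok
    fi
}
```

On a running system it could be called as `x2apic_advice "$(uname -r)"`; a result of `disable` means the x2APIC option under Processor Settings in the BIOS should be set to DISABLED before that release will boot on the July 2019 boards.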
 