9.0 kernel fatal trap when detecting CPU, SMP issue?

I'm building a new system with the amd64 version of FreeBSD 9.0 and have been getting kernel fatal traps shortly after the boot loader menu. I've been trying to figure this out, have only gotten so far, and could use some help. My hardware is:

  • MSI P67A-GD65 (B3) mainboard
  • Intel Core i5-2500K (quad core)
  • G.SKILL Ripjaws X Series 8GB (2 x 4GB) DDR3 1600

The system boots just past the boot loader menu and pauses on the line
Code:
ACPI APIC Table: <ALASKA A M I>
On the couple of occasions it has booted, the next few lines were about the processor cores; normally, though, it is just a non-stop blur of kernel fatal trap messages. The only way I have been able to boot the system is to escape to the loader prompt and type set kern.smp.disabled=1. The system then boots and runs without problems (other than not using all the cores).

Following suggestions I found while searching, I have run memtest (12 hours, no errors), updated the BIOS to the latest version, and tried setting up crash dumps in rc.conf. Unfortunately the crash happens before rc.conf is read, so I added the lines
Code:
options DDB
options GDB
to the GENERIC kernel configuration. With those I've been able to get the following messages, but I've hit a brick wall trying to get a dump and solve this.

On bootup I'm now getting the following:

Code:
real memory = 8589934592 (8192 MB)
avail memory = 8198332416 (7818 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
panic: AP #2 (PHY# 4) failed!
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
cpu_mp_start() at cpu_mp_start+0x589
mp_start() at mp_start+0x85
mi_startup() at mi_startup+0x77
btext() at btext+0x2c
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at	kdb_enter+0x3b: movq	$0,0x905dc2(%rip)
db>

Then I typed the following commands at the db prompt:

Code:
db> s
[ thread pid 0 tid 0 ]
Stopped at	kdb_enter+0x46: addq	$0x8,%rsp
db> c
kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808316e8
stack pointer           = 0x28:0xffffffff81412ab0
frame pointer           = 0x28:0xffffffff81412af0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = trace trap, resume, IOPL = 0
current process         = 0 ()
[ thread pid 0 tid 0 ]
Stopped at	_thread_lock_flags+0x28:	movq	0x18(%r12),%rax


Code:
db> trace
Tracing pid 0 tid 0 td 0xffffffff8112f830
_thread_lock_flags() at _thread_lock_flags+0x28
kern_reboot() at kern_reboot+0x33
panic() at panic+0x171
cpu_mp_start() at cpu_mp_start+0x589
mp_start() at mp_start+0x85
mi_startup() at mi_startup+0x77
btext() at btext+0x2c


Code:
db> panic
panic: panic: from debuggerfrom debugger

cpuid = 0
cpuid = 0

If I then type gdb, I get
Code:
The remote GDB backend could not be selected.

From what I've read, a panic should create a crash dump, but after rebooting there is nothing there. To try to figure this out I installed a fresh copy of 9.0-RELEASE. The only changes so far are these two lines in rc.conf:
Code:
dumpdev="AUTO"
dumpdir="/var/crash"
plus a kernel built from the latest 9.0-RELEASE source with only the two options lines mentioned above added.

Thanks in advance for any help.
 
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
This is almost always caused by bad memory.
 
I thought that at first too, but I've swapped the memory out and it still consistently fails at the same spot. I also had the BIOS run a memory test and ran memtest86+ on all the memory; both reported no errors.

The other thing I'm wondering is whether there is some issue between FreeBSD and the MSI mainboard. Could it be BIOS settings, or something fixable with loader.conf options?
 
SirDice said:
Code:
Fatal trap 12: page fault while in kernel mode
This is almost always caused by bad memory.
Agreed, if it happens during normal operation. The original poster seems to have triggered it by giving a continue command inside the debugger, which was entered from a panic. In that case, all bets are off...
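One detail in the earlier trap output is consistent with this reading. The faulting instruction was movq 0x18(%r12),%rax, which reads memory at %r12 + 0x18, and the reported fault virtual address was 0x18. That means %r12 held 0: the code dereferenced a NULL pointer to reach a field at offset 0x18. That fits the kernel trying to carry on from a panic this early in boot better than it fits a RAM error. The sketch below uses a hypothetical struct (example_thread is invented for illustration and is not FreeBSD's struct thread) to show how a NULL base plus a field offset produces exactly such a small address on an LP64 platform like amd64. It only computes the address and never dereferences it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical struct for illustration only; NOT FreeBSD's struct thread.
 * On an LP64 target such as amd64, three pointer-sized members occupy
 * offsets 0x00, 0x08 and 0x10, so the fourth member lands at 0x18.
 */
struct example_thread {
    void *field0;
    void *field1;
    void *field2;
    void *lock;   /* offset 0x18 on LP64 */
};

/* This sketch assumes 8-byte pointers, as on amd64. */
static_assert(sizeof(void *) == 8, "sketch assumes an LP64 target");
static_assert(offsetof(struct example_thread, lock) == 0x18,
              "lock is expected at offset 0x18");

/*
 * Address the CPU would compute for base->lock when base == base_addr.
 * This is the same arithmetic as `movq 0x18(%r12),%rax` with
 * %r12 == base_addr. A NULL base (0) therefore yields address 0x18,
 * matching the "fault virtual address = 0x18" in the trap report.
 */
static uintptr_t lock_field_address(uintptr_t base_addr)
{
    return base_addr + offsetof(struct example_thread, lock);
}
```

A genuinely corrupted pointer from bad memory would more typically produce an arbitrary-looking address rather than NULL plus a small structure offset, although this alone does not rule out hardware problems.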

To the original poster: you won't get a crash dump, because the panic happens at the very beginning of kernel operation, before most device probes. The kernel hasn't learned about any disks yet, so there is no place to put the dump.

The actual panic seems to come from around line 947 of sys/amd64/amd64/mp_machdep.c (that's an 8.3-PRERELEASE line number; 9.0-RELEASE will probably differ somewhat). It results from a call to start_ap() returning an error. start_ap() does quite a "song and dance" (to quote the code comments) to bring up each additional core. It returns an error when it has done everything it thinks is needed to get the next core going but does not see the new processor start within 5 seconds.
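The timeout behaviour described above can be pictured with a small, hypothetical C model. It is not FreeBSD's start_ap(): the real function programs the local APIC to send startup interrupts to the target core and busy-waits on real hardware. Here, time is just a loop counter, and the names ap_has_checked_in() and wait_for_ap() are invented for illustration. The model only shows the shape of the logic: poll once per millisecond, succeed as soon as the new core reports in, and give up after 5000 ms (the 5 seconds mentioned above). At that point the caller panics with a message like "AP #2 (PHY# 4) failed!".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, hypothetical model of "wait for the new core to check in".
 * NOT the real start_ap(); time is simulated with a loop counter so the
 * logic can run in user space.
 */

/* Hypothetical AP that comes online after ap_boot_ms milliseconds.
 * A negative ap_boot_ms models a core that never starts, which is the
 * failure reported in this thread. */
static bool ap_has_checked_in(int now_ms, int ap_boot_ms)
{
    return ap_boot_ms >= 0 && now_ms >= ap_boot_ms;
}

/* Poll once per simulated millisecond and give up after timeout_ms.
 * Returns true if the AP checked in in time. A false return is where
 * the kernel would panic with "AP #n (PHY# m) failed!". */
static bool wait_for_ap(int ap_boot_ms, int timeout_ms)
{
    assert(timeout_ms > 0);
    for (int ms = 0; ms < timeout_ms; ms++) {
        if (ap_has_checked_in(ms, ap_boot_ms))
            return true;
        /* The kernel would busy-wait roughly 1 ms here. */
    }
    return false;
}
```

For example, wait_for_ap(-1, 5000) models a core that never starts and returns false. Nothing in this loop can tell a firmware problem from a kernel problem; it only observes that the core never reported in.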

The first thing I'd suggest is seeing whether you can boot successfully from the 9.0 distribution media (CD, DVD or USB, your choice). If you can, I'd suspect something strange in either your loader.conf or a miscompiled local kernel.

If the distribution media does the same thing, then you're going to need to get a kernel developer involved. I'd start with the freebsd-stable@ mailing list and see where it goes from there.

Footnote - mp_machdep.c is also home to one of the more amusing panic messages:

Code:
                panic("cpuid mismatch! boom!!");
 
The DVD shows the same issue; the only way I was able to install from it originally was to disable SMP at the boot loader menu. Thanks for taking the time to track down where the problem appears to be coming from. I guess I'll take it to the next level and post to the mailing list.
 
I received a private message about whether I found a solution to this problem. So I thought I'd post a followup about my resolution.

I joined the freebsd-stable mailing list as suggested and talked to a developer there. He told me that my problem was caused by a bug in the motherboard's BIOS, not by anything in FreeBSD. MSI eventually released a BIOS update (version 4.1) on 2012-06-08, which also updated the Intel Management Engine firmware from version 7 to 8. I cannot say whether the problem was in the BIOS or in the Management Engine firmware, but either way the update fixed my problem.
 