Yuck - Fatal trap 12 on FreeBSD 8.1-PRERELEASE

I had an emergency and had to clone a running system which is FreeBSD 8.1-PRERELEASE.
Yes, I plan on upgrading, but I am posting to talk about 'Fatal trap 12' errors.

My system is a dhcp/named/gateway box using the neato Atom D510 CPU on a Supermicro X7SPA-H board.

First question: in /var/log/messages, the panic lines carry the timestamp of the boot after the crash, not of the crash itself. Where does syslogd pull this 'page fault' message from so that it can write it to disk after the reboot? Was it in swap?

Code:
root@pulga 150> bunzip2 -c messages.0.bz2 | grep -B 5 -A 24 Fatal
Oct 14 01:48:43 pulga dhcpd: Abandoning IP address 10.9.12.24: pinged before offer
Oct 14 01:48:44 pulga dhcpd: Abandoning IP address 10.9.12.25: pinged before offer
Oct 14 01:52:38 pulga syslogd: kernel boot file is /boot/kernel/kernel
Oct 14 01:52:38 pulga kernel: 
Oct 14 01:52:38 pulga kernel: 
Oct 14 01:52:38 pulga kernel: Fatal trap 12: page fault while in kernel mode
Oct 14 01:52:38 pulga kernel: cpuid = 2; apic id = 02
Oct 14 01:52:38 pulga kernel: fault virtual address	= 0x4
Oct 14 01:52:38 pulga kernel: fault code		= supervisor read data, page not present
Oct 14 01:52:38 pulga kernel: instruction pointer	= 0x20:0xffffffff80a25e5c
Oct 14 01:52:38 pulga kernel: stack pointer	        = 0x28:0xffffff8074fdf6c0
Oct 14 01:52:38 pulga kernel: frame pointer	        = 0x28:0xffffff8074fdf7c0
Oct 14 01:52:38 pulga kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
Oct 14 01:52:38 pulga kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Oct 14 01:52:38 pulga kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Oct 14 01:52:38 pulga kernel: current process		= 0 (em0 taskq)
Oct 14 01:52:38 pulga kernel: trap number		= 12
Oct 14 01:52:38 pulga kernel: panic: page fault
Oct 14 01:52:38 pulga kernel: cpuid = 2
Oct 14 01:52:38 pulga kernel: Uptime: 41m30s
Oct 14 01:52:38 pulga kernel: Cannot dump. Device not defined or unavailable.
Oct 14 01:52:38 pulga kernel: Automatic reboot in 15 seconds - press a key on the console to abort
Oct 14 01:52:38 pulga kernel: panic: bufwrite: buffer is not busy???
Oct 14 01:52:38 pulga kernel: cpuid = 2
Oct 14 01:52:38 pulga kernel: Copyright (c) 1992-2010 The FreeBSD Project.
Oct 14 01:52:38 pulga kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Oct 14 01:52:38 pulga kernel: The Regents of the University of California. All rights reserved.
Oct 14 01:52:38 pulga kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Oct 14 01:52:38 pulga kernel: FreeBSD 8.1-PRERELEASE #0: Tue Jul 13 01:09:17 PDT 2010
Oct 14 01:52:38 pulga kernel: root@:/usr/obj/usr/src/sys/JEJEN amd64

root@pulga 151> grep -B 5 -A 16 Fatal messages
Oct 20 09:22:40 pulga named[18444]: clients-per-query decreased to 12
Oct 20 09:31:41 pulga syslogd: kernel boot file is /boot/kernel/kernel
Oct 20 09:31:41 pulga kernel: 
Oct 20 09:31:41 pulga kernel: 
Oct 20 09:31:41 pulga kernel: Fatal trap 12: page fault while in kernel mode
Oct 20 09:31:41 pulga kernel: cpuid = 2; apic id = 02
Oct 20 09:31:41 pulga kernel: fault virtual address	= 0x4
Oct 20 09:31:41 pulga kernel: fault code		= supervisor read data, page not present
Oct 20 09:31:41 pulga kernel: instruction pointer	= 0x20:0xffffffff80a25e5c
Oct 20 09:31:41 pulga kernel: stack pointer	        = 0x28:0xffffff8074fdf6c0
Oct 20 09:31:41 pulga kernel: frame pointer	        = 0x28:0xffffff8074fdf7c0
Oct 20 09:31:41 pulga kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
Oct 20 09:31:41 pulga kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Oct 20 09:31:41 pulga kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Oct 20 09:31:41 pulga kernel: current process		= 0 (em0 taskq)
Oct 20 09:31:41 pulga kernel: trap number		= 12
Oct 20 09:31:41 pulga kernel: panic: page fault
Oct 20 09:31:41 pulga kernel: cpuid = 2
Oct 20 09:31:41 pulga kernel: Copyright (c) 1992-2010 The FreeBSD Project.
Oct 20 09:31:41 pulga kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Oct 20 09:31:41 pulga kernel: The Regents of the University of California. All rights reserved.

Second question: the pointer addresses are the same in two crashes a week apart. How do I figure out what they point to? I read the On-Line Kernel Debugging Using Remote GDB chapter of the manual, but didn't see an example for resolving pointer addresses... I seem to remember doing this a few years ago. Help! :)
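
To double-check that two panics really share the same faulting address, the 'instruction pointer' lines can be pulled out of the logs and counted. What follows is only a minimal Python sketch: the function name count_fault_addresses and the regular expression are assumptions based on the log layout shown above, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches the "instruction pointer" line of a trap report as it appears in
# /var/log/messages above, e.g. "instruction pointer = 0x20:0xffffffff80a25e5c".
# The pattern is an assumption derived from those log lines.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)")


def count_fault_addresses(lines):
    """Count how often each faulting instruction pointer shows up."""
    counts = Counter()
    for line in lines:
        match = IP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two instruction pointer lines from the pulga logs above.
sample = [
    "Oct 14 01:52:38 pulga kernel: instruction pointer\t= 0x20:0xffffffff80a25e5c",
    "Oct 20 09:31:41 pulga kernel: instruction pointer\t= 0x20:0xffffffff80a25e5c",
]

counts = count_fault_addresses(sample)
for address, hits in counts.most_common():
    print(address, hits)
```

An address that shows up in every panic suggests a reproducible code path rather than random corruption, although it does not prove it.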
 
I upgraded to 8-STABLE. Hopefully the auto-reboots go away. If anyone can tell me how to figure out where those pointers point, that would be useful... I have the old kernel binary, but it does not have GDB or KDB built into it.
 
I am getting trap 12s on another router as well. :( This one is running:

Code:
FreeBSD jejen.monkeybrains.net 8.1-STABLE FreeBSD 8.1-STABLE #2: Wed Oct 20 15:55:41 PDT 2010
root@build.monkeybrains.net:/usr/obj/usr/src/sys/JEJEN  amd64

Code:
Oct 23 08:15:27 jejen kernel: kernel trap 12 with interrupts disabled
Oct 23 08:15:27 jejen kernel: 
Oct 23 08:15:27 jejen kernel: 
Oct 23 08:15:27 jejen kernel: Fatal trap 12: page fault while in kernel mode
Oct 23 08:15:27 jejen kernel: cpuid = 2; apic id = 02
Oct 23 08:15:27 jejen kernel: fault virtual address     = 0xc
Oct 23 08:15:27 jejen kernel: fault code                = supervisor read data, page not present
Oct 23 08:15:27 jejen kernel: instruction pointer       = 0x20:0xffffffff80365e60
Oct 23 08:15:27 jejen kernel: stack pointer             = 0x28:0xffffff800012a820
Oct 23 08:15:27 jejen kernel: frame pointer             = 0x28:0xffffff800012a850
Oct 23 08:15:27 jejen kernel: code segment              = base 0x0, limit 0xfffff, type 0x1b
Oct 23 08:15:27 jejen kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Oct 23 08:15:27 jejen kernel: processor eflags  = resume, IOPL = 0
Oct 23 08:15:27 jejen kernel: current process           = 0 (em0 taskq)
Oct 23 08:15:27 jejen kernel: trap number               = 12
Oct 23 08:15:27 jejen kernel: panic: page fault
Oct 23 08:15:27 jejen kernel: cpuid = 2
Oct 23 08:15:27 jejen kernel: Copyright (c) 1992-2010 The FreeBSD Project.
Oct 23 08:15:27 jejen kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Oct 23 08:15:27 jejen kernel: The Regents of the University of California. All rights reserved.
Oct 23 08:15:27 jejen kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Oct 23 08:15:27 jejen kernel: FreeBSD 8.1-STABLE #2: Wed Oct 20 15:55:41 PDT 2010
Oct 23 08:15:27 jejen kernel: root@pulga.monkeybrains.net:/usr/obj/usr/src/sys/JEJEN amd64
Oct 23 08:15:27 jejen kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Oct 23 08:15:27 jejen kernel: CPU: Intel(R) Atom(TM) CPU D510   @ 1.66GHz (1666.67-MHz K8-class CPU)
Oct 23 08:15:27 jejen kernel: Origin = "GenuineIntel"  Id = 0x106ca  Family = 6  Model = 1c  Stepping = 10
Oct 23 08:15:27 jejen kernel: Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Oct 23 08:15:27 jejen kernel: Features2=0x40e31d<SSE3,DTES64,MON,DS_CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE>
Oct 23 08:15:27 jejen kernel: AMD Features=0x20100800<SYSCALL,NX,LM>
Oct 23 08:15:27 jejen kernel: AMD Features2=0x1<LAHF>
Oct 23 08:15:27 jejen kernel: TSC: P-state invariant
Help!?!
 
8.1-STABLE - fatal trap 12

So, according to this, it does not look like a memory problem:
http://forums.freebsd.org/showthread.php?t=6051#4
(In my experience, bad memory usually shows up as Signal 11.)

I have multiple machines running 8.1-STABLE that are seeing 'Fatal Trap 12'. Both are the same hardware: Supermicro with Atom D510 CPU / two onboard NICs [em0/em1] / solid-state SATA device / RAM that has passed memtest. Both use the same kernel config; here is the diff from GENERIC:
Code:
--- GENERIC     2010-08-11 00:11:20.000000000 -0700
+++ JEJEN       2010-10-20 15:38:34.000000000 -0700
@@ -1,34 +1,9 @@

+####
 cpu            HAMMER
-ident          GENERIC
+ident          JEJEN
 
+options         IPFIREWALL
+options         IPFIREWALL_VERBOSE
+options         IPFIREWALL_VERBOSE_LIMIT=100
+options         IPFIREWALL_DEFAULT_TO_ACCEPT
+options         IPFIREWALL_FORWARD
+#options        IPFIREWALL_NAT
+options         DUMMYNET
+options         HZ=1000
+#options         GEOM_MIRROR
+
+### QoS support
+options ALTQ
+options ALTQ_CBQ
+options ALTQ_RED
+options ALTQ_RIO
+options ALTQ_HFSC
+options ALTQ_CDNR
+options ALTQ_PRIQ

 makeoptions    DEBUG=-g                # Build kernel with gdb(1) debug symbols
 
@@ -47,18 +22,13 @@
 options        NFSSERVER               # Network Filesystem Server
 options        NFSLOCKD                # Network Lock Manager
 options        NFS_ROOT                # NFS usable as /, requires NFSCLIENT
-options        MSDOSFS                 # MSDOS Filesystem
 options        CD9660                  # ISO 9660 Filesystem
 options        PROCFS                  # Process filesystem (requires PSEUDOFS)
 options        PSEUDOFS                # Pseudo-filesystem framework
 options        GEOM_PART_GPT           # GUID Partition Tables.
 options        GEOM_LABEL              # Provides labelization
 options        COMPAT_43TTY            # BSD 4.3 TTY compat (sgtty)
-options        COMPAT_FREEBSD32        # Compatible with i386 binaries
-options        COMPAT_FREEBSD4         # Compatible with FreeBSD4
-options        COMPAT_FREEBSD5         # Compatible with FreeBSD5
-options        COMPAT_FREEBSD6         # Compatible with FreeBSD6
-options        COMPAT_FREEBSD7         # Compatible with FreeBSD7
 options        SCSI_DELAY=5000         # Delay (in ms) before probing SCSI
 options        KTRACE                  # ktrace(1) support
 options        STACK                   # stack(9) support

@@ -87,72 +57,19 @@
 device         acpi
 device         pci
 
-# Floppy drives
-device         fdc
-
 # ATA and ATAPI devices
 device         ata
 device         atadisk         # ATA disk drives
 device         ataraid         # ATA RAID drives
 device         atapicd         # ATAPI CDROM drives
-device         atapifd         # ATAPI floppy drives
-device         atapist         # ATAPI tape drives
 options        ATA_STATIC_ID   # Static device numbering
 
-# SCSI Controllers
[snip - removed]

 # SCSI peripherals
 device         scbus           # SCSI bus (required for SCSI)
-device         ch              # SCSI media changers
 device         da              # Direct Access (disks)
-device         sa              # Sequential Access (tape etc)
-device         cd              # CD
 device         pass            # Passthrough device (direct SCSI access)
 device         ses             # SCSI Environmental Services (and SAF-TE)
 
-# RAID controllers interfaced to the SCSI subsystem
[snip - removed]

-# RAID controllers
[snip - removed]

 # atkbdc0 controls both the keyboard and the PS/2 mouse
 device         atkbdc          # AT keyboard controller
 device         atkbd           # AT keyboard
@@ -169,99 +86,16 @@
 
 device         agp             # support several AGP chipsets
 
-# PCCARD (PCMCIA) support
-# PCMCIA and cardbus bridge support
[snip - removed]

 # Serial (COM) ports
 device         uart            # Generic UART driver
 
-# Parallel port
[snip - removed]

 # PCI Ethernet NICs.
 device         em              # Intel PRO/1000 Gigabit Ethernet Family
 device         igb             # Intel PRO/1000 PCIE Server Gigabit Family
 device         ixgbe           # Intel PRO/10GbE PCIE Ethernet Family

 # PCI Ethernet NICs that use the common MII bus controller code.
 # NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
 device         miibus          # MII bus support
[snip -- removed rest of NICs]
-# Wireless NIC cards
[snip -- all removed]
 
 # Pseudo devices.
 device         loop            # Network loopback
@@ -287,39 +121,31 @@
 device         ehci            # EHCI PCI->USB interface (USB 2.0)
 device         usb             # USB Bus (required)
 #device                udbp            # USB Double Bulk Pipe devices
-device         uhid            # "Human Interface Devices"
 device         ukbd            # Keyboard
 device         ulpt            # Printer
 device         umass           # Disks/Mass storage - Requires scbus and da
-device         ums             # Mouse
-device         urio            # Diamond Rio 500 MP3 player
-# USB Serial devices
[snip - removed all USB]

-# FireWire support
[snip - removed all firewire]
 
Are other people seeing similar 'Fatal Trap 12' on 8.1-STABLE boxes? Mine have been stable for the past 3 weeks.

Answering my own unanswered plea for help... 'tell me how to figure out where those pointers point', I searched around and found this guide http://www.unixguide.net/freebsd/faq/18.13.shtml

... the crash from one box (instruction pointer 0xffffffff80365e60) points to propagate_priority, the last symbol that starts at or below that address ...
Code:
# nm -n /boot/kernel/kernel | grep ffffffff80365
ffffffff80365010 T taskqueue_block
ffffffff80365110 T taskqueue_enqueue
ffffffff80365360 T taskqueue_enqueue_fast
ffffffff80365370 T userret
ffffffff803653f0 T ast
ffffffff80365900 t turnstile_setowner
ffffffff80365950 t turnstile_first_waiter
ffffffff80365980 T turnstile_head
ffffffff803659a0 T turnstile_empty
ffffffff803659c0 t turnstile_fini
ffffffff803659d0 t turnstile_init
ffffffff80365a50 T init_turnstiles
ffffffff80365af0 t turnstile_adjust_thread
ffffffff80365cc0 t propagate_priority
ffffffff80365e90 T turnstile_adjust
ffffffff80365f00 T turnstile_free
ffffffff80365f20 T turnstile_alloc
ffffffff80365f40 t init_turnstile0
ffffffff80365fa0 T turnstile_chain_unlock
ffffffff80365fe0 T turnstile_lookup
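
Instead of reading the nm listing by eye, the same lookup can be scripted. This is a rough Python sketch, assuming the "address type name" layout that nm -n prints above; load_symbols and resolve are hypothetical helper names, and the result is only the nearest preceding symbol, the same approximation as the manual method. It has to be run against the symbol table of the kernel that actually crashed.

```python
import bisect


def load_symbols(nm_lines):
    """Parse `nm -n` lines ("<hex address> <type> <name>") into a sorted list."""
    symbols = []
    for line in nm_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols without an address
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) of the last symbol starting at or below address."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = symbols[index]
    return name, address - start


# Excerpt of the nm output above; in practice, feed in the full listing.
nm_excerpt = """\
ffffffff80365af0 t turnstile_adjust_thread
ffffffff80365cc0 t propagate_priority
ffffffff80365e90 T turnstile_adjust
""".splitlines()

symbols = load_symbols(nm_excerpt)
name, offset = resolve(symbols, 0xffffffff80365e60)
print(f"{name}+{offset:#x}")  # propagate_priority+0x1a0
```

The offset (0x1a0 here) is how far into the function the faulting instruction sits, which helps when matching it against a disassembly of the same kernel.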
 
Over the last few years I have had similar problems on 3 other machines with different hardware configurations (but pretty much the same system setup). In my case it seems to be related to the kernel and the network/NICs when using VLAN tags. I have 24 VLANs.

On *all* of these machines I often get a kernel trap 12 panic similar to the ones you posted. In my case it is not limited to em0; it occurs regardless of which NIC/driver is being used:

Code:
kernel: current process           = 0 ([nic] taskq)

[nic] stands for the interface: it is the same process whether it is nfe0, eu0, etc. It doesn't matter.

This problem occurs at various times; the newest installation (OS and hardware) only lasts approx. 30 seconds to a few minutes before the crash. I've tested both FreeBSD7/i386 and FreeBSD8/i386 (8.1-RELEASE + 8.2-PRERELEASE). If I don't have any traffic on the VLANs, the error doesn't seem to occur.

The curious thing is that on my oldest box (FreeBSD 5.5) the problem NEVER occurs. I will investigate a bit further. I no longer think it's related to the hardware, but rather to something in the kernel itself.
 