How do I find out the cause of a system crash?

My home NAS crashes intermittently. I've enabled crash dumps but don't know how to interpret them. Here are the first lines of the core.txt and a link to the whole file.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0x160
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff816bbc76
stack pointer	        = 0x28:0xffffff810bf0c870
frame pointer	        = 0x28:0xffffff810bf0c930
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 3786 (python2.7)
trap number		= 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff80c68504 at VOP_REMOVE_APV+0x34
#7 0xffffffff8098709d at kern_unlinkat+0x32d
#8 0xffffffff80bd7ae6 at amd64_syscall+0x546
#9 0xffffffff80bc3447 at Xfast_syscall+0xf7
Uptime: 11h3m42s
Dumping 1542 out of 3818 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%

http://pastebin.com/CY7UR3KM

Any help appreciated.

/Stuart
 
Code:
Fatal trap 12: page fault while in kernel mode
These are almost always caused by failing hardware.
 
Try running memtest86 to see whether one of the RAM modules produces errors. It's just a guess, but it would narrow things down.

There are some posts here in this forum mentioning "Fatal trap 12".
Maybe there's a hint for you.
 
Briefly, this is what caused the trap:
Code:
#7  0xffffffff816bbc76 in zfs_freebsd_remove (ap=Variable "ap" is not available.)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1855
The function zfs_freebsd_remove() was called with an invalid pointer to a struct vop_remove_args (ap): the resulting fault address is far too small to be a valid kernel address. (Most likely the value just happened to be there and has no particular meaning.)
Code:
fault virtual address	= 0x160
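
A fault address this close to zero usually means a field at a small offset was read through a NULL (or nearly NULL) pointer. Here is a minimal sketch of that check, assuming a 4096-byte page as the cutoff; the function names are hypothetical and only illustrate the reasoning:

```python
import re

# Assumption: treat anything inside the first 4096-byte page as
# "NULL pointer plus a small field offset".
PAGE_SIZE = 4096

def fault_address(panic_text):
    """Return the fault virtual address from a trap message, or None."""
    m = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", panic_text)
    return int(m.group(1), 16) if m else None

def looks_like_null_deref(addr):
    """True if the address falls inside the first page of memory."""
    return addr is not None and addr < PAGE_SIZE

SAMPLE = "fault virtual address\t= 0x160\nfault code\t\t= supervisor read data, page not present"

if __name__ == "__main__":
    addr = fault_address(SAMPLE)
    print(hex(addr), looks_like_null_deref(addr))
```

This only classifies the address. It does not say which pointer was NULL; for that you need the source line reported by the debugger.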

If you trace back, one of the arguments to kern_unlinkat(), path, looks suspicious.
Code:
#9  0xffffffff8098709d in kern_unlinkat (td=0xfffffe005f799470, fd=-100, 
    path=0x8014d5c00 <Address 0x8014d5c00 out of bounds>, 
    pathseg=UIO_USERSPACE, oldinum=0) at vnode_if.h:575
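
One thing worth checking before treating that value as corrupt: path is passed together with pathseg=UIO_USERSPACE, so it may simply be a user-space pointer that the debugger cannot read from a kernel dump. Here is a rough sketch of the check, assuming the usual amd64 canonical address split; the function name is hypothetical:

```python
# Assumption: standard amd64 48-bit canonical addressing, where the lower
# half belongs to user space and the upper half to the kernel.
USER_MAX = 0x00007FFFFFFFFFFF
KERNEL_MIN = 0xFFFF800000000000
ADDR_MAX = 0xFFFFFFFFFFFFFFFF

def classify_amd64_address(addr):
    """Classify a 64-bit virtual address as user, kernel, or non-canonical."""
    if 0 <= addr <= USER_MAX:
        return "user"
    if KERNEL_MIN <= addr <= ADDR_MAX:
        return "kernel"
    return "non-canonical"

if __name__ == "__main__":
    # Values copied from the kern_unlinkat() frame above.
    print("path:", classify_amd64_address(0x8014d5c00))
    print("td:  ", classify_amd64_address(0xfffffe005f799470))
```

If path classifies as a user address, the "out of bounds" message on its own is weak evidence of a bad argument.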

Frames #6 through #0 are standard trap handling. (Events run from the bottom of the backtrace to the top.)
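
The same reading can be applied to the KDB backtrace in the first post, where the trap-handling frames are #0 through #5. Here is a small sketch that skips them and reports the first frame worth looking at; the set of trap-handling function names is taken from that backtrace, and the helper names are hypothetical:

```python
import re

# Matches lines such as "#6 0xffffffff80c68504 at VOP_REMOVE_APV+0x34".
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)$")

# Assumption: these are the panic/trap frames seen in this particular dump.
TRAP_FUNCS = {"kdb_backtrace", "panic", "trap_fatal",
              "trap_pfault", "trap", "calltrap"}

def parse_frames(text):
    """Return (number, address, function, offset) for each backtrace line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16),
                           m.group(3), int(m.group(4), 16)))
    return frames

def first_interesting_frame(frames):
    """Return the first frame that is not part of trap handling, or None."""
    for frame in frames:
        if frame[2] not in TRAP_FUNCS:
            return frame
    return None

BACKTRACE = """\
#0 0xffffffff809208a6 at kdb_backtrace+0x66
#1 0xffffffff808ea8be at panic+0x1ce
#2 0xffffffff80bd8240 at trap_fatal+0x290
#3 0xffffffff80bd857d at trap_pfault+0x1ed
#4 0xffffffff80bd8b9e at trap+0x3ce
#5 0xffffffff80bc315f at calltrap+0x8
#6 0xffffffff80c68504 at VOP_REMOVE_APV+0x34
#7 0xffffffff8098709d at kern_unlinkat+0x32d
#8 0xffffffff80bd7ae6 at amd64_syscall+0x546
#9 0xffffffff80bc3447 at Xfast_syscall+0xf7
"""

if __name__ == "__main__":
    print(first_interesting_frame(parse_frames(BACKTRACE)))
```

Everything below the reported frame is the path the system call took to get there, read from the bottom up.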

However, I'm not familiar with the file system code, so I don't know exactly what the cause is.
 
Given it is a NAS box and likely spending most of its time in ZFS related code, I wouldn't be too fast to jump on ZFS as the problem until the hardware has been tested and confirmed OK. The fact that it crashed in ZFS code is as likely due to chance as anything else, given the job the box is doing.
 