Hi,
On one of our FreeBSD (7.0 amd64) Apache (2.2.14) servers we are getting regular crashes, especially when the load is high. The only additional information I have is the list of Apache modules in use and the core dump of the system.
I'm at a loss as to how to proceed other than reinstalling the entire system, which is not an option at the moment given that we don't have any redundancy.
Could anyone please help me move the diagnosis forward?
Below is the output of a basic kgdb session. While I have written some C code in my days, I am not familiar with gdb, kernel programming or the syscalls involved, so go easy on me.

Code:
[root@www2 /opt/crash]# ls -alh
total 3285792
drwx------ 2 root wheel 512B May 19 17:20 .
drwxr-xr-x 10 root wheel 512B Feb 9 15:38 ..
-rw-r--r-- 1 root wheel 2B May 19 17:20 bounds
-rw------- 1 root wheel 436B Feb 15 14:42 info.0
-rw------- 1 root wheel 436B May 2 00:02 info.1
-rw------- 1 root wheel 436B May 13 11:39 info.2
-rw------- 1 root wheel 436B May 14 17:31 info.3
-rw------- 1 root wheel 435B May 19 17:20 info.4
-rw------- 1 root wheel 684M Feb 15 14:43 vmcore.0
-rw------- 1 root wheel 731M May 2 00:03 vmcore.1
-rw------- 1 root wheel 689M May 13 11:40 vmcore.2
-rw------- 1 root wheel 633M May 14 17:32 vmcore.3
-rw------- 1 root wheel 656M May 19 17:21 vmcore.4
[root@www2 /opt/crash]# cat info.4
Dump header from device /dev/da0s1b
Architecture: amd64
Architecture Version: 2
Dump Length: 687824896B (655 MB)
Blocksize: 512
Dumptime: Wed May 19 17:18:09 2010
Hostname: www2.anonymous.se
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 7.0-RELEASE #0: Fri Oct 10 12:24:42 UTC 2008
drift@www2.anonymous.se:/usr/obj/usr/src/sys/KERNCONF
Panic String: page fault
Dump Parity: 779422760
Bounds: 4
Dump Status: good
[root@www2 /opt/crash]# cd /usr/obj/usr/src/sys/KERNCONF
[root@www2 /usr/obj/usr/src/sys/KERNCONF]# kgdb kernel.debug /opt/crash/vmcore.4
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x8:0xffffffff80468d5f
stack pointer = 0x10:0xffffffffb4eb09c0
frame pointer = 0x10:0xffffff00182ca000
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 64033 (httpd)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 16h26m40s
Physical memory: 8178 MB
Dumping 655 MB: 640 624 608 592 576 560 544 528 512 496 480 464 448 432 416 400 384 368 352 336 320 304 288 272 256 240 224 208 192 176 160 144 128 112 96 80 64 48 32 16
#0 doadump () at pcpu.h:194
194 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0 doadump () at pcpu.h:194
#1 0x0000000000000004 in ?? ()
#2 0xffffffff804775f9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff804779fd in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:563
#4 0xffffffff807305c4 in trap_fatal (frame=0xffffff00612e8000, eva=18446742975826784256) at /usr/src/sys/amd64/amd64/trap.c:724
#5 0xffffffff8073123f in trap (frame=0xffffffffb4eb0910) at /usr/src/sys/amd64/amd64/trap.c:251
#6 0xffffffff80716f3e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#7 0xffffffff80468d5f in lf_advlock (ap=Variable "ap" is not available.
) at /usr/src/sys/kern/kern_lockf.c:294
#8 0xffffffff8044ebbb in kern_fcntl (td=0xffffff00612e8000, fd=Variable "fd" is not available.
) at vnode_if.h:1036
#9 0xffffffff8044ef7f in fcntl (td=0xffffff00612e8000, uap=0xffffffffb4eb0be0) at /usr/src/sys/kern/kern_descrip.c:336
#10 0xffffffff80730c17 in syscall (frame=0xffffffffb4eb0c70) at /usr/src/sys/amd64/amd64/trap.c:852
#11 0xffffffff8071714b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#12 0x000000080112f07c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) list *0xffffffff80468d5f
0xffffffff80468d5f is in lf_advlock (/usr/src/sys/kern/kern_lockf.c:295).
290 (td->td_wmesg == lockstr) &&
291 (i++ < maxlockdepth)) {
292 waitblock = (struct lockf *)td->td_wchan;
293 /* Get the owner of the blocking lock */
294 waitblock = waitblock->lf_next;
295 if ((waitblock->lf_flags & F_POSIX) == 0)
296 break;
297 nproc = (struct proc *)waitblock->lf_id;
298 if (nproc == (struct proc *)lock->lf_id) {
299 PROC_SUNLOCK(wproc);
(kgdb) q
[root@www2 /usr/obj/usr/src/sys/KERNCONF]#