Home server crashing regularly: ffs_alloccg: map corrupted

I am not a kernel debugger, so I'm not sure what any of this means:

Code:
gollum# kgdb kernel.debug /var/crash/vmcore.4
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:
start = 0, len = 3723, fs = /usr
panic: ffs_alloccg: map corrupted
Uptime: 2d3h47m41s
Physical memory: 307 MB
Dumping 92 MB: 77 61 45 29 13

Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/acpi.ko
#0  doadump () at pcpu.h:196
196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc075f283 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc075f48e in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc094fc01 in ffs_mapsearch (fs=0xc2662000, cgp=0xc8bcc000, bpref=8120656,
    allocsiz=1) at /usr/src/sys/ufs/ffs/ffs_alloc.c:2190
#4  0xc09531d0 in ffs_alloccg (ip=0xc3694174, cg=86, bpref=8120656, size=2048)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1485
#5  0xc094d8da in ffs_hashalloc (ip=0xc3694174, cg=86, pref=Unhandled dwarf expression opcode 0x93
)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1293
#6  0xc095010b in ffs_alloc (ip=0xc3694174, lbn=1, bpref=8120656, size=2048,
    flags=100728832, cred=0xc27d6000, bnp=0xcf6089ec)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:185
#7  0xc0955cae in ffs_balloc_ufs2 (vp=0xc335133c, startoffset=Variable "startoffset" is not available.
)
    at /usr/src/sys/ufs/ffs/ffs_balloc.c:709
#8  0xc09745c5 in ffs_write (ap=0xcf608bc4)
    at /usr/src/sys/ufs/ffs/ffs_vnops.c:724
#9  0xc0a647d6 in VOP_WRITE_APV (vop=0xc0b9fe80, a=0xcf608bc4)
    at vnode_if.c:691
#10 0xc07e94d7 in vn_write (fp=0xc29e404c, uio=0xcf608c60,
    active_cred=0xc27d6000, flags=0, td=0xc255f690) at vnode_if.h:373
#11 0xc0793e67 in dofilewrite (td=0xc255f690, fd=28, fp=0xc29e404c,
    auio=0xcf608c60, offset=-1, flags=0) at file.h:257
#12 0xc0794138 in kern_writev (td=0xc255f690, fd=28, auio=0xcf608c60)
    at /usr/src/sys/kern/sys_generic.c:402
#13 0xc07941af in write (td=0xc255f690, uap=0xcf608cfc)
    at /usr/src/sys/kern/sys_generic.c:318
#14 0xc0a4f1a5 in syscall (frame=0xcf608d38)
    at /usr/src/sys/i386/i386/trap.c:1090
#15 0xc0a3bd70 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#16 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

dmesg output:
Code:
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-RELEASE #0: Fri May  8 02:02:23 MDT 2009
    root@gollum.myhome.network:/usr/obj/usr/src/sys/GOLLUM
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel Celeron (634.78-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 335478784 (319 MB)
avail memory = 314527744 (299 MB)
kbd1 at kbdmux0
acpi0: <ABIT AWRDACPI> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 13ef0000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x4000-0x4041,0x5000-0x500f on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82443BX (440 BX) host to PCI bridge> on hostb0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xd4000000-0xd4ffffff,0xd8000000-0xd8ffffff irq 10 at device 0.0 on pci1
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 UDMA33 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <Intel 82371AB/EB (PIIX4) USB controller> port 0xe000-0xe01f at device 7.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 82371AB/EB (PIIX4) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
pci0: <bridge> at device 7.3 (no driver attached)
pcib2: <PCI-PCI bridge> at device 9.0 on pci0
pci2: <PCI bus> on pcib2
de0: <Digital 21140A Fast Ethernet> port 0xd000-0xd07f mem 0xd7000000-0xd700007f irq 12 at device 4.0 on pci2
de0: SMC 9332BDT 21140A [10-100Mb/s] pass 2.2
de0: WARNING: using obsoleted if_watchdog interface
de0: Ethernet address: 00:e0:29:4d:be:03
de0: [ITHREAD]
de1: <Digital 21140A Fast Ethernet> port 0xd400-0xd47f mem 0xd7001000-0xd700107f irq 10 at device 5.0 on pci2
de1: SMC 9332BDT 21140A [10-100Mb/s] pass 2.2
de1: WARNING: using obsoleted if_watchdog interface
de1: Ethernet address: 00:e0:29:4d:be:04
de1: [ITHREAD]
atapci1: <Promise PDC20371 SATA150 controller> port 0xe400-0xe43f,0xe800-0xe80f,0xec00-0xec7f mem 0xda020000-0xda020fff,0xda000000-0xda01ffff irq 11 at device 13.0 on pci0
atapci1: [ITHREAD]
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci1
ata4: [ITHREAD]
fdc0: <floppy drive controller> port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
smist0: <SpeedStep SMI> on cpu0
device_attach: smist0 attach returned 6
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xc87ff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 634775517 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM <ATAPI 52X CDROM/VER-1.40> at ata0-master UDMA33
ad4: 76324MB <Maxtor 6Y080M0 YAR51BW0> at ata2-master SATA150
ad6: 238475MB <Seagate ST3250410AS 3.AAC> at ata3-master SATA150
ar0: 76293MB <Promise Fasttrak RAID1> status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master
GEOM_LABEL: Label for provider ad4s1a is ufsid/4411dfaab1b88dfa.
GEOM_LABEL: Label for provider ad4s1d is ufsid/4411dfaad32c934f.
Trying to mount root from ufs:/dev/ar0s1a
WARNING: / was not properly dismounted

I upgraded this server from FreeBSD 7.1 to 7.2 on May 8th. The crashes first started occurring on May 28th.

Thoughts? (Or at least a more appropriate location to send this?)
 
It looks like it's on the /usr filesystem. How did you format that partition?

You could try running something like sysutils/smartmontools and make sure the drive itself is still healthy.
 
SirDice said:
It looks like it's on the /usr filesystem. How did you format that partition?

You could try running something like sysutils/smartmontools and make sure the drive itself is still healthy.

I believe I originally installed this server in the FreeBSD 6.2 timeframe. /usr would have been given whatever defaults sysinstall used at that time. The drive itself, 'ar0', is actually a RAID1 mirror:

Code:
atapci1: <Promise PDC20371 SATA150 controller> port 0xe400-0xe43f,0xe800-0xe80f,0xec00-0xec7f mem 0xda020000-0xda020fff,0xda000000-0xda01ffff irq 11 at device 13.0 on pci0
atapci1: [ITHREAD]
atapci1: [ITHREAD]

ar0: 76293MB <Promise Fasttrak RAID1> status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master

I installed smartmontools and ran smartctl -a on the two drives (ad4 and ad6); both returned an overall assessment of "PASSED". I don't dare try the self-tests (-t) for fear the RAID controller might react badly to them.
 
The self-tests can be run without problems. I have smartmon configured to do a daily short test and a weekly long one.
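For anyone wanting to start those self-tests by hand, here is a minimal sketch. It assumes smartmontools is installed and that the mirror members show up as /dev/ad4 and /dev/ad6, as in the dmesg earlier in the thread. The function name start_selftests is made up for illustration; -t short, -t long and -l selftest are standard smartctl options.

```shell
#!/bin/sh
# Start a SMART self-test of the given type on each listed device.
# Usage: start_selftests short|long dev...
start_selftests() {
    kind=$1
    shift
    for dev in "$@"; do
        # The test runs inside the drive firmware; smartctl returns right away.
        smartctl -t "$kind" "$dev" || echo "could not start $kind test on $dev" >&2
    done
}

# Example (run as root); device names are assumed from the dmesg above:
#   start_selftests short /dev/ad4 /dev/ad6
#   smartctl -l selftest /dev/ad4    # read the result once the test has finished
```

For a recurring schedule, smartd.conf accepts a -s directive with a regular expression. The example in the smartd.conf manual page, -s (S/../.././02|L/../../6/03), requests a short test every day at 02:00 and a long test on Saturdays at 03:00. That is only one possible schedule, not necessarily the configuration described above.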

Is the mirror also still healthy?
 
SirDice said:
The self-tests can be run without problems. I have smartmon configured to do a daily short test and a weekly long one.

Short tests of the two drives, ad4 and ad6, completed without any errors. Lifetime is above 11,000 hours for both drives. I'm running the long test just in case.

Is the mirror also still healthy?

FreeBSD believes it is:

Code:
atacontrol status ar0
ar0: ATA RAID1 status: READY
 subdisks:
   0 ad4  ONLINE
   1 ad6  ONLINE
 
The only thing I could find regarding this error was a reference to blocksize (hence the question about how it was formatted).

Anything else mainly pointed to hardware or gave no real solution.

But we can pretty much rule out the drives themselves, and besides fsck'ing the filesystem, I really don't know what else to suggest.
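A rough sketch of that fsck pass, assuming /usr is listed in /etc/fstab (so fsck can resolve the mount point to its device) and that the box can be dropped to single-user mode first. The function name check_fs is hypothetical; -f asks fsck to check the filesystem even if it is marked clean, and -y answers yes to every repair prompt.

```shell
#!/bin/sh
# Unmount a filesystem if it is mounted, then force a full check of it.
# Usage: check_fs mountpoint   (run as root, ideally in single-user mode)
check_fs() {
    mnt=$1
    # fsck should not repair a filesystem that is mounted read-write.
    if mount | grep -q " on $mnt "; then
        umount "$mnt" || { echo "cannot unmount $mnt; is it busy?" >&2; return 1; }
    fi
    # -f: check even if marked clean; -y: assume yes for all questions.
    fsck -f -y "$mnt"
}

# Example, after "shutdown now" has taken the system to single-user mode:
#   check_fs /usr
```

Because -y lets fsck make every repair it proposes, it is worth having a backup of anything important on /usr before running it.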
 