Fatal trap 12 question... help debugging system

I've got a box acting as a gateway/router/NAT/DHCP server with two on-board em devices. Any tips on this Fatal trap 12 that keeps popping up? The instruction pointer is always the same (and nm doesn't reveal anything near that address...)

I swapped out all the hardware except the SSD, and the problem persisted (on FreeBSD 8.2 and 8.3-RC2), so I replaced the SSD as well and upgraded to FreeBSD 9.0 -- still the same panics. The odd thing is that the box had worked fine for months.

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x4
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c15e5c
stack pointer           = 0x28:0xffffff800012c640
frame pointer           = 0x28:0xffffff800012c740
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq256: em0:rx 0)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff803d1dd0 at kdb_backtrace+0x60
#1 0xffffffff803a10a4 at panic+0x1b4
#2 0xffffffff8063332d at trap_fatal+0x39d
#3 0xffffffff80633403 at trap_pfault+0xb3
#4 0xffffffff80633a04 at trap+0x3e4
#5 0xffffffff8061afd4 at calltrap+0x8
#6 0xffffffff80c195d2 at fr_checknatin+0x382
#7 0xffffffff80c332b3 at fr_check+0x793
#8 0xffffffff804589bc at pfil_run_hooks+0x7c
#9 0xffffffff804b6284 at ip_input+0x224
#10 0xffffffff80457aad at netisr_dispatch_src+0x7d
#11 0xffffffff80457d21 at netisr_dispatch+0x11
#12 0xffffffff8044e809 at ether_demux+0x169
#13 0xffffffff8044ed84 at ether_input+0x324
#14 0xffffffff8025eb73 at em_rxeof+0x353
#15 0xffffffff8025ed47 at em_msix_rx+0x27
#16 0xffffffff80378c89 at intr_event_execute_handlers+0x129
#17 0xffffffff8037a39a at ithread_loop+0x7a
Uptime: 16m20s
Dumping 320 out of 2028 MB:..5%..15%..25%..35%..45%..55%..65%..75%..85%..95%

Reading symbols from /boot/kernel/ipl.ko...Reading symbols from /boot/kernel/ipl.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipl.ko
#0  doadump () at pcpu.h:224
224     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:224
#1  0xffffffff803a0dcc in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:441
#2  0xffffffff803a10df in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:614
#3  0xffffffff8063332d in trap_fatal (frame=0xffffff800012c590, eva=4)
    at /usr/src/sys/amd64/amd64/trap.c:825
#4  0xffffffff80633403 in trap_pfault (frame=0xffffff800012c590, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:692
#5  0xffffffff80633a04 in trap (frame=0xffffff800012c590)
    at /usr/src/sys/amd64/amd64/trap.c:478
#6  0xffffffff8061afd4 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#7  0xffffffff80c15e5c in nat_new (fin=0xffffff800012c7f0, np=Variable "np" is not available.
)
    at /usr/src/sys/modules/ipfilter/../../contrib/ipfilter/netinet/ip_nat.c:2610
#8  0xffffffff80c195d2 in fr_checknatin (fin=0xffffff800012c7f0, 
    passp=0xffffff800012c7ec)
    at /usr/src/sys/modules/ipfilter/../../contrib/ipfilter/netinet/ip_nat.c:4155
#9  0xffffffff80c332b3 in fr_check (ip=0xffffff0001f8380e, hlen=20, ifp=Variable "ifp" is not available.
)
    at /usr/src/sys/modules/ipfilter/../../contrib/ipfilter/netinet/fil.c:2572
#10 0xffffffff804589bc in pfil_run_hooks (ph=Variable "ph" is not available.
) at /usr/src/sys/net/pfil.c:82
#11 0xffffffff804b6284 in ip_input (m=0xffffff0001f4f900)
    at /usr/src/sys/netinet/ip_input.c:532
#12 0xffffffff80457aad in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
)
    at /usr/src/sys/net/netisr.c:859
#13 0xffffffff80457d21 in netisr_dispatch (proto=Variable "proto" is not available.
)
    at /usr/src/sys/net/netisr.c:946
#14 0xffffffff8044e809 in ether_demux (ifp=0xffffff000178c000, 
    m=0xffffff0001f4f900) at /usr/src/sys/net/if_ethersubr.c:894
#15 0xffffffff8044ed84 in ether_input (ifp=0xffffff000178c000, 
    m=0xffffff0001f4f900) at /usr/src/sys/net/if_ethersubr.c:753
#16 0xffffffff8025eb73 in em_rxeof (rxr=0xffffff0001733800, count=199, 
    done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4311
#17 0xffffffff8025ed47 in em_msix_rx (arg=Variable "arg" is not available.
)
    at /usr/src/sys/dev/e1000/if_em.c:1548
#18 0xffffffff80378c89 in intr_event_execute_handlers (p=Variable "p" is not available.
)
    at /usr/src/sys/kern/kern_intr.c:1216
#19 0xffffffff8037a39a in ithread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/kern_intr.c:1229
#20 0xffffffff80375f3c in fork_exit (
    callout=0xffffffff8037a320 <ithread_loop>, arg=0xffffff00017457c0, 
    frame=0xffffff800012cc50) at /usr/src/sys/kern/kern_fork.c:876
#21 0xffffffff8061b51e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000001 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000003 in ?? ()
#47 0xffffffff808b81c0 in affinity ()
#48 0xffffffff808b9ac0 in tdq_cpu ()
#49 0xffffff00015c3000 in ?? ()
#50 0xffffff800012c200 in ?? ()
#51 0xffffff800012c1b8 in ?? ()
#52 0xffffff0001769000 in ?? ()
#53 0xffffffff803c5534 in sched_switch (td=0xffffff00017457c0, 
    newtd=0xffffffff8037a320, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1860
Previous frame inner to this frame (corrupt stack?)
(kgdb)

I can give more system info, but am curious if anyone has seen this type of panic.
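
On the nm point: the kgdb trace above resolves the faulting address to nat_new() in ipl.ko, so it sits in the ipfilter module rather than in the base kernel image, which may be why nm shows nothing nearby. Below is a rough sketch for mapping an address to a loaded module from kldstat output. The helper name find_module is made up for illustration, it assumes the usual "Id Refs Address Size Name" column layout, and it simply picks the module with the highest load address at or below the target (it does not check the module size).

```shell
# find_module ADDRESS
# Reads kldstat-style lines on stdin and prints the name of the module
# whose load address is the highest one not above ADDRESS.
# Addresses are compared as zero-padded hex strings, so no 64-bit
# arithmetic is needed.
#
# Intended use on the affected box (hypothetical invocation):
#   kldstat | find_module 0xffffffff80c15e5c
find_module() {
    LC_ALL=C awk -v target="$1" '
        # Strip the 0x prefix, lowercase, and left-pad to 16 hex digits.
        function norm(h) {
            h = tolower(h)
            sub(/^0x/, "", h)
            while (length(h) < 16) h = "0" h
            return h
        }
        BEGIN { t = norm(target); best = ""; name = "" }
        # Only lines whose third column looks like a load address.
        $3 ~ /^0x/ {
            base = norm($3)
            if ((base "") <= (t "") && (base "") >= (best "")) {
                best = base
                name = $5
            }
        }
        END { if (name != "") print name }
    '
}
```

Once the module is known, pointing nm at that .ko (or its .symbols file) instead of the kernel should give symbols near the address, offset by the module's load address.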
 
It is? I can try dropping ipnat in favor of the NAT in ipfw. I know ipfw quite well, but have never used its NAT functionality.

The box reboots after anywhere from 10 minutes to 2 hours of uptime. Crazy as it sounds, it feels like a rogue packet is causing the reboots.

ipnat -s
Code:
mapped  in      224864  out     161457
added   7630    expired 6369
no memory       0       bad nat 70
inuse   1261
orphans 0
rules   4
wilds   0
hash efficiency 72.09%
bucket usage    44.41%
minimal length  0
maximal length  4
average length  1.387
TCP Entries per state
     0     1     2     3     4     5     6     7     8     9    10    11
     1     2     7     0   157    71    14     0     0     0   243    12

I also raise the ipnat session limits at boot with this:
Code:
ipfilter_flags="-D -T ipf_nattable_sz=4095,ipf_nattable_max=100000,fr_tcptimeout=180,fr_tcpclosewait=60,fr_tcphalfclosed=7200,fr_tcpidletimeout=172800,fr_udptimeout=12 -E"

The number of sessions jumps to over 10k during the day, so the default 30k felt too low.
 
ipfw2 +NAT rules!

aa said:
At least you know that pf() is one to blame :)

I switched from ipnat to ipfw and it is looking good!

I didn't find the man page or the online info for ipfw + NAT very useful. There is some old info on natd (a userland solution via divert, which doesn't scale). Here is my config using ipfw, with the ipnat equivalent commented out at the end.

Code:
# /etc/rc.firewall
# Example on setting up NAT using ipfw_nat.ko in FreeBSD
# April 20, 2012 - MonkeyBrains.net

fwcmd="/sbin/ipfw -q"

${fwcmd} -f flush
${fwcmd} add 100 pass all from any to any via lo0
${fwcmd} add 200 deny all from any to 127.0.0.0/8
${fwcmd} add 300 deny ip from 127.0.0.0/8 to any
${fwcmd} add allow tcp from any to me 2119 // secret SSH port
${fwcmd} add allow tcp from me 2119 to any
${fwcmd} add allow udp from 1.2.30.0/24 to me 161 // SNMP
${fwcmd} add allow udp from me 161 to 1.2.30.0/24

${fwcmd} add allow ip from any to any via em1 // internal interface


# one to one mappings
${fwcmd} nat 211 config redirect_addr  10.50.1.211  8.8.8.211
${fwcmd} nat 212 config redirect_addr  10.50.1.212  8.8.8.212

${fwcmd} add nat 211 all from  10.50.1.211 to any out via em0
${fwcmd} add nat 211 all from any to 8.8.8.211 in via em0
${fwcmd} add nat 212 all from  10.50.1.212 to any out via em0
${fwcmd} add nat 212 all from any to 8.8.8.212  in via em0

# nat everything left over in the one to many mapping
${fwcmd} nat 1 config ip 8.8.8.210 same_ports reset deny_in log unreg_only
${fwcmd} add 10000 nat 1 all from any to any via em0

# just to be safe... (accounting here should be zero after 'ipfw zero')
${fwcmd} add 60000 allow ip from any to any


## ipnat ##   ### global mapping
## ipnat ##   map em0 10.100.0.0/24 -> 8.8.8.210
## ipnat ##   ### one to one
## ipnat ##   bimap em0 10.50.1.211/32 -> 8.8.8.211
## ipnat ##   bimap em0 10.50.1.212/32 -> 8.8.8.212
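
For anyone copying the script above, here is a sketch of the matching rc.conf side. It is not taken from the original post: it assumes the script is saved as /etc/rc.firewall and that the stock rc.d/ipfw startup script is in use. The knobs themselves are standard rc.conf variables; whether all of them are needed depends on the setup.

```shell
# /etc/rc.conf fragment (assumed, adjust to taste)
gateway_enable="YES"                  # forward packets between em0 and em1
firewall_enable="YES"                 # load ipfw and run the firewall script at boot
firewall_nat_enable="YES"             # have rc.d/ipfw load ipfw_nat.ko as well
firewall_script="/etc/rc.firewall"    # the custom ruleset shown above
```

With ipfw_nat.ko loaded this way, the "nat N config" lines in the script should work without a separate kldload.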

My only question: how do I see all active mappings?
Running ipfw nat 1 show is not that useful... it only returns a summary, e.g.:
Code:
nat 1: icmp=0, udp=540, tcp=1961, sctp=0, pptp=0, proto=0, frag_id=0 frag_ptr=0 / tot=2501

When a computer gets hacked, I'd like to see which end user has 1000 active sessions so I could track that box down...
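
This does not answer the ipfw question, but as a sketch of the per-host count being asked for: under the old ipnat setup, ipnat -l lists the active sessions, and those can be tallied per internal address. The function name top_nat_hosts is made up, and the line format it expects ("MAP internal-addr port <- -> external-addr port [remote-addr port]") is an assumption about what ipnat -l prints, so check it against real output first.

```shell
# top_nat_hosts [N]
# Reads "ipnat -l"-style session lines on stdin and prints
# "count address" for the N internal hosts with the most sessions
# (default 10). Only lines starting with MAP are counted; the rule
# listing that ipnat -l prints first uses lowercase "map" and is skipped.
#
# Intended use (hypothetical invocation):
#   ipnat -l | top_nat_hosts 5
top_nat_hosts() {
    awk '$1 == "MAP" { n[$2]++ }
         END { for (h in n) print n[h], h }' |
        sort -k1,1nr -k2,2 |
        head -n "${1:-10}"
}
```

A host that suddenly shows a thousand sessions would float to the top of that list.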
 