kernel panic after upgrade from 9.2 to 10.3 release

Hello everyone.
I ran into a problem after upgrading a server from FreeBSD 9.2-RELEASE to 10.3-RELEASE-p15. A kernel panic occurs when network traffic increases. For example, when I use rsync to transfer backups from the server to a local machine, after a while I get a panic that states:
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x59
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80dae53c
stack pointer           = 0x28:0xfffffe04e7117570
frame pointer           = 0x28:0xfffffe04e7117580
code segment            = base 0x0, limit 0xfffff, type 0x1b
                   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq256: re0)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff809dc210 at kdb_backtrace+0x60
#1 0xffffffff8099eee6 at vpanic+0x126
#2 0xffffffff8099edb3 at panic+0x43
#3 0xffffffff80db091b at trap_fatal+0x36b
#4 0xffffffff80db0c1d at trap_pfault+0x2ed
#5 0xffffffff80db029a at trap+0x47a
#6 0xffffffff80d96262 at calltrap+0x8
#7 0xffffffff80369771 at ipf_frag_lookup+0x111
#8 0xffffffff803699e4 at ipf_frag_known+0x54
#9 0xffffffff8035ad3d at ipf_check+0x2fd
#10 0xffffffff80a72dd4 at pfil_run_hooks+0x84
#11 0xffffffff80adf3ae at ip_input+0x2fe
#12 0xffffffff80a71f12 at netisr_dispatch_src+0x62
#13 0xffffffff80a692d6 at ether_demux+0x126
#14 0xffffffff80a69f7e at ether_nh_input+0x35e
#15 0xffffffff80a71f12 at netisr_dispatch_src+0x62
#16 0xffffffff807144ee at re_rxeof+0x4ce
#17 0xffffffff8071572b at re_intr_msi+0x10b

More detailed trace:
Code:
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8099eb42 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff8099ef25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff8099edb3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff80db091b in trap_fatal (frame=<value optimized out>, eva=<value optimized out>)
at /usr/src/sys/amd64/amd64/trap.c:858
#5  0xffffffff80db0c1d in trap_pfault (frame=0xfffffe04e71174c0, usermode=<value optimized out>)
at /usr/src/sys/amd64/amd64/trap.c:681
#6  0xffffffff80db029a in trap (frame=0xfffffe04e71174c0) at /usr/src/sys/amd64/amd64/trap.c:447
#7  0xffffffff80d96262 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff80dae53c in bcmp () at /usr/src/sys/amd64/amd64/support.S:87
#9  0xffffffff80369771 in ipf_frag_lookup () at /usr/src/sys/contrib/ipfilter/netinet/ip_frag.c:697
#10 0xffffffff803699e4 in ipf_frag_known (fin=0xfffffe04e71176d8, passp=0xfffffe04e71176d4)
at /usr/src/sys/contrib/ipfilter/netinet/ip_frag.c:895
#11 0xffffffff8035ad3d in ipf_check (ctx=0xffffffff81e85828, ip=<value optimized out>,
hlen=<value optimized out>, ifp=<value optimized out>, out=0, mp=0xfffffe04e7117838)
at /usr/src/sys/contrib/ipfilter/netinet/fil.c:3025
#12 0xffffffff80a72dd4 in pfil_run_hooks (ph=0xffffffff81e9f918, mp=0xfffffe04e71178c0, ifp=0xfffff80005a36000,
dir=1, inp=0x0) at /usr/src/sys/net/pfil.c:82
#13 0xffffffff80adf3ae in ip_input (m=0xfffff8006c59a500) at /usr/src/sys/netinet/ip_input.c:488
#14 0xffffffff80a71f12 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x1)
at /usr/src/sys/net/netisr.c:976
#15 0xffffffff80a692d6 in ether_demux (ifp=<value optimized out>, m=0xfffff8006c59a500)
at /usr/src/sys/net/if_ethersubr.c:851
#16 0xffffffff80a69f7e in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646
#17 0xffffffff80a71f12 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0x1)
at /usr/src/sys/net/netisr.c:976
#18 0xffffffff807144ee in re_rxeof (sc=0xfffffe0000b98000, rx_npktsp=0x0) at /usr/src/sys/dev/re/if_re.c:2369
#19 0xffffffff8071572b in re_intr_msi (xsc=0xfffffe0000b98000) at /usr/src/sys/dev/re/if_re.c:2665
#20 0xffffffff80969b2b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff80005a64e00)
at /usr/src/sys/kern/kern_intr.c:1264
#21 0xffffffff80969f76 in ithread_loop (arg=0xfffff80005a072a0) at /usr/src/sys/kern/kern_intr.c:1277
#22 0xffffffff8096767a in fork_exit (callout=0xffffffff80969ee0 <ithread_loop>, arg=0xfffff80005a072a0,
frame=0xfffffe04e7117c00) at /usr/src/sys/kern/kern_fork.c:1027
#23 0xffffffff80d9679e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
#24 0x0000000000000000 in ?? ()

I suppose the problem is in my network card driver:
Code:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfbeff000-0xfbefffff,0xf6ff0000-0xf6ffffff irq 16 at device 0.0 on pci6

or somewhere in the network stack. Should I try rebuilding world from 10-STABLE? Or should I try disabling all the network-related sysctl.conf settings, such as
Code:
net.inet.ip.rtminexpire=2
net.inet.ip.rtmaxcache=1024
net.inet.tcp.sack.enable=1
net.inet.tcp.finwait2_timeout=10000
net.inet.ip.intr_queue_maxlen=4096
kern.ipc.somaxconn=32768
net.inet.tcp.maxtcptw=200000
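
One way to try the second option without losing the settings is to comment them out rather than delete them. The following is only a minimal sketch: the function name comment_out_net is made up for illustration, and the pattern assumes that "network-related" means lines starting with net. or kern.ipc.

```shell
#!/bin/sh
# Sketch: comment out every active net.* and kern.ipc.* line in a
# sysctl.conf-style stream. Reads stdin, writes stdout. Lines that are
# already commented, or that belong to other sysctl trees, pass through.
comment_out_net() {
    sed -E 's/^[[:space:]]*(net|kern\.ipc)\./#&/'
}

# Possible use on a backup copy (not run here):
#   cp /etc/sysctl.conf /etc/sysctl.conf.bak
#   comment_out_net < /etc/sysctl.conf.bak > /etc/sysctl.conf
```

Editing sysctl.conf does not change values that are already set in the running kernel, so the test only means something after the next boot.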

Thanks for any answers!
 
It looks like it's IPFilter that crashes the machine. How did you do the upgrade?
 
It looks like it's IPFilter that crashes the machine. How did you do the upgrade?
I checked out the source from the releng/10 branch with svn, then ran make world / make buildkernel / make installkernel / make installworld -> reboot -> recompiled all ports.
I built a custom kernel with these options:
Code:
# BDS 20121212
options         IPFILTER                # support IPFILTER
options         IPFILTER_LOG
options         IPFILTER_DEFAULT_BLOCK

# BDS 20121213
options         SC_HISTORY_SIZE=8192    # so the console history can be scrolled far back
options         ACCEPT_FILTER_DATA        # filters for nginx
options         ACCEPT_FILTER_HTTP        # ...
options         HZ=1000
options         DEVICE_POLLING
 
This looks closely related to PR 212872.
in the above post: "...In my case, the error caused by garbage traffic IpV6"
In my case, when I run rsync on the local machine to download backed-up files from the server, I get a kernel panic after about 60 seconds. There is no IPv6 traffic.
 
Is it possible to disable IPFilter temporarily? Only to rule it out as a possible cause.
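
A rough sketch of how such a test could look, assuming IPFilter is compiled into the kernel (ipf(8) notes that -D is not effective for loadable kernel versions). The helper name with_ipf_disabled and the IPF variable are hypothetical; only ipf -D (disable) and ipf -E (enable) are real ipf(8) flags.

```shell
#!/bin/sh
# Hypothetical helper: run a command while IPFilter is disabled, then
# re-enable the filter and return the command's exit status.
IPF="${IPF:-/sbin/ipf}"    # assumption: default ipf path; override for a dry run

with_ipf_disabled() {
    "$IPF" -D || return 1    # ipf -D: disable the filter
    "$@"                     # whatever should run while the filter is off
    status=$?
    "$IPF" -E                # ipf -E: enable the filter again
    return "$status"
}

# Possible use (not run here): keep the filter off for ten minutes and
# start the rsync transfer from the other machine during that window.
#   with_ipf_disabled sleep 600
```

The machine is unfiltered while this runs, so it is only suitable for a short test on a network you trust.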
 
in the above post: "...In my case, the error caused by garbage traffic IpV6"
I would read that as "In my case, the error is triggered by garbage IPv6 traffic", just as yours seems to be triggered by rsync traffic.

Note also that while the bug report was filed for a system running ipfw.ko, the second poster was running pf.ko instead, and both of them are using an "igp" device. Either those are separate issues or the problem goes a bit deeper.

Both your case and the initial bug report hit the error at:
Code:
ipf_frag_lookup () at /usr/src/sys/contrib/ipfilter/netinet/ip_frag.c:697
in a function that does:
Code:
/* Check the fragment cache to see if there is already a record of this     */
/* packet with its filter result known.                                     */

As I read it, something is going wrong while trying to access that "cache", and I suspect this may happen regardless of the specific packet filter in use.

I would suggest adding your info to that PR.
 