mpd pppoe server and kernel panic!

Hello! For about two months I have been trying to set up a PPPoE server with more than 7000 connected users. Everything works, but I get kernel panics :( I have tried several different hardware setups. There is no shaping or NAT, just FreeBSD 9.0, MPD 5.6, and routing for Internet access. With polling enabled it panics faster, after about 10 minutes of uptime; with polling disabled it runs for about an hour and then panics again :(

Hardware I used:
IBM server with 2 Xeon CPUs at 2.0 GHz, 8 GB DDR2 RAM, 2 Intel (em) interfaces
At the time of the crash, one core is about 40% idle (all of the load is system time) and all the other cores are about 70% or 80% idle
6400 connected users, 400 MB BW, and more than 100 VLANs

Here is one of the panics:
Code:
PPPOESERVER dumped core - see /var/crash/vmcore.0

Wed Aug 29 15:16:52 IRDT 2012

FreeBSD PPPOESERVER 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Wed Aug 29 13:56:46 IRDT 2012     root@pppoe_server:/usr/obj/usr/src/sys/GENERIC  amd64

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x5b600000099
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8093738c
stack pointer	        = 0x28:0xffffff824d18c6d0
frame pointer	        = 0x28:0xffffff824d18c700
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (em0 taskq)
trap number		= 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8086b1de at kdb_backtrace+0x5e
#1 0xffffffff80835d97 at panic+0x187
#2 0xffffffff80b40ec0 at trap_fatal+0x290
#3 0xffffffff80b41209 at trap_pfault+0x1f9
#4 0xffffffff80b416cf at trap+0x3df
#5 0xffffffff80b2bbff at calltrap+0x8
#6 0xffffffff8093d8d8 at ng_pppoe_rcvdata_ether+0x2b8
#7 0xffffffff80938fdb at ng_apply_item+0x22b
#8 0xffffffff80937f8e at ng_snd_item+0x39e
#9 0xffffffff808ea897 at ether_demux+0x127
#10 0xffffffff808eab94 at ether_nh_input+0x1f4
#11 0xffffffff808f4e8b at netisr_dispatch_src+0x20b
#12 0xffffffff808ea7df at ether_demux+0x6f
#13 0xffffffff808eab94 at ether_nh_input+0x1f4
#14 0xffffffff808f4e8b at netisr_dispatch_src+0x20b
#15 0xffffffff8047303a at em_rxeof+0x1ca
#16 0xffffffff804734bb at em_handle_que+0x5b
#17 0xffffffff808774e5 at taskqueue_run_locked+0x85
Uptime: 23m37s
Dumping 662 out of 8173 MB:..3%..13%..22%..32%..42%..51%..61%..71%..83%..92%

Reading symbols from /boot/kernel/ng_socket.ko...Reading symbols from /boot/kernel/ng_socket.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_socket.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /boot/kernel/ng_mppc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /boot/kernel/rc4.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ether.ko
Reading symbols from /boot/kernel/ng_tee.ko...Reading symbols from /boot/kernel/ng_tee.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_tee.ko
Reading symbols from /boot/kernel/ng_iface.ko...Reading symbols from /boot/kernel/ng_iface.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_iface.ko
Reading symbols from /boot/kernel/ng_ppp.ko...Reading symbols from /boot/kernel/ng_ppp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ppp.ko
Reading symbols from /boot/kernel/ng_vjc.ko...Reading symbols from /boot/kernel/ng_vjc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_vjc.ko
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1  0xffffffff808358d5 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:442
#2  0xffffffff80835d81 in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:607
#3  0xffffffff80b40ec0 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:819
#4  0xffffffff80b41209 in trap_pfault (frame=0xffffff824d18c620, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:735
#5  0xffffffff80b416cf in trap (frame=0xffffff824d18c620)
    at /usr/src/sys/amd64/amd64/trap.c:474
#6  0xffffffff80b2bbff in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:228
#7  0xffffffff8093738c in ng_address_hook (here=0x0, item=0xfffffe0199acab00, 
    hook=0xfffffe01b7a5ac00, retaddr=0)
    at /usr/src/sys/netgraph/ng_base.c:3487
#8  0xffffffff8093d8d8 in ng_pppoe_rcvdata_ether (hook=Variable "hook" is not available.
)
    at /usr/src/sys/netgraph/ng_pppoe.c:1655
#9  0xffffffff80938fdb in ng_apply_item (node=0xfffffe0024cb9900, 
    item=0xfffffe0199acab00, rw=0) at /usr/src/sys/netgraph/ng_base.c:2318
#10 0xffffffff80937f8e in ng_snd_item (item=Variable "item" is not available.
)
    at /usr/src/sys/netgraph/ng_base.c:2235
#11 0xffffffff808ea897 in ether_demux (ifp=0xfffffe002400b800, m=Variable "m" is not available.
)
    at /usr/src/sys/net/if_ethersubr.c:954
#12 0xffffffff808eab94 in ether_nh_input (m=Variable "m" is not available.
)
    at /usr/src/sys/net/if_ethersubr.c:756
#13 0xffffffff808f4e8b in netisr_dispatch_src (proto=9, source=Variable "source" is not available.
)
    at /usr/src/sys/net/netisr.c:1013
#14 0xffffffff808ea7df in ether_demux (ifp=0xfffffe00054aa000, 
    m=0xfffffe0199e12600) at /usr/src/sys/net/if_ethersubr.c:846
#15 0xffffffff808eab94 in ether_nh_input (m=Variable "m" is not available.
)
    at /usr/src/sys/net/if_ethersubr.c:756
#16 0xffffffff808f4e8b in netisr_dispatch_src (proto=9, source=Variable "source" is not available.
)
    at /usr/src/sys/net/netisr.c:1013
#17 0xffffffff8047303a in em_rxeof (rxr=0xfffffe0009044400, count=97, 
    done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4340
#18 0xffffffff804734bb in em_handle_que (context=Variable "context" is not available.
)
    at /usr/src/sys/dev/e1000/if_em.c:1518
#19 0xffffffff808774e5 in taskqueue_run_locked (queue=0xfffffe0009042200)
    at /usr/src/sys/kern/subr_taskqueue.c:308
#20 0xffffffff80878466 in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/subr_taskqueue.c:497
#21 0xffffffff8080990f in fork_exit (
    callout=0xffffffff80878420 <taskqueue_thread_loop>, 
    arg=0xffffff800230c748, frame=0xffffff824d18cc50)
    at /usr/src/sys/kern/kern_fork.c:995
#22 0xffffffff80b2c12e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:602
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0xffffffff8117ed78 in sleepq_chains ()
#48 0xfffffe0005472428 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0xfffffe0005472000 in ?? ()
#51 0xffffff824d18cb00 in ?? ()
#52 0xffffff824d18caa8 in ?? ()
#53 0xfffffe00052ab460 in ?? ()
#54 0xffffffff8085dda2 in sched_switch (td=0xffffffff80878420, 
    newtd=0xffffff800230c748, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1848
Previous frame inner to this frame (corrupt stack?)
(kgdb)

Tunings:
Code:
kern.ipc.maxpipekva=536870912
net.inet.tcp.delayed_ack=0 
net.isr.maxthreads=7 
net.isr.direct=1 
net.isr.direct_force=1 
net.isr.bindthreads=0 
kern.maxfiles=204800 
kern.maxfilesperproc=200000 
net.graph.maxalloc=4096 
kern.maxusers=2048 
kern.ipc.maxsockbuf=16777216 
kern.ipc.nmbclusters=262144 
kern.ipc.somaxconn=32768 
kern.ipc.maxsockets=204800
Where is the problem? Or what am I doing wrong? :(
 
Assalam o Alaikum

Dear s_265_925, I am new to mpd PPPoE. Can you please let me know how to configure the mpd server and client, and how to make a connection between them? I would be very thankful.
 
s_265_925,

can you please update to stable/9?

There was at least one bugfix for a race condition in netgraph. There was also an optimisation in netgraph specifically for the case when mpd has zillions of users.

If the panics are still there on stable/9, then please post the backtrace from a panic on stable/9.
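
When a new panic turns up, one quick way to compare it with the trace above is to reduce each KDB backtrace to its function names, since addresses and offsets differ between kernel builds. A minimal sh sketch, assuming the backtrace has been saved as plain text in the `#N 0xADDR at func+0xOFF` form shown earlier in the thread; `kdb_frames` is a hypothetical helper name, not something shipped with FreeBSD:

```sh
# kdb_frames: print the function name of every KDB backtrace frame,
# one per line, dropping addresses and offsets.
# Reads the files given as arguments, or stdin when there are none.
kdb_frames() {
    sed -n 's/^#[0-9][0-9]* 0x[0-9a-f]* at \([A-Za-z0-9_]*\)+0x.*/\1/p' "$@"
}
```

For example, `kdb_frames old-panic.txt > old.frames` and `kdb_frames new-panic.txt > new.frames` (placeholder file names) followed by `diff old.frames new.frames` shows whether the new panic still passes through `ng_pppoe_rcvdata_ether`.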
 
glebius@,
I am using FreeBSD 9.0 stable, downloaded 2 months ago, and I get panics.
So should I send you a backtrace?
Is it possible that I get panics because I am using an old network card (Intel 82572GI) and old hardware?
Also, I am using more than 100-150 VLANs in one box for my PPPoE server!
 
Line numbers in your backtrace show that your system is older than 14 April 2012, so it isn't 2 months old.

If you are talking about getting panics on a different system, one that is 2 months old, then please post the backtrace from its panic.
 
The file says the system is 9.0-RELEASE. I'm asking you to update to STABLE, since there was at least one bugfix, and, if the problem doesn't vanish, to provide a backtrace from the updated system.
 
Today I had a kernel panic on my FreeBSD 9 amd64.
Code:
uname -a
FreeBSD rubin 9.1-PRERELEASE FreeBSD 9.1-PRERELEASE #1: Wed Sep  5 13:32:27 YEKT 2012     root@rubin:/usr/obj/usr/src/sys/ROUTER  amd64

Code:
Oct 14 23:46:43 rubin kernel: Fatal trap 12: page fault while in kernel mode
Oct 14 23:46:43 rubin kernel: cpuid = 2; apic id = 04
Oct 14 23:46:43 rubin kernel: fault virtual address = 0xfa5d
Oct 14 23:46:43 rubin kernel: fault code            = supervisor read data, page not present
Oct 14 23:46:43 rubin kernel: instruction pointer   = 0x20:0xffffffff80ac3f71
Oct 14 23:46:43 rubin kernel: stack pointer         = 0x28:0xffffff81ded3a1d0
Oct 14 23:46:43 rubin kernel: frame pointer         = 0x28:0xffffff81ded3a200
Oct 14 23:46:43 rubin kernel: code segment          = base 0x0, limit 0xfffff, type 0x1b
Oct 14 23:46:43 rubin kernel:                       = DPL 0, pres 1, long 1, def32 0, gran 1
I am using mpd, ng_nat, and ipfw on this machine.

For some reason I have no dump (maybe having swap on a mirror causes that).

Any help? How can I prevent this?
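
If no dump is being saved, the dump settings may be worth checking first. A minimal sketch of the relevant /etc/rc.conf lines, assuming the swap device listed in /etc/fstab is at least as large as the dump (the first panic in this thread wrote 662 MB) and that /var/crash has room for it:

```sh
# /etc/rc.conf
dumpdev="AUTO"        # use a swap device from /etc/fstab as the dump device
dumpdir="/var/crash"  # where savecore(8) writes vmcore.N at the next boot
```

After the next panic and reboot, `kgdb /boot/kernel/kernel /var/crash/vmcore.0` followed by `bt` should give a backtrace like the one in the first post. For swap on a gmirror device, gmirror(8) describes extra conditions for kernel dumps; whether that is what prevents the dump here is only a guess.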
 
Once more today:

Code:
Oct 17 21:36:50 rubin kernel:
Oct 17 21:36:50 rubin kernel:
Oct 17 21:36:50 rubin kernel: Fatal trap 12: page fault while in kernel mode
Oct 17 21:36:50 rubin kernel: cpuid = 2; apic id = 04
Oct 17 21:36:50 rubin kernel: fault virtual address = 0xfa5d
Oct 17 21:36:50 rubin kernel: fault code            = supervisor read data, page not present
Oct 17 21:36:50 rubin kernel: instruction pointer   = 0x20:0xffffffff80ac3f71
Oct 17 21:36:50 rubin kernel: stack pointer         = 0x28:0xffffff81ded3a1d0
Oct 17 21:36:50 rubin kernel: frame pointer         = 0x28:0xffffff81ded3a200
Oct 17 21:36:50 rubin kernel: code segment          = base 0x0, limit 0xfffff, type 0x1b
Oct 17 21:36:50 rubin kernel:                       = DPL 0, pres 1, long 1, def32 0, gran 1
Oct 17 21:54:21 rubin syslogd: kernel boot file is /boot/kernel/kernel
Oct 17 21:54:21 rubin kernel: :1e:67:48:94:45
Oct 17 21:54:21 rubin kernel: Bump WF2Q+ weight to 1 (was 0)
Oct 17 21:54:21 rubin kernel: Bump flowset buckets to 1024 (was 0)
 
I have:
Code:
options     INCLUDE_CONFIG_FILE     # Include this file in kernel
options     KDB                     # Kernel debugger related code
options     KDB_TRACE               # Print a stack trace for a panic
in my kernel config.

Today I switched back to another server, and I am trying to upgrade this one to the current 9-STABLE and to add a flash card for swap/core dumps.
 
Hi!

I have exactly the same problem with FreeBSD 9.2, also on different hardware: an HP DL 360 G5 and an HP Microserver G7. There is a link between the number of customers and/or the amount of traffic and how often it panics. It happens every 2 or 3 weeks with 1000 customers. Sometimes it happens multiple times in the same week.

OS: FreeBSD 9.2-RELEASE
Software: MPD, IPFW (traffic shaping), PF (firewall).
 