mpd 5

I recently upgraded from FreeBSD 6.2 running pppd to FreeBSD 8.2 running mpd5.5 (as I can't get pppd to pass traffic). I have about 100 PPPoE clients connected to the mpd5 server.

The problem I have is that my system either locks up or kernel panics every 3 to 5 days! I have found some patches and applied them to /src/sys/netgraph/ng_base.c and /src/sys/net/if.c, but this has not helped at all. Does anyone have any suggestions for fixing this problem?

Here is some partial panic information from kgdb:
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x308
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805bdf5e
stack pointer           = 0x28:0xffffff800008ea10
frame pointer           = 0x28:0xffffff800008ea30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805ff90e at kdb_backtrace+0x5e
#1 0xffffffff805cd807 at panic+0x187
#2 0xffffffff808cfcc0 at trap_fatal+0x290
#3 0xffffffff808d009f at trap_pfault+0x28f
#4 0xffffffff808d057f at trap+0x3df
#5 0xffffffff808b8674 at calltrap+0x8
#6 0xffffffff805cc8b0 at _sema_post+0x90
#7 0xffffffff8027f834 at ata_completed+0x474
#8 0xffffffff8060a9b5 at taskqueue_run_locked+0x85
#9 0xffffffff8060ac98 at taskqueue_run+0x38
#10 0xffffffff805a6094 at intr_event_execute_handlers+0x104
#11 0xffffffff805a7745 at ithread_loop+0x95
#12 0xffffffff805a3ff8 at fork_exit+0x118
#13 0xffffffff808b8b3e at fork_trampoline+0xe
Uptime: 3d1h34m39s
Physical memory: 2033 MB
Dumping 474 MB: 459 443 427 411 395 379 363 347 331 315 299 283 267 251 235 219 203 187 171 155 139 123 107 91 75 59 43 27 11
.
.
.
.
Loaded symbols for /boot/kernel/ng_iface.ko
Reading symbols from /boot/kernel/ng_ppp.ko...Reading symbols from /boot/kernel/ng_ppp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ppp.ko
#0  doadump () at pcpu.h:224
224             __asm("movq %%gs:0,%0" : "=r" (td));
Any help would be appreciated. Thanks.
 
ecazamir said:
Try using this on kernel configuration file:
Code:
nooptions	FLOWTABLE

I had already commented out the FLOWTABLE entry. I'm guessing that is not enough? I will add the nooptions line and recompile. I will know if that works within a few days! Thanks.
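
Since it is not obvious from reading a config file which directive ends up in effect, here is a minimal sketch of a check. It assumes that the last `options` or `nooptions` line for a given name is the one that counts, and it does not follow `include` lines. The function name `option_state` and the config path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Report the state of an option as written in one kernel config file.
# Assumption: the last "options NAME" / "nooptions NAME" line wins.
# "include" lines are NOT followed; check included files separately.
option_state() {
    # $1 = option name, $2 = kernel config file
    awk -v opt="$1" '
        { sub(/#.*/, "") }                          # strip comments
        $1 == "options"   && $2 == opt { state = "enabled" }
        $1 == "nooptions" && $2 == opt { state = "disabled" }
        END { print (state == "" ? "unset" : state) }
    ' "$2"
}

# Hypothetical path; substitute your own config file:
# option_state FLOWTABLE /usr/src/sys/amd64/conf/MYKERNEL
```

A result of `unset` only means that this particular file does not mention the option; a file pulled in with `include` could still enable it.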
 
I had similar problems (mpd server repeatedly crashing after 6 days of uptime) and this workaround fixed the problem. The machine I mentioned also runs quagga/zebra/bgpd with a routing table of about 15000 entries, a combination which is known to exhibit this kind of crash if FLOWTABLE is active.
 
I had read about the flowtable bug, so I commented out the entry in my kernel config file. It seems that is not enough to disable it! I have already recompiled, but I don't want to reboot unless I have to! I will wait for the next crash and see how it goes after that. It should only take a day or 2!
 
I hope so. I will know for sure soon. The current uptime since reboot is 4 days, 17 hours. If it's going to crash again, it will be soon. I will keep you posted. Thanks.
 
The server just crashed! uptime 4 days 22:50! Did not get any core dumps as the dump froze while dumping memory. After an hour of waiting, I gave up and hit the reset button!

I expect the crash is identical to what I originally posted. The last crash was identical, so I assume this one is too.

I really need to fix this or just forget about running anything newer than 6.4! It's a shame that the newer OS is so unstable!
 
Yet another crash today. Not even 2 days uptime! Really disappointed in this OS version!

Can anyone shed some light on these crashes?
 
Does anyone have any ideas about this? From what I have found by research, this is an extremely common problem with versions 7 and 8! Does anyone know if this is also a problem in FreeBSD 9?
 
If FreeBSD 7.x or 8.x does not bring anything you really need, staying with 6.x should be a good idea. But if you already use recent versions, you should try to use a stock kernel compiled with debug symbols and then send a bug report. If you really need a custom kernel, build it with debug symbols.

EDIT: If the problem persists, sending a problem report may help fix it.

This kind of problem is not so frequent for me: out of four servers running mpd5, only one had crashes. All four servers were serving more than 50 PPPoE clients. The server that crashed most frequently was the most heavily used; it is serving more than 300 concurrent PPPoE connections.

I would summarize the info about this server as follows:
- it is using FreeBSD version 8.x
- it is using quagga (zebra and bgpd), having more than 10000 routes in table
- it is a multi-core processor machine
- it is serving more than 150 concurrent PPPoE connections
- it has more than 150 mbit/sec network traffic per direction (600 Mbps total)

An important note: other machines with lower network traffic didn't experience the crashes, even with 70 concurrent PPPoE clients connected and less than 300 Mbps total bandwidth (sum of in and out in all directions), even though FLOWTABLE was left at its default value.
 
Can anyone tell me if this problem still exists in FreeBSD 9? I don't want to lose the ability to transition to IPv6 by downgrading to FreeBSD 6.
 
It seems FreeBSD 9 Release has the same problem! The first run was almost 6 days, the second was just over 32 minutes before panic! At least the core dump completed this time!

Code:
RGC_Wireless# cd /usr/obj/usr/src/sys/PPPSERVER
RGC_Wireless# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
acd0: WARNING - READ_CAPACITY taskqueue timeout - completing request directly
acd0: WARNING - PREVENT_ALLOW freeing taskqueue zombie request
acd0: WARNING - TEST_UNIT_READY freeing taskqueue zombie request
acd0: WARNING - READ_TOC freeing taskqueue zombie request
acd0: WARNING - READ_TOC freeing taskqueue zombie request
acd0: WARNING - READ_CAPACITY freeing taskqueue zombie request


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x360
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff806e858e
stack pointer           = 0x28:0xffffff8000289a30
frame pointer           = 0x28:0xffffff8000289a50
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8072d7de at kdb_backtrace+0x5e
#1 0xffffffff806f8397 at panic+0x187
#2 0xffffffff80a1fc30 at trap_fatal+0x290
#3 0xffffffff80a1ff79 at trap_pfault+0x1f9
#4 0xffffffff80a2043f at trap+0x3df
#5 0xffffffff80a0a96f at calltrap+0x8
#6 0xffffffff806f74c0 at _sema_post+0x90
#7 0xffffffff8038e134 at ata_completed+0x474
#8 0xffffffff80739ae5 at taskqueue_run_locked+0x85
#9 0xffffffff80739c6a at taskqueue_run+0x3a
#10 0xffffffff806ced24 at intr_event_execute_handlers+0x104
#11 0xffffffff806d04e4 at ithread_loop+0xa4
#12 0xffffffff806cbf0f at fork_exit+0x11f
#13 0xffffffff80a0ae9e at fork_trampoline+0xe
Uptime: 32m26s
Dumping 397 out of 2031 MB:..5%..13%..21%..33%..41%..53%..61%..73%..81%..93%

Reading symbols from /boot/kernel/if_bridge.ko...Reading symbols from /boot/kernel/if_bridge.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/if_bridge.ko
Reading symbols from /boot/kernel/bridgestp.ko...Reading symbols from /boot/kernel/bridgestp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/bridgestp.ko
Reading symbols from /boot/kernel/nfscl.ko...Reading symbols from /boot/kernel/nfscl.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfscl.ko
Reading symbols from /boot/kernel/nfscommon.ko...Reading symbols from /boot/kernel/nfscommon.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nfscommon.ko
Reading symbols from /boot/kernel/ng_mppc.ko...Reading symbols from /boot/kernel/ng_mppc.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_mppc.ko
Reading symbols from /boot/kernel/rc4.ko...Reading symbols from /boot/kernel/rc4.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/rc4.ko
Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from /boot/kernel/ng_ether.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ether.ko
Reading symbols from /boot/kernel/ng_tee.ko...Reading symbols from /boot/kernel/ng_tee.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_tee.ko
Reading symbols from /boot/kernel/radeon.ko...Reading symbols from /boot/kernel/radeon.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/radeon.ko
Reading symbols from /boot/kernel/drm.ko...Reading symbols from /boot/kernel/drm.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/drm.ko
Reading symbols from /boot/kernel/ng_iface.ko...Reading symbols from /boot/kernel/ng_iface.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_iface.ko
Reading symbols from /boot/kernel/ng_ppp.ko...Reading symbols from /boot/kernel/ng_ppp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ng_ppp.ko
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
224             __asm("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump (textdump=Variable "textdump" is not available.
) at pcpu.h:224
#1  0xffffffff806f7ed5 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:442
#2  0xffffffff806f8381 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:607
#3  0xffffffff80a1fc30 in trap_fatal (frame=0xc, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:819
#4  0xffffffff80a1ff79 in trap_pfault (frame=0xffffff8000289980, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:735
#5  0xffffffff80a2043f in trap (frame=0xffffff8000289980)
    at /usr/src/sys/amd64/amd64/trap.c:474
#6  0xffffffff80a0a96f in calltrap () at /usr/src/sys/amd64/amd64/exception.S:228
#7  0xffffffff806e858e in _mtx_lock_sleep (m=0xfffffe00058da7f8, 
    tid=18446741874726511712, opts=Variable "opts" is not available.
) at /usr/src/sys/kern/kern_mutex.c:369
#8  0xffffffff806f74c0 in _sema_post (sema=0xfffffe00058da7f8, file=Variable "file" is not available.
)
    at /usr/src/sys/kern/kern_sema.c:79
#9  0xffffffff8038e134 in ata_completed (context=Variable "context" is not available.
)
    at /usr/src/sys/dev/ata/ata-queue.c:491
#10 0xffffffff80739ae5 in taskqueue_run_locked (queue=0xfffffe000265b900)
    at /usr/src/sys/kern/subr_taskqueue.c:308
#11 0xffffffff80739c6a in taskqueue_run (queue=0xfffffe000265b900)
    at /usr/src/sys/kern/subr_taskqueue.c:322
#12 0xffffffff806ced24 in intr_event_execute_handlers (p=Variable "p" is not available.
)
    at /usr/src/sys/kern/kern_intr.c:1257
#13 0xffffffff806d04e4 in ithread_loop (arg=0xfffffe00024b0160)
    at /usr/src/sys/kern/kern_intr.c:1270
#14 0xffffffff806cbf0f in fork_exit (callout=0xffffffff806d0440 <ithread_loop>, 
    arg=0xfffffe00024b0160, frame=0xffffff8000289c50)
    at /usr/src/sys/kern/kern_fork.c:995
#15 0xffffffff80a0ae9e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:602
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000001 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0xffffffff80f11240 in affinity ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0xfffffe000265a460 in ?? ()
#44 0xffffff80002890f0 in ?? ()
#45 0xffffff8000289098 in ?? ()
#46 0xfffffe00024c78c0 in ?? ()
#47 0xffffffff807203a2 in sched_switch (td=0xffffffff806d0440, 
    newtd=0xfffffe00024b0160, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1848
Previous frame inner to this frame (corrupt stack?)
(kgdb)
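
To compare this panic with the one from 8.2 in the first post without reading addresses by eye, each KDB stack backtrace can be reduced to its function names. This is only a minimal sketch: the function name `bt_functions` and the file names in the usage comment are made up, and it assumes the console output was saved to text files with lines of the form `#6 0x... at _sema_post+0x90`.

```shell
#!/bin/sh
# Reduce a KDB stack backtrace to its function names, one per line.
# Only lines like "#6 0xffffffff805cc8b0 at _sema_post+0x90" are matched;
# everything else (registers, uptime, dump progress) is ignored.
bt_functions() {
    sed -n 's/^#[0-9][0-9]* 0x[0-9a-f]* at \([A-Za-z0-9_]*\)+0x.*/\1/p' "$@"
}

# Hypothetical file names holding the two saved console outputs:
# bt_functions panic-8.2.txt > a.txt
# bt_functions panic-9.0.txt > b.txt
# diff a.txt b.txt && echo "same call path"
```

Read this way, the 8.2 trace and the 9.0 trace in this thread list the same 14 functions in the same order, with `_sema_post` called from `ata_completed` in the task queue thread.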
 
I'm not familiar with debugging tools like gdb, but I see a reference to sched_ule. ULE has been the default scheduler since 7.1-RELEASE, and you said before that your problems began after upgrading from 6.x to a later release. The 4BSD scheduler can be enabled by putting
Code:
options SCHED_4BSD
in your kernel configuration file and removing
Code:
options SCHED_ULE
and recompiling the kernel.
 
Thanks, I will try that.

It's odd that the FreeBSD folks don't seem to have any interest in these crashes. There is obviously a problem here. Possibly one that could be exploited if it involves the networking framework.
 
I'm experiencing the same problems with FreeBSD 9. I tried to reproduce the problem in a controlled environment, with one PPPoE connection downloading files at up to 150 mb/s, but without success. I can only get the crash in a production environment, when traffic reaches 100 mb/s, regardless of the number of clients: sometimes with 200 and other times with 600.

Does anybody have a possible approach to a solution? I'm now updating my source tree to:
Code:
tag=RELENG_9 date=2012.03.04.00.00.00

with this kernel

Code:
diff NETLABS_KERNEL /usr/src/sys/amd64/conf/GENERIC
26,34c26
< options       DEVICE_POLLING
< options         DUMMYNET
< options         IPFIREWALL
< options         IPFIREWALL_FORWARD
< options         IPFIREWALL_DEFAULT_TO_ACCEPT
< options         IPDIVERT
< nooptions     FLOWTABLE
<
< options       SCHED_4BSD              # ULE scheduler
---
> options       SCHED_ULE               # ULE scheduler

I'm planning to put this kernel into production this week, but if someone has any advice, I would like to include it in this step.
 
Crashed again.

Even faster this time, after 40 minutes of running.

users: 171
2 mb/s UP | 13 mb/s DOWN

Code:
) at pcpu.h:224
224             __asm("movq %%gs:0,%0" : "=r" (td));
Is there any more information I can provide that would help shed some light on this problem?
 
I switched to PPPoED and the crashes cleared up. It seems like a serious problem with netgraph. I think the problem is being largely ignored, as the trouble report I created never got any attention.
 
I've seen netgraph + mpd5 working much better than pppoed + ppp. On my setup, the latter had problems with stale connections that required manual intervention to clean up. I switched to mpd5 5 years ago and never looked back.

A sample from a machine, at a light traffic hour:
Code:
# uname -a
FreeBSD machine.local 8.2-RELEASE-p2 FreeBSD 8.2-RELEASE-p2 #3: Sat Jun 11 14:14:14 EEST 2011     
root@machine.local:/usr/obj/usr/src/sys/CUSTOM  i386
# uptime
 9:49AM  up 40 days, 22:16, 1 user, load averages: 1.01, 1.04, 1.01
# ifconfig | grep ^ng | wc -l
      93
During busy hours, more than 150 concurrent connections are handled very well on this machine, more than 250 on others.
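
To see how many sessions were up shortly before a crash, the same count can be logged over time. The following is a minimal sketch, assuming roughly one ngN interface per connected client as in the `ifconfig | grep ^ng` count above. The function names and the log path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Count ngN interfaces in ifconfig output read from stdin.
count_ng() {
    grep -c '^ng[0-9]'
}

# Append one "timestamp count" line to the log file given as $1.
log_ng() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(ifconfig | count_ng)" >> "$1"
}

# Hypothetical usage, e.g. from cron every 5 minutes:
# log_ng /var/log/ng_sessions.log
```

After a panic, the last lines of the log show the session count from the final run before the crash.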
 
mpd5.6

Hi all. My server is an HP Pavilion DV7 running FreeBSD 8.2 with mpd 5.6. The kernel is compiled with the pf, pflog and pfsync devices. At the moment I have just 55 users. When a problem occurs with mpd5, I can still see all the ng interfaces in ifconfig, but I cannot kill the mpd5 process.

Code:
kill -9 `ps -ax | grep mpd5 | grep -v grep | awk '{print $1}'`
This does not work.

It is very strange, and sometimes the server produces crash dumps with a panic.

Please help me.
Thanks.
 