MPD crashes FreeBSD 7.2 and 8.1 after update to version 5.6 and patch

Hello everybody!

I have a few servers running FreeBSD 7.2 or 8.1 with MPD 5.5 for PPPoE connections. I updated MPD to version 5.6 from ports, with this patch and the following patch to support the CoA RAD_CLASS attribute:
Code:
--- ../mpd-5.6/src/radsrv.c     2011-12-21 23:58:49.000000000 +0900
+++ ./src/radsrv.c      2012-04-02 19:02:26.106800017 +0900
@@ -94,6 +94,7 @@
     Bund       B;
     Link       L;
     char        *tmpval;
+       u_char  *rad_class = NULL;
     char       *username = NULL, *called = NULL, *calling = NULL, *sesid = NULL;
     char       *msesid = NULL, *link = NULL, *bundle = NULL, *iface = NULL;
     int                nasport = -1, serv_type = 0, ifindex = -1, i;
@@ -163,6 +164,13 @@
                Log(LG_RADIUS2, ("radsrv: Got RAD_USER_NAME: %s",
                    username));
                break;
+               case RAD_CLASS:
+               tmpval = Bin2Hex(data, len);
+               Log(LG_RADIUS2, ("radsrv: Got RAD_CLASS: %s",
+                       tmpval));
+               Freee(tmpval);
+               rad_class = Mdup(MB_AUTH, data, len);
+               break;
            case RAD_NAS_IP_ADDRESS:
                nas_ip = rad_cvt_addr(data);
                Log(LG_RADIUS2, ("radsrv: Got RAD_NAS_IP_ADDRESS: %s ",
@@ -509,6 +517,8 @@
                ACLCopy(acl_queue, &L->lcp.auth.params.acl_queue);
                ACLCopy(acl_table, &L->lcp.auth.params.acl_table);
 #endif /* USE_IPFW */
+               if (rad_class)
+                       L->lcp.auth.params.class=rad_class;
 #ifdef USE_NG_BPF
                for (i = 0; i < ACL_FILTERS; i++) {
                    ACLDestroy(L->lcp.auth.params.acl_filters[i]);

Since this update the servers panic and reboot about once a week. The cause varies, but usually it is:
Code:
kgdb /boot/kernel/kernel /var/crash/vmcore.2 
...
Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer	= 0x20:0xc4de1d73
stack pointer	        = 0x28:0xc3f92670
frame pointer	        = 0x28:0xc3f926c0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 26 (em1 taskq)
trap number		= 18
...

(kgdb) list *0xc4de1d73
0xc4de1d73 is in bpf_filter (/usr/src/sys/modules/netgraph/bpf/../../../net/bpf_filter.c:461).
456			case BPF_ALU|BPF_MUL|BPF_K:
457				A *= pc->k;
458				continue;
459	
460			case BPF_ALU|BPF_DIV|BPF_K:
461				A /= pc->k;
462				continue;
463	
464			case BPF_ALU|BPF_AND|BPF_K:
465				A &= pc->k;

Sometimes the errors are different, but bpf_filter always appears in the output of the gdb "where" command. All my kernels have these additional options:
Code:
options         IPFIREWALL
options         IPDIVERT
options         IPFIREWALL_FORWARD
options         NETGRAPH
options         NETGRAPH_IPFW
options         NETGRAPH_PPPOE
options         NETGRAPH_IFACE
options         DEVICE_POLLING
options         HZ=1000
And I've changed these sysctl variables:
Code:
net.inet.icmp.icmplim=800
net.inet.flowtable.enable=0
net.isr.direct=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
net.inet.ip.fastforwarding=1
vm.pmap.shpgperproc=2048
net.isr.maxthreads=2
net.isr.bindthreads=1

There are about 200 users on each server, and pppoe-delay is set to 3 or 4 (see this patch).

What could be causing the kernel panic?
 
This is happening only with the patch applied? I see a 'Mdup', but no 'Freee' for rad_class, this may eat memory, especially in the long run.
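
The ownership pattern that the patch appears to be missing can be sketched as follows. This is a minimal standalone C sketch, not MPD code: `struct auth_params`, `dup_bytes`, `set_class` and `clear_class` are hypothetical names, and plain `malloc`/`free` stand in for MPD's `Mdup`/`Freee`. The idea is that whoever stores the copied Class value releases the previous copy before overwriting the pointer, and releases the last copy when the session ends.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-link auth parameters in MPD. */
struct auth_params {
    unsigned char *class_attr;   /* owned copy of the RADIUS Class value */
    size_t         class_len;
};

/* Stand-in for Mdup(): returns a heap copy of len bytes, or NULL. */
static unsigned char *dup_bytes(const unsigned char *data, size_t len)
{
    unsigned char *p = malloc(len ? len : 1);

    if (p != NULL && len > 0)
        memcpy(p, data, len);
    return p;
}

/*
 * Replace the stored Class value, releasing the previous copy first.
 * Returns 0 on success, -1 if the copy could not be allocated
 * (the old value is kept in that case).
 */
static int set_class(struct auth_params *ap,
                     const unsigned char *data, size_t len)
{
    unsigned char *copy;

    assert(ap != NULL && (data != NULL || len == 0));
    copy = dup_bytes(data, len);
    if (copy == NULL)
        return -1;
    free(ap->class_attr);        /* free(NULL) is a no-op */
    ap->class_attr = copy;
    ap->class_len = len;
    return 0;
}

/* Release the stored value when the session goes away. */
static void clear_class(struct auth_params *ap)
{
    free(ap->class_attr);
    ap->class_attr = NULL;
    ap->class_len = 0;
}
```

In the patch as posted, the copy is made for every request that carries RAD_CLASS, and the assignment to `L->lcp.auth.params.class` overwrites whatever pointer was stored there before, so any earlier copy is never released.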
 
ecazamir said:
This is happening only with the patch applied? I see a 'Mdup', but no 'Freee' for rad_class, this may eat memory, especially in the long run.

Thank you for the answer! I can't say yet whether it happens only with this patch, because the panic occurs only about once a week or even less often. I ran the show mem command in the MPD console; here is its output:
Code:
[] show mem
   Type                              Count      Total
   ----                              -----      -----
   AUTH                              32636     872735
   BUND                                150    1103680
   CMD                                   4         46
   CMDL                                528       9387
   COMP                                  1         36
   CONSOLE                               4       9936
   CONSOLE.buckets                       1        124
   CONSOLE.gent                          1          8
   CRYPT                                 1         28
   EVENT                               992      63836
   IFACE                              2524      93856
   LINK                                595    1252528
   PHYS                                689     228242
   PHYS.buckets                          3        372
   PHYS.gent                             1          8
   RADIUS                                4         74
   RADSRV                                3         45
   REP                                   1         16
   WEB                                   3        272
   http_server                           1         60
   http_server.server_name               1         15
   http_server.vhosts                    1        168
   http_server.vhosts.buck               1        124
   http_server.vhosts.gent               1          8
   http_servlet_hook                     1         40
   http_virthost                         1         12
   http_virthost.host                    1          1
   paction                               2         64
   typed_mem_stats                       1        928
                                     -----      -----
   Totals                            38152    3636649
There are 150 users on this server right now. I will watch whether these numbers keep growing.
 
ecazamir said:
This is happening only with the patch applied? I see a 'Mdup', but no 'Freee' for rad_class, this may eat memory, especially in the long run.

There was a memory leak! But it doesn't look like the cause of the kernel panic, because there is more than 1.5G of free RAM and an empty 2G swap right before the crash.

kgdb says that the trap is always raised from bpf_filter.c, and the problem is always with the pc->k variable (for example, pc->k is equal to 0 while pc->code says to divide A by pc->k). How can I find out where pc->k gets its value and why it is wrong?
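
The listing above shows that the constant form of the division (`A /= pc->k`) has no run-time zero check, so the interpreter depends on the program being sane when it is installed. One way to narrow the problem down is to run a check like the one below over the filter programs before they are loaded. This is only a sketch: `struct insn`, the `X_*` macros and `find_const_div_by_zero` are hypothetical local names that mirror the classic BPF instruction layout and opcode bits (on FreeBSD the real definitions are in `<net/bpf.h>`), so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction layout. */
struct insn {
    uint16_t code;   /* opcode: class, operation and source bits */
    uint8_t  jt;     /* jump offset if true */
    uint8_t  jf;     /* jump offset if false */
    uint32_t k;      /* immediate operand */
};

/* Opcode field extractors and the three values needed here. */
#define X_CLASS(c) ((c) & 0x07)
#define X_OP(c)    ((c) & 0xf0)
#define X_SRC(c)   ((c) & 0x08)
#define X_ALU      0x04
#define X_DIV      0x30
#define X_K        0x00

/*
 * Return the index of the first "divide A by constant 0" instruction,
 * or -1 if the program contains none. Division by the X register is
 * not flagged, because its divisor is only known at run time.
 */
static long find_const_div_by_zero(const struct insn *prog, size_t len)
{
    size_t i;

    assert(prog != NULL || len == 0);
    for (i = 0; i < len; i++) {
        uint16_t c = prog[i].code;

        if (X_CLASS(c) == X_ALU && X_OP(c) == X_DIV &&
            X_SRC(c) == X_K && prog[i].k == 0)
            return (long)i;
    }
    return -1;
}
```

If every program passes such a check when it is loaded but the panic still shows pc->k equal to 0, the instruction memory was probably changed or released after loading, for example if a program is replaced while a packet is still being filtered. That is a hypothesis to test, not something this thread confirms.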
 