Hello.
I am running FreeBSD 10.0-p9 amd64 and get regular kernel panics, usually at 1-3 day intervals. The kernel is a custom GENERIC kernel; the only difference is that I have added the ALTQ options from the Handbook (30.3.2 Enabling ALTQ).
The server is a DELL PowerEdge R610 (Hyper-Threading disabled) that acts as a router and firewall (pf). It's second-hand hardware, so I can't tell whether it has been reliable in the past; however, the BIOS, NIC firmware, etc. are currently up to date.
Obviously I am looking for a way to fix this.
Here is an example of a dump that I get every time:
Code:
panic: page fault
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
<5>bce2: link state changed to UP
bce2: Gigabit link up!
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
kernel trap 12 with interrupts disabled
kernel trap 12 with interrupts disabled
kernel trap 12 with interrupts disabled
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 32
cpuid = 7; apic id = 14
cpuid = 4; apic id = 00
cpuid = 0; apic id = 20
fault virtual address = 0x1
fault virtual address = 0x1
fault code = supervisor read data, page not present
fault virtual address = 0x744af1c
instruction pointer = 0x20:0xffffffff808e0012
fault code = supervisor write data, page not present
fault virtual address = 0x1
stack pointer = 0x28:0xfffffe066487f4c0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff808dfffa
frame pointer = 0x28:0xfffffe066487f510
stack pointer = 0x28:0xfffffe06648acab0
instruction pointer = 0x20:0xffffffff808e0012
fault code = supervisor read data, page not present
stack pointer = 0x28:0xfffffe06638dd570
frame pointer = 0x28:0xfffffe06648acae0
frame pointer = 0x28:0xfffffe06638dd5c0
instruction pointer = 0x20:0xffffffff808e0012
code segment = base 0x0, limit 0xfffff, type 0x1b
stack pointer = 0x28:0xfffffe064b1938e0
= DPL 0, pres 1, long 1, def32 0, gran 1
frame pointer = 0x28:0xfffffe064b193930
code segment = base 0x0, limit 0xfffff, type 0x1b
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = = DPL 0, pres 1, long 1, def32 0 , gran 1
processor eflags = code segment = base 0x0, limit 0xfffff, type 0x1b
resume, IOPL = 0
= DPL 0, pres 1, long 1, def32 0, gran 1
current process = 919 (pflogd)
resume, IOPL = 0
processor eflags = processor eflags = trap number = 12
resume, IOPL = 0
panic: page fault
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff808f0240 at kdb_backtrace+0x60
#1 0xffffffff808b7d25 at panic+0x155
#2 0xffffffff80c96cc2 at trap_fatal+0x3a2
#3 0xffffffff80c96f99 at trap_pfault+0x2c9
#4 0xffffffff80c96726 at trap+0x5e6
#5 0xffffffff80c7d9c2 at calltrap+0x8
#6 0xffffffff80d6e067 at handleevents+0xf7
#7 0xffffffff80d6ea18 at timercb+0x308
#8 0xffffffff80d990bc at lapic_handle_timer+0x9c
#9 0xffffffff80c7e51c at Xtimerint+0x8c
#10 0xffffffff8093acef at bqrelse+0x6f
#11 0xffffffff80af6248 at ffs_read+0x338
#12 0xffffffff80da1182 at VOP_READ_APV+0x92
#13 0xffffffff809629b6 at vn_read+0x166
#14 0xffffffff8095f41a at vn_io_fault+0x23a
#15 0xffffffff809058ab at dofileread+0x7b
#16 0xffffffff809055e5 at kern_readv+0x65
#17 0xffffffff80905573 at sys_read+0x63
Uptime: 10s
Dumping 892 out of 24540 MB:..2%..11%..22%..31%..42%..51%..61%..72%..81%..92%
Reading symbols from /boot/kernel/dummynet.ko.symbols...done.
Loaded symbols for /boot/kernel/dummynet.ko.symbols
Reading symbols from /boot/kernel/pflog.ko.symbols...done.
Loaded symbols for /boot/kernel/pflog.ko.symbols
Reading symbols from /boot/kernel/pf.ko.symbols...done.
Loaded symbols for /boot/kernel/pf.ko.symbols
#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) list *0xffffffff808e0012
0xffffffff808e0012 is in sched_clock (/usr/src/sys/kern/sched_ule.c:2227).
2222 * Handle a stathz tick. This is really only relevant for timeshare
2223 * threads.
2224 */
2225 void
2226 sched_clock(struct thread *td)
2227 {
2228 struct tdq *tdq;
2229 struct td_sched *ts;
2230
2231 THREAD_LOCK_ASSERT(td, MA_OWNED);
As can be seen, there's some odd NIC (bce2) behaviour reported:
bce2: Gigabit link up!
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
...
which is followed by a crash. The instruction pointer (IP) points to the ULE scheduler, though.
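Because several CPUs print their trap frames at the same time, the panic output is interleaved and hard to read by eye. Here is a minimal sketch of how the faulting addresses could be pulled out and counted before looking them up in kgdb. The sample lines are copied from the dump above; the function name count_instruction_pointers is my own, and feeding it a real crash info file is left as an assumption.

```python
import re
from collections import Counter

# Sample lines copied from the dump above. In practice the text would be
# read from a saved crash summary (file location is an assumption).
SAMPLE = """\
instruction pointer = 0x20:0xffffffff808e0012
instruction pointer = 0x20:0xffffffff808dfffa
instruction pointer = 0x20:0xffffffff808e0012
instruction pointer = 0x20:0xffffffff808e0012
"""

# Matches "instruction pointer = <segment>:<address>" and captures the address.
IP_RE = re.compile(r"instruction pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)")


def count_instruction_pointers(text):
    """Return a Counter mapping each faulting address to how often it appears."""
    return Counter(IP_RE.findall(text))


if __name__ == "__main__":
    for addr, n in count_instruction_pointers(SAMPLE).most_common():
        # Each address can then be resolved in kgdb with: list *<address>
        print(f"{addr} seen {n} time(s)")
```

Each distinct address can then be passed to kgdb's list command, as was done above for 0xffffffff808e0012.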
The interesting thing is that in all of my crash dumps only bce2 reports the discard frame... message, even though I have, for example, a bce0 NIC (same model) that processes just as much traffic at the same time.
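To check that claim across several dumps without reading them by hand, something like the following could tally the discard messages per interface. This is only a sketch: the sample text is a shortened copy of the message buffer above, and the function name discard_counts is hypothetical.

```python
import re
from collections import Counter

# Shortened copy of the kernel message buffer shown above.
SAMPLE = """\
<5>bce2: link state changed to UP
bce2: Gigabit link up!
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
"""

# Optional "<N>" priority prefix, then the interface name, then the message.
DISCARD_RE = re.compile(
    r"^(?:<\d+>)?(\w+): discard frame w/o leading ethernet header",
    re.MULTILINE,
)


def discard_counts(text):
    """Return a Counter of discard-frame messages keyed by interface name."""
    return Counter(DISCARD_RE.findall(text))


if __name__ == "__main__":
    for nic, n in discard_counts(SAMPLE).items():
        print(f"{nic}: {n} discard message(s)")
```

If only one interface ever shows up in the result, that points at something specific to that port (NIC, cable, or switch port) rather than at the driver in general.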
The NIC is: bce2: <Broadcom NetXtreme II BCM5709 1000Base-T (C0)> mem 0xda000000-0xdbffffff irq 32 at device 0.0 on pci2
I've found a somewhat similar bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=168217
However, it mentions a different NIC model (BCM5717) and has some more messages in the logs. In my case there's nothing but the discard frame w/o... message followed by a crash, so I am not sure if it's related.
Here are some of my settings:
/etc/sysctl.conf
Code:
net.inet.tcp.tso=0 # (default 0) - also disabled on the NICs, i.e ifconfig bce2 -tso
kern.ipc.somaxconn=256 # (default 128)
net.link.ether.inet.log_arp_permanent_modify=0 # (default: 1)
net.inet.ip.dummynet.hash_size=65536 # (default: 64)
kern.ipc.maxsockbuf=4194304 # (default 2097152)
net.inet.tcp.sendbuf_max=4194304 # (default 2097152)
net.inet.tcp.recvbuf_max=4194304 # (default 2097152)
net.inet.tcp.sendspace=262144 # (default 32768)
net.inet.tcp.recvspace=262144 # (default 65536)
net.inet.ip.forwarding=1 # (default 0)
net.inet.ip.fastforwarding=1 # (default 0)
net.inet.ip.redirect=0 # (default 1)
kern.random.sys.harvest.ethernet=0 # (default 1)
kern.random.sys.harvest.point_to_point=0 # (default 1)
kern.random.sys.harvest.interrupt=0 # (default 1)
/boot/loader.conf
Code:
kern.hz=4000 # (default: 1000) - not sure if it has any effect.
net.link.ifqmaxlen=64 # (default: 50)
dummynet_load="YES"
Would appreciate any suggestions. Thank you.
EDIT: Solved, the culprit was a bad cable. Replaced it with a better CAT5 one.