Panic: sbdrop crash

hardlivinlow · Jun 5, 2017

I'm running FreeNAS-9.10.2-U4 (FreeBSD 10.3).

Original thread about this issue on the FreeNAS forums:
https://forums.freenas.org/index.php?threads/system-crash-on-smb-file-copy.54971/

Issue
I'm getting kernel panics when copying to an SMB share, or when I run dd and iperf at the same time.

Crash Dump File

http://www.filedropper.com/textdumptar2

Code:

[noparse]
panic: sbdrop
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe09504456b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0950445760
vpanic() at vpanic+0x126/frame 0xfffffe09504457a0
panic() at panic+0x43/frame 0xfffffe0950445800
sbcut_internal() at sbcut_internal+0x273/frame 0xfffffe0950445810
sbflush_internal() at sbflush_internal+0x18/frame 0xfffffe0950445830
sbdestroy() at sbdestroy+0x12/frame 0xfffffe0950445850
sofree() at sofree+0x19d/frame 0xfffffe0950445880
soclose() at soclose+0x376/frame 0xfffffe09504458d0
_fdrop() at _fdrop+0x29/frame 0xfffffe09504458f0
closef() at closef+0x21e/frame 0xfffffe0950445980
fdescfree() at fdescfree+0x4f9/frame 0xfffffe0950445a30
exit1() at exit1+0x581/frame 0xfffffe0950445ac0
sys_sys_exit() at sys_sys_exit+0xe/frame 0xfffffe0950445ad0
amd64_syscall() at amd64_syscall+0x41e/frame 0xfffffe0950445bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0950445bf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x80514ca9a, rsp = 0x7fffffffdf88, rbp = 0x7fffffffdfa0 ---
KDB: enter: panic
[/noparse]

Hardware
Dell PowerEdge R710 V2
Xeon X5560 Quad Core
28 GB RAM
Dell PERC H310 HBA (SAS9211-8i Firmware: 20.00.07.00 IT Mode)
Intel PRO/1000 PT Quad Port

Troubleshooting steps taken:
BIOS and firmware are up to date on all hardware.
Ran Dell diagnostics on all hardware (all clear).
Installed the Intel NIC after the issue occurred with the Broadcom on-board NICs.
Ran memtest86 for 7 hours (no errors).
Changed the disk drives in the server.
Changed the HBA.
Changed the OS disk and slot, and reinstalled multiple times.
Tried different RAM.
Bypassed the network switch.
Replaced the Ethernet cable.
Replaced the PSU.
System temperatures are good.

I'm really out of options and wanted to reach out here for help, as the community over at FreeNAS is out of ideas as well. Any help identifying this issue would be greatly appreciated; I have been working on this for about 5 days now.

SirDice · Jun 7, 2017

I only found references to a bug that was fixed a long time ago. I suggest opening a PR (problem report) for it.

Terry_Kennedy · Jun 8, 2017

hardlivinlow said:
I'm getting kernel panics when copying to an SMB share, or when I run dd and iperf at the same time.

Crash Dump File

http://www.filedropper.com/textdumptar2

I find that file, as well as the two you linked in the FreeNAS forum, to be corrupted. The file type is .tar.gz, but the data is uncompressed, and tar complains about an unexpected EOF. I downloaded them multiple times with different browsers and tested with tar(1) and archivers/gtar.
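If you want to repeat that check on your end before re-uploading, here is a minimal Python sketch. diagnose_archive is a hypothetical helper, not part of FreeNAS or FreeBSD. It checks for the gzip magic bytes (0x1f 0x8b) at the start of the file, then walks every member with Python's tarfile module and reads each file's data, so a truncated archive raises an error instead of quietly ending early.

```python
import tarfile

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def diagnose_archive(path, chunk=1 << 20):
    """Report whether `path` is really gzip-compressed and whether the tar
    archive inside it can be read to the end (hypothetical helper)."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC

    try:
        # "r:*" lets tarfile detect the compression itself, so a plain tar
        # that is merely named .tar.gz still opens.
        with tarfile.open(path, mode="r:*") as tar:
            for member in tar:
                if member.isfile():
                    # Read the member's data in chunks; a short (truncated)
                    # file raises here rather than being silently skipped.
                    with tar.extractfile(member) as fh:
                        while fh.read(chunk):
                            pass
    except (tarfile.TarError, EOFError, OSError) as exc:
        return {"gzip": is_gzip, "ok": False, "error": type(exc).__name__}
    return {"gzip": is_gzip, "ok": True, "error": None}
```

Running print(diagnose_archive("textdump.tar.gz")) on a file like the one described above should report "gzip": False and "ok": False, which means the name claims gzip, the contents are plain tar, and the archive stops early.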

Code:

[noparse]
panic: sbdrop
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe09504456b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0950445760
vpanic() at vpanic+0x126/frame 0xfffffe09504457a0
panic() at panic+0x43/frame 0xfffffe0950445800
sbcut_internal() at sbcut_internal+0x273/frame 0xfffffe0950445810
sbflush_internal() at sbflush_internal+0x18/frame 0xfffffe0950445830
sbdestroy() at sbdestroy+0x12/frame 0xfffffe0950445850
sofree() at sofree+0x19d/frame 0xfffffe0950445880
soclose() at soclose+0x376/frame 0xfffffe09504458d0
[/noparse]

I think you can ignore the various "sacrifice a rubber chicken" suggestions from the other forum. You have reputable hardware with known stable firmware versions, and you seem to be getting a consistent panic in the same place from the same thing, which generally rules out hardware problems.

Try these three things for me, after getting your system back into a non-corrupted state (fix the label errors, etc.):

1) Disable everything network-related in /etc/rc.conf: the obvious networking entries as well as any network-related services that get started. Reboot and run your dd(1) test again. As noted elsewhere, /dev/random is slow, so to really stress the hardware, use a previously generated file with random(ish) data; the FreeBSD or FreeNAS ISO or another install image is probably a good place to start.

2) If the first test does not fail, re-enable all of the stuff you disabled and reboot the system with all network cables disconnected from it. Run the test again.

3) If the second test does not fail, bring the system back up with the network cable reconnected, but remove any non-default options from your ifconfig_whatever in /etc/rc.conf such as mtu 9000. Before running the test, use ifconfig(8) to disable any hardware acceleration options like checksum offload, TSO, etc. Post your before and after ifconfig(8) output. This is likely not going to help anything, but try it anyway if the first 2 tests pass.
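For the dd(1) test in step 1, if you'd rather script it than run it by hand, this is a rough Python stand-in. It assumes you already have a large pre-generated file (such as an install ISO) and a scratch directory on the pool; stress_copy is a hypothetical name. It copies the file in 1 MiB blocks (similar in spirit to dd bs=1m), fsyncs each copy, and compares SHA-256 digests. Reading a copy straight back is likely to be served from the ZFS ARC, so a matching digest mostly tells you the write path survived without a panic, not that the on-disk data is intact.

```python
import hashlib
import os
import shutil


def _sha256(path, block_size):
    """SHA-256 of a file, read in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stress_copy(src, dst_dir, passes=3, block_size=1 << 20):
    """Copy `src` into `dst_dir` `passes` times (hypothetical helper).
    Returns the pass numbers whose copy did not match the source."""
    expected = _sha256(src, block_size)
    bad = []
    for i in range(passes):
        dst = os.path.join(dst_dir, f"stress-{i}.bin")
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout, block_size)
            fout.flush()
            os.fsync(fout.fileno())  # ask the OS to commit the data to disk
        if _sha256(dst, block_size) != expected:
            bad.append(i)
        os.remove(dst)  # clean up so repeated passes don't fill the pool
    return bad
```

An empty list back from stress_copy("/path/to/FreeNAS.iso", "/mnt/tank/scratch", passes=10) means every pass completed and matched; a panic during the run is itself the result you're looking for.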
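For step 3, here is a small Python sketch that turns your current ifconfig(8) output into a command that switches off the offloads it reports and resets a jumbo MTU. offload_off_command is a hypothetical helper and only builds the argument list; it doesn't run anything. The capability-to-flag table is a partial mapping I'm assuming from ifconfig(8) (RXCSUM to -rxcsum, TSO4 to -tso4, and so on), so check it against the man page for your driver.

```python
import re

# ASSUMED partial mapping from capability names printed in the "options="
# line to the ifconfig(8) flag that disables them. Extend for your driver.
DISABLE_FLAG = {
    "RXCSUM": "-rxcsum",
    "TXCSUM": "-txcsum",
    "TSO4": "-tso4",
    "TSO6": "-tso6",
    "LRO": "-lro",
    "VLAN_HWCSUM": "-vlanhwcsum",
    "VLAN_HWTSO": "-vlanhwtso",
}


def offload_off_command(ifconfig_output, default_mtu=1500):
    """Build an ifconfig argument list that disables the enabled offloads
    and resets a non-default MTU (hypothetical helper; runs nothing)."""
    first = ifconfig_output.splitlines()[0]
    iface = first.split(":", 1)[0]  # e.g. "em0: flags=..." -> "em0"
    args = ["ifconfig", iface]

    # Capabilities currently enabled, e.g. options=4019b<RXCSUM,TXCSUM,...>
    m = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_output)
    enabled = m.group(1).split(",") if m else []
    args += [DISABLE_FLAG[cap] for cap in enabled if cap in DISABLE_FLAG]

    # The MTU is printed on the first line, e.g. "... metric 0 mtu 9000".
    mtu = re.search(r"\bmtu (\d+)", first)
    if mtu and int(mtu.group(1)) != default_mtu:
        args += ["mtu", str(default_mtu)]
    return args
```

Feed it the output of ifconfig em0 (em0 is only an example interface name), review the resulting command, run it by hand, and keep the ifconfig(8) output from before and after for posting.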

Usable crash dumps will also be helpful. Even better would be the kernel and corefile, but those will be large and may reveal data you consider sensitive, so let's hold off on that for now. If you're up to rebuilding the kernel from source, this patch (against HEAD, so the line numbers will be different on your system) changes the panic into a KASSERT(9). You probably need options INVARIANTS as well. Note that INVARIANTS may mask the problem entirely, with or without the patch.

hardlivinlow · Jun 16, 2017

I wanted to let you all know that I changed the CPUs in the server and the problem is resolved. One CPU was discolored on the bottom, which leads me to believe it was the culprit. That's odd, because the thermal paste was still good and was transferring heat properly. *shrugs* The diagnostics weren't showing any failure, but replacing the CPUs did take care of the panics. I will note down these steps for future troubleshooting. Thanks for the suggestions, much appreciated.
