X10SLH-F kernel panics

Hello,

I have a SuperMicro X10SLH-F motherboard that kernel panics. The contents of the core dump are here: http://codepad.org/FIjqh2Nt.

Following the kernel panic troubleshooting instructions in the FreeBSD Handbook, I think the panic is related to one of the following:

Code:

vm_page_updatefake
vm_page_initfake
vm_page_flash
vm_page_remove

but I need confirmation and help on where to go from here. Thanks!
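Rather than eyeballing the dump, the function names can be pulled out of the kgdb backtrace in core.txt mechanically, which also makes it easier to compare several dumps. This is a minimal Python sketch. It assumes frames follow the usual kgdb layout, for example `#7  0xffffffff80b21847 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1359`. The names `parse_frame` and `backtrace_functions` are hypothetical helpers, and `core_txt` is simply the text of a core.txt file:

```python
import re

# Matches kgdb frames such as
#   #7  0xffffffff80b21847 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1359
# Frame #0 often has no "0x... in" prefix, and some frames have no "at file:line".
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"
    r"(?P<func>[A-Za-z_]\w*)\s*\("
    r".*?(?:\)\s+at\s+(?P<file>\S+):(?P<line>\d+))?\s*$"
)


def parse_frame(line):
    """Return (frame number, function, file, line), or None for non-frame lines."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    lineno = m.group("line")
    return (int(m.group("num")), m.group("func"), m.group("file"),
            int(lineno) if lineno else None)


def backtrace_functions(core_txt):
    """List the function name of every backtrace frame, in frame order."""
    frames = (parse_frame(l) for l in core_txt.splitlines())
    return [f[1] for f in frames if f is not None]


sample = "#7  0xffffffff80b21847 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1359"
print(parse_frame(sample))
```

Lines that are not backtrace frames, such as the "Fatal trap" header, are skipped. So running `backtrace_functions` over each crash file gives one list of functions per panic.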

User23

Jan 3, 2014

#2

According to http://www.supermicro.com/support/resources/OS/C226.cfm the board is tested with FreeBSD 9.1.

I would check the RAM and PSU first, before trying to find errors in the kernel. Is the kernel panic reproducible?

cryptidfan

Jan 3, 2014

Thread Starter
#3

Thanks for the reply and the suggestions. Yup, compatibility with FreeBSD is one of the reasons I chose this board. I'm using a SuperMicro SC825 chassis with dual 740 W PSUs, so I've kind of discounted PSU issues, for now. The panics have occurred while running the following command to exercise the hard drives: dd if=/dev/zero of=/dev/adaN bs=100m &.

There are eight 4 TB SATA drives in the machine. Four drives are connected to the on-board controller and four are connected to an Adaptec AAR-1430SA PCIe controller. I've had panics running the dd command on all eight drives simultaneously, on just the drives connected to the on-board controller and on just the drives connected to the Adaptec controller. I've been wrestling with this issue off and on for the past several weeks so the fog of war is starting to descend but I think I also had at least one panic when the system was idle. However, that might be a misplaced memory from my troubles trying to use NanoBSD on a USB drive with this board.
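The three test runs described above (all eight drives, only the on-board controller, only the Adaptec controller) can be generated rather than typed by hand. This sketch only builds the same `dd if=/dev/zero of=/dev/adaN bs=100m &` lines as an sh script and executes nothing. The mapping of ada0-ada3 to the on-board controller and ada4-ada7 to the Adaptec card is purely an assumption, as are the helper names. Check the real layout with `camcontrol devlist` first, because dd to a raw disk destroys everything on it.

```python
import shlex

# Hypothetical mapping of device names to controllers; verify with
# `camcontrol devlist` before running anything, since dd to a raw disk
# overwrites its contents.
ONBOARD = ["ada0", "ada1", "ada2", "ada3"]
ADAPTEC = ["ada4", "ada5", "ada6", "ada7"]

TEST_RUNS = {
    "all": ONBOARD + ADAPTEC,
    "onboard": ONBOARD,
    "adaptec": ADAPTEC,
}


def dd_command(device, block_size="100m"):
    """argv for zero-filling one whole drive, as in the command above."""
    return ["dd", "if=/dev/zero", f"of=/dev/{device}", f"bs={block_size}"]


def run_script(run):
    """sh script that starts one background dd per drive, then waits for all."""
    lines = [shlex.join(dd_command(dev)) + " &" for dev in TEST_RUNS[run]]
    lines.append("wait")
    return "\n".join(lines)


# Only prints the script; nothing is executed.
print(run_script("adaptec"))
```

Keeping the runs in one table makes it harder to accidentally test a different set of drives from one attempt to the next.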

The board has 8 GB ECC memory that has successfully passed at least one iteration of Memtest86+. Any further troubleshooting suggestions would be greatly appreciated.

wblock@

Developer

Jan 3, 2014

#4

Even if it has a decent power supply, try a different one.

cryptidfan

Jan 3, 2014

Thread Starter
#5

Again, thanks for the advice. Unfortunately for me, the SC825 chassis is the only beast of its kind here and I didn't anticipate having two faulty PSUs out of the box, so I don't have spares yet. They are now on the way. Until they arrive I'll run Memtest for a couple of days, run some more drive exercises and do whatever else I can come up with in an attempt to unearth some hopefully useful faults/clues.

wblock@

Developer

Jan 3, 2014

#6

Sorry, I missed the earlier part where you had described the power supplies. Since they are redundant, it is worth trying each alone.

cryptidfan

Jan 4, 2014

Thread Starter
#7

No worries. This is a prototype and one of three that I intend to build, so I just went ahead and ordered my next chassis. I'll borrow the PSUs from the new chassis for troubleshooting purposes. Whether or not I use a Supermicro X10SLH-F motherboard remains to be seen. So far the problem motherboard has run Memtest successfully for 19 hours and 15 iterations. This is longer than it ever went without a kernel panic when I ran my drive exercises. If it makes it to Monday I'll consider the memory good and look elsewhere.

kpa

Jan 4, 2014

#8

Try the latest release candidate RC4 of FreeBSD 10.0 and see if it makes any difference.

ftp://ftp.freebsd.org/pub/FreeBSD/releases/amd64/amd64/ISO-IMAGES/10.0

cryptidfan

Jan 4, 2014

Thread Starter
#9

Thanks, I'll try it on Monday.

cryptidfan

Jan 6, 2014

Thread Starter
#10

Memtest ran 62 hours and 47 passes without error. I've scratched memory off my list of suspects. FreeBSD 10.0-RC4/amd64 is installed and I'm running the following command on all eight SATA drives: dd if=/dev/zero of=/dev/adaN bs=100m &.

I'm waiting to see if it still kernel panics.

cryptidfan

Jan 7, 2014

Thread Starter
#11

24 hours uptime and counting, drive exercising completed on all eight drives, no kernel panics. I'm starting to feel guardedly optimistic. FreeBSD 10.0 might be what this motherboard needs. I will update the status in a couple of days, or sooner if things take a turn for the worse before then.

cryptidfan

Jan 7, 2014

Thread Starter
#12

My guarded optimism has been dashed. FreeBSD 10.0 RC4 kernel panics around 27 hours uptime and while running the previously mentioned dd command against all eight SATA drives. Contents of /var/crash/core.txt.0 are here: http://codepad.org/svtRfRT1

Can anyone glean helpful clues, assuming they exist, from the crash file? Is there a more appropriate forum or mailing list for such pleas?

wblock@

Developer

Jan 7, 2014

#13

The freebsd-current mailing list might be able to offer better suggestions.

Terry_Kennedy

Jan 9, 2014

#14

cryptidfan said:
My guarded optimism has been dashed. FreeBSD 10.0 RC4 kernel panic around 27 hours uptime and while running the previously mentioned dd command against all 8 SATA drives. Contents of /var/crash/core.txt.0 are here:

http://codepad.org/svtRfRT1

Can anyone glean helpful clues, assuming they exist, from the crash file? Is there a more appropriate forum or mailing list for such pleas?

In general, if the fault always happens inside the same function (which may or may not have the same physical address, nor the same line number, depending on how the kernel was compiled and any modules loaded), you've found a kernel bug. If the fault happens in random functions without any defined pattern, you have flaky hardware.
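The rule of thumb above can be written down as a tiny function. This is a hedged sketch, not an established diagnostic. It compares only function names, since addresses and line numbers can vary between builds. It needs at least two crashes to say anything, and it treats any variation at all as "no defined pattern", which is cruder than a human reading the backtraces. `classify_faults` is a hypothetical name:

```python
def classify_faults(faulting_functions):
    """Heuristic from this thread: the same faulting function in every crash
    points at a kernel bug; scattered functions point at flaky hardware."""
    if len(faulting_functions) < 2:
        return "inconclusive"  # a single crash cannot show a pattern
    if len(set(faulting_functions)) == 1:
        return "kernel bug suspected"
    return "hardware suspected"


print(classify_faults(["vm_pageout", "vm_pageout", "vm_pageout"]))
```

The input is the list of faulting function names, one per panic, taken from the backtraces of each crash dump.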

Note that not all of the "this should never happen" things in the kernel have nice panic strings, so a piece of broken hardware can give you this type of fault if the kernel developer that handles that piece of code has never seen that particular type of hardware fault. The developer may add a new panic string for the event if they think that panic could happen to other users.

As far as diagnosing the bug that is causing your particular fault, others are a lot better at it than I am. However, something looks odd. Refer to the source file in question, sys/vm/vm_pageout.c; the version I linked to is the relevant one for 10.0-RC4. Your actual fault was:

Code:

#7  0xffffffff80b21847 in vm_pageout () at /usr/src/sys/vm/vm_pageout.c:1359

But according to the source file I linked to above, the vm_pageout() function occupies source file lines 1629-1697, far from line 1359 reported in your fault. Line 1359 is inside the [much more complicated] vm_pageout_scan() function (source lines 887-1424). This could be for a number of reasons:

- The Clang compiler doesn't use the same line numbers as SVN.
- There's something inconsistent about the kernel vs. symbol tables on your system.
- I have no idea what I'm talking about.
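The check described above is mechanical enough to script. Find which function's line range contains the reported line, then compare it with the function name kgdb printed. This minimal sketch knows only the two ranges quoted in this post, and `function_at` and `frame_is_consistent` are made-up names. A real check would need the ranges from the exact source revision the kernel was built from:

```python
# Line ranges of two functions in sys/vm/vm_pageout.c, as quoted in the post
# above for the 10.0-RC4 sources; other revisions will differ.
FUNCTION_RANGES = {
    "vm_pageout_scan": (887, 1424),
    "vm_pageout": (1629, 1697),
}


def function_at(line, ranges=FUNCTION_RANGES):
    """Return the function whose [first, last] line range contains `line`."""
    for name, (first, last) in ranges.items():
        if first <= line <= last:
            return name
    return None


def frame_is_consistent(reported_func, line, ranges=FUNCTION_RANGES):
    """True if the backtrace's function name matches the function at `line`."""
    return function_at(line, ranges) == reported_func


# Frame #7 from the 10.0-RC4 crash: "vm_pageout () at .../vm_pageout.c:1359"
print(function_at(1359))                        # vm_pageout_scan
print(frame_is_consistent("vm_pageout", 1359))  # False
```

A mismatch like this only flags the inconsistency. It does not tell you which of the explanations above applies.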

In any event, as @wblock@ suggested, you can probably get more advanced help on the -CURRENT mailing list, as that's where the developers responsible for this code hang out.

Last edited by a moderator: Oct 16, 2014

cryptidfan

Jan 9, 2014

Thread Starter
#15

Thanks @wblock@ and @Terry_Kennedy for directing me to the -CURRENT mailing list. I've posted my dilemma on the list but haven't yet had any luck with troubleshooting this any further.

Last edited by a moderator: Oct 16, 2014