Solved Kernel panic twice in 24 hours. Hardware failure, or bug?

Hello,

In short:
Code:
root@datacore_tmp:~ # uname -a
FreeBSD datacore_tmp.deltanews.lan 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015  root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

I'm using it with istgt for iSCSI storage. It's a fresh install from yesterday, and no additional software has been installed except the istgt daemon.

It's running on a ZFS root, and the iSCSI target exports a zvol device.

Last night I left it running a data transfer, and in the morning I found it with a black screen: no kernel panic message or any other indicator. At first I thought it was a hardware issue, but then I started it again. With no data transfer it ran normally for about 2 hours. Then I started a data transfer, and in an SSH session I ran:
Code:
tail -f /var/log/messages
===cut useless stuff===
Aug 28 09:04:14 datacore_tmp su: user to root on /dev/pts/0
Aug 28 09:12:31 datacore_tmp kernel: kernel trap 12 with interrupts disabled
Aug 28 09:12:31 datacore_tmp kernel:
Aug 28 09:12:31 datacore_tmp kernel:
Aug 28 09:12:31 datacore_tmp kernel: Fatal trap 12: page fault while in kernel mode
Aug 28 09:12:31 datacore_tmp kernel: cpuid = 4; apic id = 04
Aug 28 09:12:31 datacore_tmp kernel: fault virtual address   = 0x188
Aug 28 09:12:31 datacore_tmp kernel: fault code     = supervisor read data, page not present
Write failed: Broken pipe

So my question: could it be a hardware issue with one of the CPUs, or is it more likely a bug? I'm sure it's hard to say just from these messages, but I have this in rc.conf:
Code:
dumpdev="AUTO"
Honestly, I have no idea where to find the dump files or how to extract and analyze them.

Could anybody advise me how to proceed?

Thank you.
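
For reference on the dump question, here is a minimal sketch of the relevant /etc/rc.conf settings, assuming stock FreeBSD defaults. With dumpdev="AUTO" the kernel dumps to the configured swap device on a panic, and savecore(8) copies the dump into dumpdir on the next boot as vmcore.N. The dumpdir and crashinfo_enable lines only restate the usual defaults and are not taken from this system.

```
# /etc/rc.conf (sketch; the last two lines restate assumed defaults)
dumpdev="AUTO"           # dump kernel memory to the swap device on panic
dumpdir="/var/crash"     # where savecore(8) writes vmcore.N at next boot
crashinfo_enable="YES"   # crashinfo(8) writes a text summary, core.txt.N
```

If a dump was saved, running kgdb /boot/kernel/kernel /var/crash/vmcore.0 and then typing bt should show the panic backtrace. A machine that hangs with a blank screen may never get as far as writing a dump, in which case /var/crash stays empty.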
 
This may be completely unrelated, but try limiting the ZFS ARC size to leave a couple of GB spare, if you haven't already.
I seem to be able to panic a system pretty easily by letting ZFS use up all the RAM for ARC, then running a big backup.
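
A minimal sketch of such a limit, assuming a loader tunable in /boot/loader.conf. The 4G figure is purely illustrative, since the amount of RAM in this machine isn't stated:

```
# /boot/loader.conf (sketch; pick a value that leaves a few GB free)
vfs.zfs.arc_max="4G"     # hypothetical cap on the ZFS ARC size
```

The tunable is read at boot, so it takes effect after a restart; sysctl vfs.zfs.arc_max shows the value in use.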
 
Hello,

I found out that I have the same hardware on another system running FreeBSD 10.1, also used with istgt and a zvol, and it has never had any issues.
So I reinstalled with 10.1, and after exactly 10 minutes of uptime I got a kernel panic again. I'm running memory tests now, and so far there are no errors. I wonder if I should try replacing a CPU, since I have a spare one.

Update:

After 10-12 more kernel panics with replaced memory and CPUs, I took it to a service center. When they plugged in the motherboard and turned it on, it immediately started smelling like a burnt-out chip :D
The motherboard was replaced, and it has now been running for 4 hours with no issues. Over 500 GB of data transferred so far.

usdmatt, thank you for the suggestion. I will keep it in mind as an option to try next time I have issues. It just wasn't the cause this time.
By the way, I'm running a lot of storage on FreeBSD, on zvols over iSCSI with heavy write loads. I even use it on a few systems with 10GbE interfaces as VMware storage, with almost no tuning. I'm sure I never limited the ZFS ARC size, and I have never had any kernel panics. I think you should check whether your panics are somehow related to your hardware or drivers.

Thanks.
 