Kernel panic on 11.2-RELEASE-p7

Dear all,

This morning one of our (physical) production servers (FreeBSD 11.2-RELEASE-p7 with the GENERIC kernel, ZFS root) experienced (another) kernel panic:
Code:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer    = 0x20:0xffffffff82299013
stack pointer            = 0x28:0xfffffe0352893ad0
frame pointer            = 0x28:0xfffffe0352893b10
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 9 (dbuf_evict_thread)
trap number        = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b3d577 at kdb_backtrace+0x67
#1 0xffffffff80af6b17 at vpanic+0x177
#2 0xffffffff80af6993 at panic+0x43
#3 0xffffffff80f77fdf at trap_fatal+0x35f
#4 0xffffffff80f7759e at trap+0x5e
#5 0xffffffff80f5808c at calltrap+0x8
#6 0xffffffff8229c049 at dbuf_evict_one+0xe9
#7 0xffffffff82297a15 at dbuf_evict_thread+0x1a5
#8 0xffffffff80aba093 at fork_exit+0x83
#9 0xffffffff80f58fae at fork_trampoline+0xe

I have used the "crashinfo" utility to (again) generate the text file which is available at this URL: http://www.ocpea.com/dump/core-2.txt
There is also a text file of the previous dump available at http://www.ocpea.com/dump/core.txt

Does anyone have any idea how we can go about discovering the cause of this? We would appreciate any suggestions ...

Kind regards,
Jurij Kovacic
 
If I'm not mistaken dbuf_evict_thread hints at issues with ZFS. Does the panic always happen with dbuf_evict_thread? Besides using it as your root filesystem how is your ZFS configured? Dedup? Compression? Pool with multiple disks, ZIL, L2ARC, etc?
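You can check most of that with something like the following (pool names are whatever you used):
Code:
# zpool status -v
# zpool list -v
# zfs get -r compression,dedup zroot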
 
Hi,
Thank you for your reply.

The panic has happened twice, and both times it was in dbuf_evict_thread. I am not using deduplication; compression is enabled on all pools.

Code:
/etc/sysctl.conf
...
# set 4k sectors
vfs.zfs.min_auto_ashift=12
....
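(If it is relevant, I believe the ashift actually used by the pools can be double-checked like this - zdb reads it from the cached pool configuration:)
Code:
# sysctl vfs.zfs.min_auto_ashift
# zdb | grep ashift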

This is my ZFS configuration:

Code:
  pool: zdata
 state: ONLINE
  scan: scrub repaired 0 in 2h9m with 0 errors on Fri Dec 28 12:43:40 2018
config:

        NAME             STATE     READ WRITE CKSUM
        zdata            ONLINE       0     0     0
          gpt/data-disk  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h55m with 0 errors on Fri Dec 28 11:30:19 2018
config:

        NAME              STATE     READ WRITE CKSUM
        zroot             ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/zfs0      ONLINE       0     0     0
            gpt/zfs1      ONLINE       0     0     0
        spares
          gpt/spare-disk  AVAIL


I am now in the process of rebuilding the kernel with debug symbols to get better diagnostic information (as suggested on the freebsd-stable mailing list).

Kind regards,
Jurij
 
To be honest, I have absolutely no idea what the issue could be. Have you checked for the obvious culprits, like dodgy disks? There's no guarantee it will find all issues, but it's usually a good idea to look at the SMART data of each drive. You can use sysutils/smartmontools for this.
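Something like this should get you started (device names are just examples; adjust for however your drives show up):
Code:
# pkg install smartmontools
# smartctl -a /dev/ada0             # full SMART report, including the error log
# smartctl -t short /dev/ada0       # start a short self-test
# smartctl -l selftest /dev/ada0    # view the self-test results once finished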
 
Hi,

I took your advice and installed smartmontools; there were/are no errors logged on any of the disks:
Code:
...
SMART Error Log Version: 1
No Errors Logged
...

Nothing changed after running short tests.

There is one non-zero raw read error count on one of the disks, but the value is fairly low:

Code:
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       10
  3 Spin_Up_Time            0x0027   120   119   021    Pre-fail  Always       -       6975
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       22
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31434
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       20
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1679
194 Temperature_Celsius     0x0022   119   114   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
....

I am running long disk checks at the moment - I will see if something turns up.
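(For the record, the long tests look roughly like this - device names are examples:)
Code:
# smartctl -t long /dev/ada0
# smartctl -t long /dev/ada1
# smartctl -l selftest /dev/ada0    # check the results after the estimated runtime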

Thank you for the advice!

Regards,
Jurij
 
Okay, here's a suggestion ;) In the past I have seen this kind of error, and it was - most likely - caused by rather subtle memory issues, even though memtest didn't complain.
You might check whether your memory is certified for the board and, if spare parts are available, try swapping it.
 
Hi PMc,

Thank you for the suggestion. I will soon have an additional, identical server (in terms of OS and hardware) at my disposal, so if this is a hardware issue, the other server should not show the same stability problems.

Looking at the output of dmidecode:
Code:
Handle 0x0025, DMI type 16, 15 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 128 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x0026, DMI type 19, 15 bytes
Memory Array Mapped Address
        Starting Address: 0x00000000000
        Ending Address: 0x0031FFFFFFF
        Range Size: 12800 MB
        Physical Array Handle: 0x0025
        Partition Width: 1

Handle 0x0027, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x0025
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 2048 MB
        Form Factor: DIMM
        Set: None
        Locator: CPU1_DIMM0
        Bank Locator: BANK0
        Type: Other
        Type Detail: Synchronous
        Speed: 667 MT/s
        Manufacturer: Apacer
        Serial Number: 32110102
        Asset Tag: AssetTagNum0
        Part Number: 78.A1GDE.9K00C

it seems to me the RAM is not ECC (https://eu.mouser.com/datasheet/2/24/78.B2GC9.AF1-1021629.pdf). Would these kinds of errors (if they are caused by faulty RAM) most likely not occur with ECC RAM?
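(For anyone wanting to check the same thing on their own machine, I believe this is a quick way:)
Code:
# dmidecode -t memory | grep -i "error correction"
        Error Correction Type: None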

Regards,
Jurij
 
@Jurij Well, ECC is not magic, so these errors can occur there too. ECC can detect (and correct) errors to a certain degree, but then it's usually easier to see what happened.
Hopefully somebody on the mailing list gave you some better info on this GPF from the core files.

I saw you were trying to hide your public IP in the coreX.txt files - you missed the broadcast address on em0 there ;).
 
As _martin already stated, ECC is not a cure-all. Basically, ECC protects against two things:
* A memory cell has flipped due to a cosmic ray. That happens once a year or so.
* A memory cell tends to flip because it is weak or defective. That should be recognized by memtest.

But then there are those two dozen or so timing parameters which are specific to a memory chip. The SPD should tell the mainboard these parameters - but there are a lot of them, and, well, who knows?
If there are problems in that realm (and there can be several reasons: a weak chip, a memory module that is not a perfect fit for the board, or unsuitable BIOS settings), ECC won't help, because here we do not have a flipped cell; we have a signal that is still on its rising slope, so it is not yet fully clear whether it is a 0 or a 1.

What I have seen is that ZFS is the first to complain about such things (that is, occasional kernel crash), e.g. when trying to use mixed brands of memory (which should be handled by SPD). It seems ZFS beats the memory quite hard.

Ideally I would prefer to use RegECC memory with ZFS, because the "registered" feature takes load off the chips. But you also need a board that supports it.
 
What I have seen is that ZFS is the first to complain about such things (that is, occasional kernel crash), e.g. when trying to use mixed brands of memory (which should be handled by SPD). It seems ZFS beats the memory quite hard.
It's not just ZFS, it's all file systems. It makes logical sense: most of the memory used by the kernel is the "disk" or "buffer" cache, which holds copies of data that were recently read or written, data that needs to be written soon, and metadata that tells the file system where the data is on disk. On large production machines that act as disk or file servers, I've seen situations where over 90% of the physical memory is used as file system cache, and this is a good thing (it makes for great performance). But it also means that if something goes wrong with the memory hardware, the file system will be the first to know about it.

In our department at work (we were file system implementors), we used to talk about being the "canary in the mine", the first thing to die when the situation gets bad. Unfortunately, people tend to blame the victim: no, it's not the canary's fault that the air is so poisonous, and you are not coughing because the bird fell off its perch; correlation is not causation. Many file system implementors are sick and tired of being accused of having bugs when in reality they are innocent. For this reason, file system people tend to agitate for all systems to use ECC: then if there is a memory error, the message comes from the BIOS or motherboard, not from the file system, and we have "plausible deniability".

Another thing that some high-end file systems have started doing is to add checksums on the most important or largest data structures they keep in RAM. That way they can cleanly detect memory corruption and give meaningful messages, instead of crashing in bizarre fashions.

Speaking of checksums: ZFS already supports checksums on disk, which protect both the data on the disk drive (at rest) and the data being transmitted between the host and the disk (in flight). This is really good, and a hallmark of high-quality file systems today. However, it also increases the psychological pressure to use ECC: with the biggest source of data corruption (namely the disk drive) addressed, ZFS users benefit relatively more from fixing data corruption in RAM, which is what ECC is for.
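If you want to see or exercise that on-disk checksumming yourself, something along these lines should do it (the pool name is just an example):
Code:
# zfs get checksum zroot     # on-disk checksum algorithm ("on" means fletcher4 by default)
# zpool scrub zroot          # re-read all data and verify the checksums
# zpool status -v zroot      # scrub progress, plus any CKSUM errors found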

Good luck with your crash! Hopefully it goes away on the other system.
 
_martin : Thank you for the heads-up regarding the IP address - I am not a believer in "security through obscurity", but still ...

PMc : It seems to me that at the time of purchase of the said server, top quality was not the main criterion. Nice to know that such a thing as RegECC memory exists, though - it is quite evident I am not too familiar with hardware as such. :)

ralphbsz : Thank you very much for your insight. It seems to me that ideally there should be checksums for the buffer cache in memory as well. As a means of "protecting" memory, "stack protector" comes to mind ... I know that this is a different concept, but to me it shows that data in memory needs protecting - for a variety of different reasons.
 
Dear all,

About a week ago, we had a kernel panic on FreeBSD 11.2-RELEASE-p7 with the GENERIC kernel, ZFS root. As the kernel was not compiled with debug support enabled, the resulting "vmcore" files were of little use. Consequently, I recompiled the kernel with debug support:

Code:
--- GENERIC     2018-12-29 08:03:04.786846000 +0100
+++ DEBUG       2018-12-29 08:23:36.522966000 +0100
@@ -19,11 +19,16 @@
 # $FreeBSD: releng/11.2/sys/amd64/conf/GENERIC 333417 2018-05-09 16:14:12Z sbruno $

 cpu            HAMMER
-ident          GENERIC
+ident          DEBUG

 makeoptions    DEBUG=-g                # Build kernel with gdb(1) debug symbols
 makeoptions    WITH_CTF=1              # Run ctfconvert(1) for DTrace support

+# kernel debugging
+options                KDB
+options                KDB_UNATTENDED
+options                KDB_TRACE
+
 options        SCHED_ULE               # ULE scheduler
 options        PREEMPTION              # Enable kernel thread preemption
 options        INET                    # InterNETworking

and installed it.
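(For completeness, the standard way to build and install a custom kernel configuration named DEBUG would be roughly the following, assuming the 11.2 source tree is in /usr/src:)
Code:
# cp /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/DEBUG
# (apply the changes shown in the diff above)
# cd /usr/src
# make buildkernel KERNCONF=DEBUG
# make installkernel KERNCONF=DEBUG
# shutdown -r now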

After running for about a week, the server crashed again last night. Unfortunately, there are no "vmcore" files in "/var/crash" this time.

The server has 12GB of RAM installed:
Code:
 # sysctl hw.physmem
hw.physmem: 12843053056
and uses 2 swap partitions (2G each):
Code:
# swapinfo -h
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p2       2097152     642M     1.4G    31%
/dev/ada1p2       2097152     638M     1.4G    31%
Total             4194304     1.3G     2.7G    31%
Dump device is set in /etc/rc.conf:
Code:
# grep dump /etc/rc.conf
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"

There seems to be enough space left in "/var/crash":
Code:
# zfs list | grep crash
zroot/var/crash      857M  17.2G   857M  /var/crash

and like I said earlier, the system DID create "vmcore" files when crashing with the GENERIC kernel. Is it possible that the swap partition(s) are too small for the memory dump now that the kernel is compiled with debug support? Or is some additional configuration needed to make the system save vmcore files?
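(If I understand the tooling correctly, something like the following should confirm where the dump is supposed to go and whether one was actually written - please correct me if any of this is wrong:)
Code:
# grep dumpdev /etc/rc.conf    # dumpdev="AUTO" picks the first suitable swap device
# dumpon -v /dev/ada0p2        # or point the dump at a specific device explicitly
# sysctl debug.minidump        # 1 = minidumps; only pages in use are dumped
# savecore -C /dev/ada0p2      # check whether a dump is sitting on the device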

Any suggestions? :)

Kind regards,
Jurij
 
From a short glance, I would say it should write a kernel dump - that is, I have the same KDB options and mine does try to dump.
I have never tried what happens with swap smaller than physmem (the dump can only use one of your swap devices), so this might be the cause.

Another thing I have seen is that the dump fails with an I/O error. This is reported to the console, and obviously there will be no crash file afterwards. I don't know exactly why this happens, but it may be that the crash was caused by the disk I/O system, which is then in too bad a shape to write the dump. So I have configured this option:
Code:
options         PANIC_REBOOT_WAIT_TIME=3600
This gives me an hour between the crash and the reboot, so that I at least have a chance to look at what is reported on the console.

There seems to be enough space left in "/var/crash":

This would be reported if it were the issue. If no kernel dump was written, you will see this message in console.log at startup:
Code:
kernel: No core dumps found.
If there was a dump but it could not be copied, you will see a different message.
 
Hello and HPNY. Although I have much less knowledge of modern hardware than some of the members above, in my decades of running FreeBSD every single panic has been the result of bad memory. This thread is a great read, btw.
 
Hi,

After a short period, our server (FreeBSD 11.2-RELEASE-p7 with a custom DEBUG kernel, ZFS root) crashed again. Before the crash, I had created a new swap partition large enough to hold the dump, so I was able to obtain the core dump; the "crashinfo" output is available at this URL: http://www.ocpea.com/dump/core-3.txt
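(For anyone in the same situation, the general approach is something like the following - da2 here is a hypothetical spare disk, adjust the names for your system:)
Code:
# gpart create -s gpt da2                       # only if the disk has no partition table yet
# gpart add -t freebsd-swap -s 16G da2
# dumpon /dev/da2p1                             # use the new partition as the dump device
# echo 'dumpdev="/dev/da2p1"' >> /etc/rc.conf   # make it persistent across reboots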

If anyone can help with debugging the dump, I would appreciate the help ...

Regards,
Jurij
 
Hi VladiBG,

Thank you for the heads-up - I have removed what I considered sensitive information from the .txt file - I take it I should have done a better job of it? :)
 