FreeBSD 13.1 Kernel Panic

I have been regularly getting kernel panics on my machine.
I've set up swap and configured the machine to save the crash dumps.
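
For anyone reproducing this setup, here is a minimal sketch of the rc.conf settings that enable crash dumps. The thread does not show the actual configuration, so the values below are assumptions built on the stock `dumpdev` and `dumpdir` variables.

```shell
# /etc/rc.conf fragment (assumed values; the real configuration is not shown in the thread)
dumpdev="AUTO"          # AUTO uses a swap device listed in /etc/fstab as the dump device
dumpdir="/var/crash"    # directory where savecore(8) writes vmcore.N and info.N at boot
```

After a reboot, `dumpon -l` should list the dump device that is actually in use.
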
Code:
root@mymachine:/var/crash # ls -la
total 6035439
drwxr-x---   2 root  wheel          21 Jul 22 08:55 .
drwxr-xr-x  25 root  wheel          25 Jul 22 09:07 ..
-rw-r--r--   1 root  wheel           2 Jul 22 08:54 bounds
-rw-r--r--   1 root  wheel         169 Jul 22 02:21 core.txt.0
-rw-r--r--   1 root  wheel         169 Jul 22 03:15 core.txt.1
-rw-r--r--   1 root  wheel         169 Jul 22 07:29 core.txt.2
-rw-r--r--   1 root  wheel         169 Jul 22 07:56 core.txt.3
-rw-r--r--   1 root  wheel         169 Jul 22 08:55 core.txt.4
-rw-------   1 root  wheel         371 Jul 22 02:20 info.0
-rw-------   1 root  wheel         370 Jul 22 03:13 info.1
-rw-------   1 root  wheel         371 Jul 22 07:28 info.2
-rw-------   1 root  wheel         370 Jul 22 07:55 info.3
-rw-------   1 root  wheel         371 Jul 22 08:54 info.4
lrwxr-xr-x   1 root  wheel           6 Jul 22 08:55 info.last -> info.4
-rw-r--r--   1 root  wheel           5 May 12 09:27 minfree
-rw-------   1 root  wheel  4540502016 Jul 22 02:21 vmcore.0
-rw-------   1 root  wheel  4627173376 Jul 22 03:14 vmcore.1
-rw-------   1 root  wheel  4592214016 Jul 22 07:29 vmcore.2
-rw-------   1 root  wheel  4530495488 Jul 22 07:56 vmcore.3
-rw-------   1 root  wheel  4536147968 Jul 22 08:55 vmcore.4
lrwxr-xr-x   1 root  wheel           8 Jul 22 08:55 vmcore.last -> vmcore.4

root@mymachine:/var/crash # cat core.txt.4
'version' has unknown type; cast it to its declared type
'version' has unknown type; cast it to its declared type
Unable to find matching kernel for /var/crash/vmcore.4

root@mymachine:/var/crash # cat info.last
Dump header from device: /dev/da0p2
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 4536147968
  Blocksize: 512
  Compression: none
  Dumptime: 2022-07-22 08:31:09 +0100
  Hostname: nsgnalpsroot
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.1-RELEASE GENERIC
  Panic String: page fault
  Dump Parity: 1801250075
  Bounds: 4
  Dump Status: good


I've tried using kgdb and the output follows:

Code:
root@mymachine:/var/crash # kgdb -n last                                    
kgdb: couldn't find a suitable kernel image
root@mymachine:/var/crash # kgdb /boot/kernel/kernel /var/crash/vmcore.last
GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
(No debugging symbols found in /boot/kernel/kernel)
/usr/ports/devel/gdb/work-py38/gdb-11.2/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) n

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

/usr/ports/devel/gdb/work-py38/gdb-11.2/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB? (y or n) n
Command aborted.
(kgdb)

So I see two issues here:
  1. No debugging symbols found in /boot/kernel/kernel
  2. /usr/ports/devel/gdb/work-py38/gdb-11.2/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed.
Are they related? I.e., does having no debug symbols cause the other error?

How do I get the debug symbols?
 
I have been regularly getting kernel panics on my machine.
Right at boot or apparently randomly while the machine is running? Do you know what the machine is doing when it crashes? Knowing what it's doing would help narrow down the potential culprit.

No debugging symbols found in /boot/kernel/kernel
Did you install the kernel-dbg component?
 
While it's running. I see this on the console (please note that this image was taken from FreeBSD 13.0 running in a VM, but the issue is the same):
[Attached image: 1658479793134.png]

There are lots of processes creating and destroying ZFS snapshots. Looking at the previous panics, the problem seems to be in that area.

What is the kernel-dbg component?
 
What is the kernel-dbg component?
At a certain stage the installer asks which components to install: base, base-dbg, lib32, lib32-dbg, ports, etc. kernel-dbg is one of the components you can select. It contains the debug symbols and other files you need for debugging the kernel.

Some general tips when dealing with seemingly random crashes: check your memory for errors and check the SMART status of your disks. Memory errors can certainly lead to weird and random crashes, and certain read errors from the disk(s) can also have disastrous consequences. So it's always good to check those, even if only to rule them out as a possible cause.
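
As a rough sketch of the SMART check: the helper below only prints the `smartctl` health commands for the disks you name, so you can review them before running them as root. The function name is hypothetical, and `smartctl` is assumed to come from the sysutils/smartmontools package since it is not part of the base system.

```shell
# Hypothetical helper: print a SMART health-check command for each disk name given.
# smartctl is assumed to be installed from sysutils/smartmontools.
smart_health_cmds() {
    for disk in "$@"; do
        # -H asks smartctl for the drive's overall health self-assessment
        printf 'smartctl -H /dev/%s\n' "$disk"
    done
}
```

For example, `smart_health_cmds da0` prints the command for the disk named in the dump header; disks behind a hardware RAID controller may need extra device-type options.
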
 
Ah OK. Is it possible to add that after install? Grab hold of the file and uncompress it into place?

I switched on the memory check during boot and have also run Memtest86. Neither reported errors. I'll look at the SMART status.

The machine is a Dell server with ECC RAM.
 
Grab hold of the file and uncompress it into place?
Yes, that will work. There haven't been any patch updates for 13.1 yet so the files from the installation media should still be valid.
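
A minimal sketch of that approach, assuming the set is downloaded rather than copied from the install media. The helper names are hypothetical and the URL layout is an assumption that should be checked against the mirror before use; the release and architecture are taken from the dump header earlier in the thread.

```shell
# Assumed values: release and architecture as reported in the dump header.
RELEASE="13.1-RELEASE"
ARCH="amd64"

# Hypothetical helper: build the download URL for a distribution set.
# The URL layout is an assumption; verify it (or use the install media instead).
dist_url() {
    printf 'https://download.freebsd.org/releases/%s/%s/%s.txz\n' "$ARCH" "$RELEASE" "$1"
}

# Hypothetical helper: fetch kernel-dbg.txz and unpack it relative to /.
# Intended to be run as root on the affected FreeBSD machine; not called here.
install_kernel_dbg() {
    fetch -o /tmp/kernel-dbg.txz "$(dist_url kernel-dbg)" &&
        tar -xpf /tmp/kernel-dbg.txz -C /
}
```

Running `install_kernel_dbg` as root should leave the symbols under `/usr/lib/debug/boot/kernel/`, which is where kgdb looks for `kernel.debug`.
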

The machine is a DELL server with ECC Ram.
Ok, nice. ECC would certainly complain if there were any issues (you will get MCA warnings/errors in /var/log/messages).
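
A quick way to look for such entries, sketched as a small helper. The function name is hypothetical, and matching on the `MCA:` prefix is an assumption about how the kernel tags these messages.

```shell
# Hypothetical helper: print log lines that carry the "MCA:" prefix.
# Defaults to /var/log/messages; prints nothing if the file is missing or has no matches.
mca_lines() {
    grep 'MCA:' "${1:-/var/log/messages}" 2>/dev/null
}
```

Calling `mca_lines` with no argument checks the current log; pass a rotated file as the argument to look further back.
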
 
Here is the output of kgdb:

Code:
GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 08
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe0259f2e588
frame pointer           = 0x28:0xfffffe0259f2e5a0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 34934 (rover)
trap number             = 12
panic: page fault
cpuid = 8
time = 1658507833
KDB: stack backtrace:
#0 0xffffffff80c69465 at kdb_backtrace+0x65
#1 0xffffffff80c1bb1f at vpanic+0x17f
#2 0xffffffff80c1b993 at panic+0x43
#3 0xffffffff810afdf5 at trap_fatal+0x385
#4 0xffffffff810afe4f at trap_pfault+0x4f
#5 0xffffffff81087528 at calltrap+0x8
#6 0xffffffff80cf8ab6 at vgonel+0x186
#7 0xffffffff80cf9171 at vgone+0x31
#8 0xffffffff80ce799d at vfs_hash_insert+0x26d
#9 0xffffffff82180069 at sfs_vgetx+0x149
#10 0xffffffff82180c54 at zfsctl_snapdir_lookup+0x1e4
#11 0xffffffff80ce9bbc at lookup+0x45c
#12 0xffffffff80ce8de9 at namei+0x259
#13 0xffffffff80d06953 at kern_statat+0xf3
#14 0xffffffff80d0704f at sys_fstatat+0x2f
#15 0xffffffff810b06ec at amd64_syscall+0x10c
#16 0xffffffff81087e3b at fast_syscall_common+0xf8
Uptime: 3h48m1s
Dumping 5730 out of 130655 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55      /usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
(kgdb) bt
#0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2 0xffffffff80c1b71c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487
#3 0xffffffff80c1bb8e in vpanic (fmt=0xffffffff811b4fb9 "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4 0xffffffff80c1b993 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844
#5 0xffffffff810afdf5 in trap_fatal (frame=0xfffffe0259f2e4c0, eva=0) at /usr/src/sys/amd64/amd64/trap.c:944
#6 0xffffffff810afe4f in trap_pfault (frame=0xfffffe0259f2e4c0, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763
#7 <signal handler called>
#8 0x0000000000000000 in ?? ()
#9 0xffffffff8117c26c in VOP_CLOSE_APV (vop=0xffffffff8244da70 <zfsctl_ops_snapshot>, a=a@entry=0xfffffe0259f2e5b0) at vnode_if.c:498
#10 0xffffffff80cf8ab6 in VOP_CLOSE (vp=0xfffff81e88fdc988, fflag=4, cred=0x0, td=0xfffffe02054e8e40) at ./vnode_if.h:249
#11 vgonel (vp=vp@entry=0xfffff81e88fdc988) at /usr/src/sys/kern/vfs_subr.c:4088
#12 0xffffffff80cf9171 in vgone (vp=vp@entry=0xfffff81e88fdc988) at /usr/src/sys/kern/vfs_subr.c:3963
#13 0xffffffff80ce799d in vfs_hash_insert (vp=0xfffff81e88fdc988, hash=136416, hash@entry=1509092400, flags=flags@entry=2097152, td=<optimized out>, td@entry=0xfffffe02054e8e40, vpp=vpp@entry=0xfffffe0259f2ec30, fn=<optimized out>, arg=0xfffff815afc90300) at /usr/src/sys/kern/vfs_hash.c:181
#14 0xffffffff82180069 in sfs_vnode_insert (vp=0xfffffe0259f2e5b0, flags=2097152, vpp=0xfffffe0259f2ec30, parent_id=<optimized out>, id=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c:152
#15 sfs_vgetx (mp=0xfffffe02494c8ac0, flags=flags@entry=2097152, parent_id=parent_id@entry=2, id=<optimized out>, tag=<optimized out>, vops=0xffffffff8244da70 <zfsctl_ops_snapshot>, setup=0xffffffff82181360 <zfsctl_snapshot_vnode_setup>, arg=0xfffffe0259f2e820, vpp=0xfffffe0259f2ec30)
at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c:200
#16 0xffffffff82180c54 in zfsctl_snapdir_lookup (ap=<optimized out>) at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_ctldir.c:954
#17 0xffffffff80ce9bbc in VOP_LOOKUP (dvp=0xfffff80be30a4988, vpp=0xfffffe0259f2ec30, cnp=0xfffffe0259f2ec58) at ./vnode_if.h:65
#18 lookup (ndp=ndp@entry=0xfffffe0259f2ebd8) at /usr/src/sys/kern/vfs_lookup.c:1086
#19 0xffffffff80ce8de9 in namei (ndp=ndp@entry=0xfffffe0259f2ebd8) at /usr/src/sys/kern/vfs_lookup.c:616
#20 0xffffffff80d06953 in kern_statat (td=0xfffffe02054e8e40, flag=<optimized out>, fd=-100, path=0x0, pathseg=(unknown: 0x54e9350), pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfffffe0259f2ed18, hook=0x0) at /usr/src/sys/kern/vfs_syscalls.c:2441
#21 0xffffffff80d0704f in sys_fstatat (td=0xfffffe0259f2e5b0, uap=0xfffffe02054e9228) at /usr/src/sys/kern/vfs_syscalls.c:2418
#22 0xffffffff810b06ec in syscallenter (td=0xfffffe02054e8e40) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#23 amd64_syscall (td=0xfffffe02054e8e40, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1185
#24 <signal handler called>
#25 0x000000080134139a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffde18
(kgdb)
 
i.am.the.problem, I recommend that you file a FreeBSD problem report with the information from your latest comment.
It looks like there could have been some changes in the VFS layer code that broke expectations of the ZFS code for .zfs support.
 