It seems that one of my zpools has become corrupted in such a way that it causes a kernel panic when trying to import the pool, in FreeBSD 14.1-BETA-1, in FreeBSD 14.3-BETA3, and with ZFS on Linux under openSUSE Leap 15.6. I'm not certain whether the pool could ever become recoverable at some future time?
With ZFS in FreeBSD, I've tested this with a local build of FreeBSD 14.1-BETA-1 and with an official build of FreeBSD 14.3-BETA3. I was able to capture a crash dump while booted to a FreeBSD 14.3-BETA3 installation on a USB drive.
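For reference, crash dumps on that USB installation were enabled in more or less the standard way, roughly the following, where the swap device name is only an example:
Code:
# enable kernel crash dumps to swap; the device name here is just an example
sysrc dumpdev=AUTO
dumpon /dev/da0p2
# after the panic and reboot, savecore(8)/crashinfo(8) leave a summary under /var/crash
less /var/crash/core.txt.0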
The uname string for that 14.3-BETA3 system:
Code:
FreeBSD extblk.cloud.thinkum.space 14.3-BETA3 FreeBSD 14.3-BETA3 releng/14.3-n271394-a2ebf641da89 GENERIC amd64
This FreeBSD build was installed directly from the base.txz and kernel.txz of the official FreeBSD release files.
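That is, roughly the usual manual layout from the release sets (the target mount point below is only illustrative):
Code:
# extract the release distribution sets onto the target filesystem
tar -xpf base.txz -C /mnt
tar -xpf kernel.txz -C /mnt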
The section of the crash dump file that shows the kernel panic:
Code:
panic: VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (0 == 97)
cpuid = 10
time = 1747985958
KDB: stack backtrace:
#0 0xffffffff80ba8e0d at kdb_backtrace+0x5d
#1 0xffffffff80b5a901 at vpanic+0x161
#2 0xffffffff821adc4a at spl_panic+0x3a
#3 0xffffffff822c0f77 at vdev_indirect_mapping_open+0xe7
#4 0xffffffff82333244 at spa_remove_init+0x154
#5 0xffffffff8228e8ac at spa_load+0x17c
#6 0xffffffff8228e51a at spa_tryimport+0x19a
#7 0xffffffff8235a3aa at zfs_ioc_pool_tryimport+0x3a
#8 0xffffffff82353d0b at zfsdev_ioctl_common+0x53b
#9 0xffffffff821b32dd at zfsdev_ioctl+0x11d
#10 0xffffffff809e63ab at devfs_ioctl+0xcb
#11 0xffffffff80c58878 at vn_ioctl+0xc8
#12 0xffffffff809e6a1e at devfs_ioctl_f+0x1e
#13 0xffffffff80bca075 at kern_ioctl+0x255
#14 0xffffffff80bc9dc1 at sys_ioctl+0x101
#15 0xffffffff8104e4c7 at amd64_syscall+0x117
#16 0xffffffff8102452b at fast_syscall_common+0xf8
Uptime: 4h45m0s
Dumping 2055 out of 64904 MB:
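If it would help with diagnosis, I assume the saved vmcore can also be opened with kgdb (from the devel/gdb package) for a deeper look, roughly:
Code:
# open the saved kernel core; file names under /var/crash are the savecore(8) defaults
kgdb /boot/kernel/kernel /var/crash/vmcore.0
# then at the (kgdb) prompt: bt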
Before the kernel panic, I was able to produce the following sort of debug output with zdb, while booted to the FreeBSD installation on the external USB drive:
Code:
# zdb -eAAAX mroot
Configuration for import:
        vdev_children: 8
        hole_array[0]: 1
        hole_array[1]: 2
        hole_array[2]: 3
        hole_array[3]: 4
        hole_array[4]: 5
        version: 5000
        pool_guid: 7617692904694256584
        name: 'mroot'
        state: 0
        vdev_tree:
            type: 'root'
            id: 0
            guid: 7617692904694256584
            children[0]:
                type: 'disk'
                id: 0
                guid: 3749122579944117927
                whole_disk: 1
                metaslab_array: 256
                metaslab_shift: 32
                ashift: 12
                asize: 1974351953920
                is_log: 0
                DTL: 1043
                create_txg: 4
                degraded: 1
                aux_state: 'err_exceeded'
                path: '/dev/gpt/WDR'
            children[1]:
                type: 'hole'
                id: 1
                guid: 0
            children[2]:
                type: 'hole'
                id: 2
                guid: 0
            children[3]:
                type: 'hole'
                id: 3
                guid: 0
            children[4]:
                type: 'hole'
                id: 4
                guid: 0
            children[5]:
                type: 'hole'
                id: 5
                guid: 0
            children[6]:
                type: 'missing'
                id: 6
                guid: 0
            children[7]:
                type: 'missing'
                id: 7
                guid: 0
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 24
ASSERT at /usr/src/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c:348:vdev_indirect_mapping_open()
dmu_object_info(os, mapping_object, &doi) == 0 (0x61 == 0)
PID: 1577 COMM: zdb
TID: 100559 NAME:
ZFS_DBGMSG(zdb) START:
spa.c:6626:spa_import(): spa_import: importing mroot, max_txg=-1 (RECOVERY MODE)
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): LOADING
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': best uberblock found for spa mroot. txg 283856481
spa_misc.c:419:spa_load_note(): spa_load(mroot, config untrusted): using uberblock with txg=283856481
vdev.c:2508:vdev_update_path(): vdev_copy_path: vdev 3749122579944117927: vdev_path changed from '/dev/sda3' to '/dev/gpt/WDR'
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading checkpoint txg
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading indirect vdev metadata
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Checking feature flags
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading special MOS directories
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading properties
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading AUX vdevs
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading vdev metadata
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': metaslab_init failed [error=97]
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': vdev_load: metaslab_init failed [error=97]
spa_misc.c:405:spa_load_failed(): spa_load(mroot, config trusted): FAILED: vdev_load failed [error=97]
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): UNLOADING
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): spa_load_retry: rewind, max txg: 283856480
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): LOADING
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': best uberblock found for spa mroot. txg 283850222
spa_misc.c:419:spa_load_note(): spa_load(mroot, config untrusted): using uberblock with txg=283850222
vdev.c:2508:vdev_update_path(): vdev_copy_path: vdev 3749122579944117927: vdev_path changed from '/dev/sda3' to '/dev/gpt/WDR'
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading checkpoint txg
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading indirect vdev metadata
ZFS_DBGMSG(zdb) END
Command exit status: 6
Script done on Thu May 22 19:35:37 2025
This output from zdb appears to show what might be a usable txg, namely the second txg illustrated above (283850222)?
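One thing that can still be tried from userland, without risking another panic, might be pointing zdb at that older txg directly (flags per zdb(8); the txg number is taken from the log above):
Code:
# pin the uberblock search to the older txg, ignoring assertions as before
zdb -e -AAA -t 283850222 mroot
# or let zdb rewind progressively on its own and try to list datasets
zdb -e -AAA -F -d mroot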
With the kernel panic happening during import, though, I'm not able to attempt any repair of the pool at import time.
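If another import is attempted at all, I assume the most conservative form would be something along these lines (flags per zpool-import(8); the alternate root path is just a placeholder). I realize none of it is guaranteed to avoid the VERIFY0 panic in vdev_indirect_mapping_open():
Code:
# read-only, no mounts, alternate root, recovery rewind (-F)
zpool import -N -f -o readonly=on -R /mnt/mroot -F mroot
# riskier variants: extreme rewind (-F -X), or pinning the older txg (-T)
zpool import -N -f -o readonly=on -F -X mroot
zpool import -N -f -o readonly=on -T 283850222 mroot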
I think this is probably in the base OpenZFS code rather than anything FreeBSD-specific? The corrupted pool also causes a kernel panic on import with a ZFS on Linux build for openSUSE Leap 15.6 (zfs RPM version 2.3.2-lp156.1.1). (The Linux host stays usable past that point, though any further use of ZFS on that host then mostly fails.)
In the Linux kernel's log output, the panic shows up as follows when trying to import the corrupted pool (a note on collecting the Linux-side ZFS debug log follows the trace):
Code:
VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (0 == 52)
PANIC at vdev_indirect_mapping.c:348:vdev_indirect_mapping_open()
Showing stack for process 2603
CPU: 9 PID: 2603 Comm: zpool Tainted: P OE n 6.4.0-150600.21-default #1 SLE15-SP6 039015c2e8321fe64d2f21cd23bf8b046ab54be1
Hardware name: SYSTEM_MANUFACTURER HX90/HX90, BIOS 5.19 10/11/2021
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x60
spl_panic+0xc8/0x100 [spl 98d94886b30635a864cb10d337d43ed5f4e3a6d1]
? spl_kmem_cache_free+0x9d/0x1e0 [spl 98d94886b30635a864cb10d337d43ed5f4e3a6d1]
? srso_alias_return_thunk+0x5/0x7f
? srso_alias_return_thunk+0x5/0x7f
? aggsum_add+0x177/0x190 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
? srso_alias_return_thunk+0x5/0x7f
? dnode_hold_impl+0x316/0xdc0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
? __kmalloc_node+0x50/0x130
vdev_indirect_mapping_open+0x147/0x180 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
? srso_alias_return_thunk+0x5/0x7f
? spa_config_enter_impl.isra.9+0xbf/0x110 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
spa_remove_init+0xa2/0x1f0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
spa_load+0x137/0x14d0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
spa_load_best+0x18c/0x2d0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
spa_import+0x260/0x660 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
? nvlist_xunpack+0x62/0xb0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
zfs_ioc_pool_import+0x13b/0x150 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
zfsdev_ioctl_common+0x4ad/0x980 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
? srso_alias_return_thunk+0x5/0x7f
? __kmalloc_node+0xbe/0x130
? srso_alias_return_thunk+0x5/0x7f
zfsdev_ioctl+0x4f/0xe0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
__x64_sys_ioctl+0x92/0xd0
do_syscall_64+0x5b/0x80
? exit_to_user_mode_prepare+0x1ed/0x220
? srso_alias_return_thunk+0x5/0x7f
? exit_to_user_mode_prepare+0x1ed/0x220
? srso_alias_return_thunk+0x5/0x7f
? syscall_exit_to_user_mode+0x1e/0x40
? srso_alias_return_thunk+0x5/0x7f
? do_syscall_64+0x67/0x80
? srso_alias_return_thunk+0x5/0x7f
? exc_page_fault+0x69/0x150
entry_SYSCALL_64_after_hwframe+0x77/0xe1
RIP: 0033:0x7f3fd3f29f9b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007fff49569d70 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff49569e30 RCX: 00007f3fd3f29f9b
RDX: 00007fff49569e30 RSI: 0000000000005a02 RDI: 0000000000000003
RBP: 00007fff4956dd20 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000246 R12: 000055edaf7dfa70
R13: 000055edaf7e24d0 R14: 00007f3fcc000fe0 R15: 0000000000000000
</TASK>
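On the Linux side, I assume the equivalent of the ZFS_DBGMSG output above could be collected by enabling the in-kernel debug log before the import attempt, roughly:
Code:
# enable the ZFS debug log, then read it back after the failed import
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
cat /proc/spl/kstat/zfs/dbgmsg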
Up to the point of this corruption, I was using the FreeBSD installation on that zpool via VirtualBox on Linux. I'd created a raw-disk VMDK file for the SSD that the zpool is installed on, with udev permissions set so that the non-root user could access the disk as /dev/sda on Linux. I then created a VirtualBox VM using that VMDK to represent the SSD where the pool is installed.
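The raw-disk VMDK had been created in more or less the usual way, roughly as below (the .vmdk path is only illustrative):
Code:
# raw-disk VMDK pointing at the whole SSD as seen on the Linux host
VBoxManage internalcommands createrawvmdk -filename ~/vmdk/ssd-wdr.vmdk -rawdisk /dev/sda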
The VMDK was assigned to a virtual AHCI controller in VirtualBox, controller type virtio-scsi. The VMDK itself was not configured for write-through under VirtualBox at the time, and there may have been a hard power-off at some point.
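In hindsight, the host-side flush/caching behavior may matter here. If I set this up again, I'd probably mark the medium write-through and make sure the AHCI emulation honors guest flush requests; something like the following, where the VM and controller names are only placeholders:
Code:
# attach the medium as write-through (write-through media are excluded from snapshots)
VBoxManage storageattach "fbsd-vm" --storagectl "SATA" --port 0 --device 0 \
    --type hdd --medium ~/vmdk/ssd-wdr.vmdk --mtype writethrough
# tell the AHCI emulation not to ignore flush requests from the guest (0 = honor them)
VBoxManage setextradata "fbsd-vm" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0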
The zpool had recently been configured to use edonr as its checksum type; the pool was originally created with checksum=skein. I'm not sure whether use of the newer checksum type from OpenZFS 2.2 could be related to the pool becoming corrupted?
Using a similar VMDK setup, I've also tried to fix the pool with OpenZFS in a recent omniOS build. At this time, the omniOS ZFS distribution doesn't appear to support one OpenZFS 2.2 feature that the corrupted zpool is using, namely vdev_zaps.
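To double-check which feature flags the pool actually requires for reading (e.g. whether the edonr checksum and the vdev_zaps-related features appear there), I assume the vdev labels can be dumped directly without importing:
Code:
# dump the ZFS labels from the partition; the features_for_read section lists
# the features a host must support in order to read the pool
zdb -l /dev/gpt/WDR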
With this causing a kernel panic on every import attempt, will I have to call the pool a loss here?
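Worst case, before writing the pool off completely, maybe some individual files could still be pulled out from userland, given that zdb with -AAA gets past the assertion. I understand recent zdb has a copy option along these lines (the dataset name and file paths here are only examples):
Code:
# copy a single file out of a dataset without importing the pool
zdb -e -AAA -r mroot/ROOT/default etc/rc.conf /tmp/rc.conf.recovered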