ZFS Solved - Kernel panic with OpenZFS 2.2, FreeBSD 14.3-BETA3 and ZoL on openSUSE

It seems that one of my zpools has become corrupted in such a way that it causes a kernel panic when trying to import it under FreeBSD 14.1-BETA-1, FreeBSD 14.3-BETA3, and ZFS on Linux on openSUSE LEAP 15.6. I'm not certain whether the pool might become recoverable at some future time.

With ZFS in FreeBSD, I've tested this with a local build of FreeBSD 14.1-BETA-1 and an official build of FreeBSD 14.3-BETA3. I was able to capture a crash dump while booted into a FreeBSD 14.3-BETA3 installation on a USB drive.

The uname string for the latter:

Code:
FreeBSD extblk.cloud.thinkum.space 14.3-BETA3 FreeBSD 14.3-BETA3 releng/14.3-n271394-a2ebf641da89 GENERIC amd64

This FreeBSD build was installed directly from base.txz and kernel.txz from official FreeBSD release files.

The section of the crash dump file that shows the kernel panic:

Code:
panic: VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (0 == 97)
cpuid = 10
time = 1747985958
KDB: stack backtrace:
#0 0xffffffff80ba8e0d at kdb_backtrace+0x5d
#1 0xffffffff80b5a901 at vpanic+0x161
#2 0xffffffff821adc4a at spl_panic+0x3a
#3 0xffffffff822c0f77 at vdev_indirect_mapping_open+0xe7
#4 0xffffffff82333244 at spa_remove_init+0x154
#5 0xffffffff8228e8ac at spa_load+0x17c
#6 0xffffffff8228e51a at spa_tryimport+0x19a
#7 0xffffffff8235a3aa at zfs_ioc_pool_tryimport+0x3a
#8 0xffffffff82353d0b at zfsdev_ioctl_common+0x53b
#9 0xffffffff821b32dd at zfsdev_ioctl+0x11d
#10 0xffffffff809e63ab at devfs_ioctl+0xcb
#11 0xffffffff80c58878 at vn_ioctl+0xc8
#12 0xffffffff809e6a1e at devfs_ioctl_f+0x1e
#13 0xffffffff80bca075 at kern_ioctl+0x255
#14 0xffffffff80bc9dc1 at sys_ioctl+0x101
#15 0xffffffff8104e4c7 at amd64_syscall+0x117
#16 0xffffffff8102452b at fast_syscall_common+0xf8
Uptime: 4h45m0s
Dumping 2055 out of 64904 MB:

Before the kernel panic, I was able to produce the following debug output with zdb, while booted into FreeBSD on the external USB drive:

Code:
# zdb -eAAAX mroot

Configuration for import:
        vdev_children: 8
        hole_array[0]: 1
        hole_array[1]: 2
        hole_array[2]: 3
        hole_array[3]: 4
        hole_array[4]: 5
        version: 5000
        pool_guid: 7617692904694256584
        name: 'mroot'
        state: 0
        vdev_tree:
            type: 'root'
            id: 0
            guid: 7617692904694256584
            children[0]:
                type: 'disk'
                id: 0
                guid: 3749122579944117927
                whole_disk: 1
                metaslab_array: 256
                metaslab_shift: 32
                ashift: 12
                asize: 1974351953920
                is_log: 0
                DTL: 1043
                create_txg: 4
                degraded: 1
                aux_state: 'err_exceeded'
                path: '/dev/gpt/WDR'
            children[1]:
                type: 'hole'
                id: 1
                guid: 0
            children[2]:
                type: 'hole'
                id: 2
                guid: 0
            children[3]:
                type: 'hole'
                id: 3
                guid: 0
            children[4]:
                type: 'hole'
                id: 4
                guid: 0
            children[5]:
                type: 'hole'
                id: 5
                guid: 0
            children[6]:
                type: 'missing'
                id: 6
                guid: 0
            children[7]:
                type: 'missing'
                id: 7
                guid: 0
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 24
ASSERT at /usr/src/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c:348:vdev_indirect_mapping_open()
dmu_object_info(os, mapping_object, &doi) == 0 (0x61 == 0)
  PID: 1577      COMM: zdb
  TID: 100559    NAME:

ZFS_DBGMSG(zdb) START:
spa.c:6626:spa_import(): spa_import: importing mroot, max_txg=-1 (RECOVERY MODE)
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): LOADING
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': best uberblock found for spa mroot. txg 283856481
spa_misc.c:419:spa_load_note(): spa_load(mroot, config untrusted): using uberblock with txg=283856481
vdev.c:2508:vdev_update_path(): vdev_copy_path: vdev 3749122579944117927: vdev_path changed from '/dev/sda3' to '/dev/gpt/WDR'
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading checkpoint txg
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading indirect vdev metadata
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Checking feature flags
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading special MOS directories
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading properties
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading AUX vdevs
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading vdev metadata
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': metaslab_init failed [error=97]
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': vdev_load: metaslab_init failed [error=97]
spa_misc.c:405:spa_load_failed(): spa_load(mroot, config trusted): FAILED: vdev_load failed [error=97]
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): UNLOADING
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): spa_load_retry: rewind, max txg: 283856480
spa_misc.c:419:spa_load_note(): spa_load(mroot, config trusted): LOADING
vdev.c:167:vdev_dbgmsg(): disk vdev '/dev/gpt/WDR': best uberblock found for spa mroot. txg 283850222
spa_misc.c:419:spa_load_note(): spa_load(mroot, config untrusted): using uberblock with txg=283850222
vdev.c:2508:vdev_update_path(): vdev_copy_path: vdev 3749122579944117927: vdev_path changed from '/dev/sda3' to '/dev/gpt/WDR'
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading checkpoint txg
spa_misc.c:2311:spa_import_progress_set_notes_impl(): 'mroot' Loading indirect vdev metadata
ZFS_DBGMSG(zdb) END

Command exit status: 6
Script done on Thu May 22 19:35:37 2025

This output from zdb appears to show what might be a usable txg, namely the second txg illustrated above?

With this kernel panic happening during import, though, I'm not able to try to fix the pool on import.

I think this is probably an issue in the base OpenZFS code. The corrupted pool also causes a kernel panic on import when using a ZFS on Linux build for openSUSE LEAP 15.6 (zfs rpm version 2.3.2-lp156.1.1). (The Linux host remains usable past that point, though any further use of ZFS on that host mostly fails.)

In the Linux kernel's log output, the panic shows up as follows when trying to import the corrupted pool:

Code:
VERIFY0(dmu_object_info(os, mapping_object, &doi)) failed (0 == 52)
PANIC at vdev_indirect_mapping.c:348:vdev_indirect_mapping_open()
Showing stack for process 2603
CPU: 9 PID: 2603 Comm: zpool Tainted: P           OE      n 6.4.0-150600.21-default #1 SLE15-SP6 039015c2e8321fe64d2f21cd23bf8b046ab54be1
Hardware name: SYSTEM_MANUFACTURER HX90/HX90, BIOS 5.19 10/11/2021
Call Trace:
 <TASK>
 dump_stack_lvl+0x44/0x60
 spl_panic+0xc8/0x100 [spl 98d94886b30635a864cb10d337d43ed5f4e3a6d1]
 ? spl_kmem_cache_free+0x9d/0x1e0 [spl 98d94886b30635a864cb10d337d43ed5f4e3a6d1]
 ? srso_alias_return_thunk+0x5/0x7f
 ? srso_alias_return_thunk+0x5/0x7f
 ? aggsum_add+0x177/0x190 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 ? srso_alias_return_thunk+0x5/0x7f
 ? dnode_hold_impl+0x316/0xdc0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 ? __kmalloc_node+0x50/0x130
 vdev_indirect_mapping_open+0x147/0x180 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 ? srso_alias_return_thunk+0x5/0x7f
 ? spa_config_enter_impl.isra.9+0xbf/0x110 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 spa_remove_init+0xa2/0x1f0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 spa_load+0x137/0x14d0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 spa_load_best+0x18c/0x2d0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 spa_import+0x260/0x660 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 ? nvlist_xunpack+0x62/0xb0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 zfs_ioc_pool_import+0x13b/0x150 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 zfsdev_ioctl_common+0x4ad/0x980 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 ? srso_alias_return_thunk+0x5/0x7f
 ? __kmalloc_node+0xbe/0x130
 ? srso_alias_return_thunk+0x5/0x7f
 zfsdev_ioctl+0x4f/0xe0 [zfs 6357b40841f89d244bdaf046db6feff5a5a6a507]
 __x64_sys_ioctl+0x92/0xd0
 do_syscall_64+0x5b/0x80
 ? exit_to_user_mode_prepare+0x1ed/0x220
 ? srso_alias_return_thunk+0x5/0x7f
 ? exit_to_user_mode_prepare+0x1ed/0x220
 ? srso_alias_return_thunk+0x5/0x7f
 ? syscall_exit_to_user_mode+0x1e/0x40
 ? srso_alias_return_thunk+0x5/0x7f
 ? do_syscall_64+0x67/0x80
 ? srso_alias_return_thunk+0x5/0x7f
 ? exc_page_fault+0x69/0x150
 entry_SYSCALL_64_after_hwframe+0x77/0xe1
RIP: 0033:0x7f3fd3f29f9b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007fff49569d70 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fff49569e30 RCX: 00007f3fd3f29f9b
RDX: 00007fff49569e30 RSI: 0000000000005a02 RDI: 0000000000000003
RBP: 00007fff4956dd20 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000246 R12: 000055edaf7dfa70
R13: 000055edaf7e24d0 R14: 00007f3fcc000fe0 R15: 0000000000000000
 </TASK>

Up to the point of this corruption in the zpool, I was using the FreeBSD installation on that zpool via VirtualBox on Linux. I'd created a VMDK file for the SSD that the zpool is installed on, with the appropriate udev permissions for the non-root user to access this disk as /dev/sda on Linux. I then created a VirtualBox VM for the VMDK representing the SSD where this pool is installed.
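
For reference, a raw-disk VMDK of this kind is typically created along these lines (a sketch; the file name and device path here are placeholders rather than the exact ones used):

Code:
# run as the non-root user that has udev permission on the raw disk
VBoxManage internalcommands createrawvmdk \
    -filename /home/user/vm/wdr-ssd.vmdk -rawdisk /dev/sda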

The VMDK was attached to a virtual storage controller in VirtualBox, of type virtio-scsi. The VMDK file itself was not configured for write-through under VirtualBox at the time. There may have been a hard power-off at some point.

The zpool had recently been configured to use edonr as a checksum type; the pool was originally created with checksum=skein. I'm not sure whether the use of the newer checksum type from OpenZFS 2.2 could be related to the pool becoming corrupted.
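
For what it's worth, on a pool that still imports, the checksum settings and the related feature flags could be inspected along these lines (illustrative only, since this pool can't currently be imported):

Code:
# per-dataset checksum property, and the pool-level feature flags
zfs get -r -o name,value checksum mroot
zpool get feature@skein,feature@edonr mroot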

Using a similar VMDK setup, I've tried to fix the pool with OpenZFS in a recent omniOS build. At this time, the omniOS ZFS distribution doesn't appear to support one OpenZFS 2.2 feature that the corrupted zpool is using, namely vdev_zaps.

With this causing a kernel panic throughout, will I have to call it a loss here?
 
The zpool [strike]doesn't appear to be[/strike] ... is, in fact, also causing a kernel panic with OpenZFS 2.2.7 under Alpine Linux. It doesn't show up in Alpine's default /var/log/messages, but the kernel panic (under Linux user mode) is presented via Alpine's `dmesg` command.

This was with the additional Alpine packages zfs, zfs-lts, zfs-openrc, zfs-udev, and zfs-utils-py, after rebooting the Alpine Linux VM into the linux-lts kernel that installs with them.

I'll take a look at it with OpenZFS in a FreeBSD 15-CURRENT snapshot shortly.
 
Let me guess, that is running systemd, isn't it?

I think they use OpenRC in Alpine, with /etc/init.d and such. They have a wiki page with more information, though.

For device support with the base install, I guess there's Alpine's mdev. Alpine's setup-xorg-base installer script will install eudev though.

I haven't taken a look at the apk build files for the packages. I think that the -udev packages might be for eudev.
 
There's a similar kernel panic under the latest 15-CURRENT snapshot (amd64):


(attached screenshot: VirtualBox_avm_23_05_2025_13_19_57.png)
 
I think they use OpenRC in Alpine, with /etc/init.d and such. They have a wiki page with more information, though.

For device support with the base install, I guess there's Alpine's mdev. Alpine's setup-xorg-base installer script will install eudev though.

I haven't taken a look at the apk build files for the packages. I think that the -udev packages might be for eudev.

Oh. Strange. When I hear something as outrageous as kernel panics not appearing in /var/log/messages I instantly think "what area has systemd spread to now?".
 
If anyone has any advice about how to approach the debugging, I could try to provide more information if it could help.

At one point, I'd tried running zpool import -T ... with the second txg illustrated above. It still reached the kernel panic, though. I don't know of any further possibility for recovering the pool at this point.
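
Roughly, that attempt was of the following form (reconstructed; the txg is the second one from the zdb output above, and the exact options may have differed):

Code:
# read-only import, rewound to an older txg, without mounting any datasets
zpool import -o readonly=on -N -f -R /mnt -T 283850222 mroot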
 
Oh. Strange. When I hear something as outrageous as kernel panics not appearing in /var/log/messages I instantly think "what area has systemd spread to now?".

lol. I think I need to read up more about rsyslog in alpine. kind of stuck on the data loss at this point but there must be a square two lol.
 
If anyone has any advice about how to approach the debugging, I could try to provide more information if it could help.

The #zfs IRC channel has some helpful specialists.

You will get a better backtrace if you do kgdb kernel debugging. If your host is FreeBSD and you use bhyve it is pretty straightforward to set up.

With either backtrace you can just go into the code and comment out the actual panic() call, so that the code continues. Obviously you shouldn't have any important filesystems mounted when doing that. There is a chance that this allows a readonly import to get the data off. Depends on what the panic is about.
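
A minimal sketch of that setup, assuming a bhyve guest booted from a disk image and debug kernel symbols installed on the host (the VM name, image path, and port here are arbitrary):

Code:
# host: start the guest with bhyve's GDB stub; the "w" prefix makes the guest
# wait for a debugger before executing its first instruction
bhyve -c 2 -m 4G -G w1234 \
    -s 0,hostbridge -s 2,virtio-blk,/vm/recovery/disk0.img -s 31,lpc \
    -l com1,stdio -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    recoveryvm

# host, second terminal: attach kgdb (from the devel/gdb package) to the guest kernel
kgdb /usr/lib/debug/boot/kernel/kernel.debug
(kgdb) target remote :1234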
 
Something like this:
Code:
diff --git a/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c b/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
index 1515ddc1baa2..af19a164254b 100644
--- a/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
+++ b/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
@@ -346,7 +346,7 @@ vdev_indirect_mapping_open(objset_t *os, uint64_t mapping_object)
 {
        vdev_indirect_mapping_t *vim = kmem_zalloc(sizeof (*vim), KM_SLEEP);
        dmu_object_info_t doi;
-       VERIFY0(dmu_object_info(os, mapping_object, &doi));
+       //VERIFY0(dmu_object_info(os, mapping_object, &doi));
 
        vim->vim_objset = os;
        vim->vim_object = mapping_object;
 
Something like this:
Code:
diff --git a/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c b/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
index 1515ddc1baa2..af19a164254b 100644
--- a/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
+++ b/sys/contrib/openzfs/module/zfs/vdev_indirect_mapping.c
@@ -346,7 +346,7 @@ vdev_indirect_mapping_open(objset_t *os, uint64_t mapping_object)
 {
        vdev_indirect_mapping_t *vim = kmem_zalloc(sizeof (*vim), KM_SLEEP);
        dmu_object_info_t doi;
-       VERIFY0(dmu_object_info(os, mapping_object, &doi));
+       //VERIFY0(dmu_object_info(os, mapping_object, &doi));
 
        vim->vim_objset = os;
        vim->vim_object = mapping_object;
Thanks! After some belated backups lol, I'll try creating a recovery build with the new installation.
 
Oh. Strange. When I hear something as outrageous as kernel panics not appearing in /var/log/messages I instantly think "what area has systemd spread to now?".
Though, kernel panics don't necessarily show up in /var/log/messages anyway, unless the panic message in the kernel's circular message buffer is still intact and can still be gleaned by the logging daemon (syslogd in our case).

I've never seen a Linux kernel panic in /var/log/messages.

Systemd captures its logs using systemd-journald. They want to replace rsyslogd entirely. As to where they are written to, I used to know where. It's in some arcane location in the filesystem. Some systemd unit files tell systemd where to find certain log files.
 
I've patched the sources as cracauer@ illustrated, then rebuilt and installed the kernel.

When importing the corrupted pool, a panic now occurs at a different point. I'll try to find the source of this and patch it out as well, for the purpose of this recovery build.

Code:
panic: VERIFY0(dmu_bonus_hold(os, vim->vim_object, vim, &vim->vim_dbuf)) failed (0 == 97)

cpuid = 4
time = 1748074111
KDB: stack backtrace:
#0 0xffffffff80b8b89d at kdb_backtrace+0x5d
#1 0xffffffff80b3dc01 at vpanic+0x131
#2 0xffffffff82e3fc1a at spl_panic+0x3a
#3 0xffffffff82f4e970 at vdev_indirect_mapping_open+0xc0
#4 0xffffffff82fc0924 at spa_remove_init+0x154
#5 0xffffffff82f1cb2c at spa_load+0x17c
#6 0xffffffff82f1c453 at spa_load_best+0x1d3
#7 0xffffffff82f1bd50 at spa_import+0x300
#8 0xffffffff82fe73b3 at zfs_ioc_pool_import+0xb3
#9 0xffffffff82fe07a8 at zfsdev_ioctl_common+0x578
#10 0xffffffff82e4522b at zfsdev_ioctl+0x11b
#11 0xffffffff809cc28b at devfs_ioctl+0xcb
#12 0xffffffff80c3ac1e at vn_ioctl+0xce
#13 0xffffffff809cc8ee at devfs_ioctl_f+0x1e
#14 0xffffffff80bac9b5 at kern_ioctl+0x255
#15 0xffffffff80bac6ff at sys_ioctl+0xff
#16 0xffffffff810262c5 at amd64_syscall+0x115
#17 0xffffffff80ffccab at fast_syscall_common+0xf8
 
fwiw I've tried making a conditional definition of the VERIFY0 macro, in a patch on the kernel sources for importing this zpool with -o readonly=on.

It seems I'm not able to set CFLAGS or a customized OPENZFS_CFLAGS at the make command line, during buildkernel.

Using the releng/14.2 sources from FreeBSD's GitHub repository now, this is the patch I was trying to build with:

Code:
diff --git a/sys/conf/kmod.mk b/sys/conf/kmod.mk
index 9310f1572..6c214c18f 100644
--- a/sys/conf/kmod.mk
+++ b/sys/conf/kmod.mk
@@ -547,7 +547,7 @@ OBJS_DEPEND_GUESS+= opt_global.h
 .endif
 
 ZINCDIR=${SYSDIR}/contrib/openzfs/include
-OPENZFS_CFLAGS=     \
+OPENZFS_CFLAGS+=     \
     -D_SYS_VMEM_H_  \
     -D__KERNEL__ \
     -nostdinc \
diff --git a/sys/contrib/openzfs/lib/libspl/include/assert.h b/sys/contrib/openzfs/lib/libspl/include/assert.h
index 57f5719c1..f2cd3a085 100644
--- a/sys/contrib/openzfs/lib/libspl/include/assert.h
+++ b/sys/contrib/openzfs/lib/libspl/include/assert.h
@@ -114,6 +114,9 @@ do {                                    \
             (void *)__left, #OP, (void *)__right);        \
 } while (0)
 
+#ifdef NO_VERIFY0
+#define    VERIFY0(LEFT) true;
+#else
 #define    VERIFY0(LEFT)                            \
 do {                                    \
     const uint64_t __left = (uint64_t)(LEFT);            \
@@ -122,6 +125,7 @@ do {                                    \
             "%s == 0 (0x%llx == 0)", #LEFT,            \
             (u_longlong_t)__left);                \
 } while (0)
+#endif
 
 #define    VERIFY0P(LEFT)                            \
 do {                                    \

When running make as follows, I see build errors, though. Maybe setting CFLAGS or the adapted OPENZFS_CFLAGS from the buildkernel command line interferes with how the build resolves include directories.

Code:
$ make -j12 buildkernel KERNCONF=GENERIC OPENZFS_CFLAGS=-DNO_VERIFY0
--- buildkernel ---
make[1]: "/usr/src/Makefile.inc1" line 337: SYSTEM_COMPILER: Determined that CC=/usr/local/bin/ccache cc matches the source tree.  Not bootstrapping a cross-compiler.
make[1]: "/usr/src/Makefile.inc1" line 342: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.
--- buildkernel ---
--------------------------------------------------------------
>>> Kernel build for GENERIC started on Sat May 24 02:52:14 PDT 2025
--------------------------------------------------------------
[...]
--- all_subdir_dtrace/dtaudit ---
/usr/src/sys/security/audit/audit_dtrace.c:42:10: fatal error: 'sys/dtrace.h' file not found
   42 | #include <sys/dtrace.h>
      |          ^~~~~~~~~~~~~~
--- all_subdir_bhnd ---
ld -m elf_x86_64_fbsd -warn-common --build-id=sha1 -T /usr/src/sys/conf/ldscript.kmod.amd64 -r  -o bhnd_pci_hostb.ko.full bhnd_pci_hostb.o bhnd_pcie2_hostb.o
--- all_subdir_dtrace ---
--- all_subdir_dtrace/dtrace_test ---
--- dtrace_test.o ---
/usr/local/bin/ccache cc -target x86_64-unknown-freebsd14.2 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin  -O2 -pipe -fno-common  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -DKLD_TIED -nostdinc  -I/usr/src/sys -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/usr/src/amd64.amd64/sys/GENERIC/opt_global.h -I. -I/usr/src/sys -I/usr/src/sys/contrib/ck/include -fno-common -g -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -fdebug-prefix-map=./machine=/usr/src/sys/amd64/include -fdebug-prefix-map=./x86=/usr/src/sys/x86/include -fdebug-prefix-map=./i386=/usr/src/sys/i386/include -I/usr/obj/usr/src/amd64.amd64/sys/GENERIC     -MD  -MF.depend.dtrace_test.o -MTdtrace_test.o -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wcast-qual -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__ -Wmissing-include-dirs -fdiagnostics-show-option -Wno-unknown-pragmas -Wno-error=tautological-compare -Wno-error=empty-body -Wno-error=parentheses-equality -Wno-error=unused-function -Wno-error=pointer-sign -Wno-error=shift-negative-value -Wno-address-of-packed-member -Wno-format-zero-length   -mno-aes -mno-avx  -std=gnu99 -include /usr/src/sys/cddl/compat/opensolaris/sys/debug_compat.h -c /usr/src/sys/cddl/dev/dtrace/dtrace_test.c -o dtrace_test.o
--- all_subdir_dtrace/dtaudit ---
1 error generated.
--- all_subdir_bhnd ---
ctfmerge -L VERSION -g -o bhnd_pci_hostb.ko.full bhnd_pci_hostb.o bhnd_pcie2_hostb.o
--- all_subdir_dtrace ---
*** [audit_dtrace.o] Error code 1

make[5]: stopped in /usr/src/sys/modules/dtrace/dtaudit
       12.64 real         9.76 user         2.88 sys

make[1]: stopped in /usr/src

make: stopped in /usr/src

Command exit status: 2
Script done on Sat May 24 02:52:29 2025

I was seeing a similar error when trying to set CFLAGS=-DNO_VERIFY0 for the entire kernel build, without adapting OPENZFS_CFLAGS as above; the missing-include error then showed up elsewhere.

Maybe it's possible to define a custom WITHOUT_VERIFY0 flag that could be used to activate this patched definition via /etc/src.conf? Using CFLAGS for this does not seem to work out when the flags are set on the make command line for the entire kernel build.

I'm expecting more failed calls under VERIFY0 ahead with the failed zpool. I'll try to work out this hack, though, to entirely disable the assert-like call that the code is making under the VERIFY0 macro.

Update: Building with WITHOUT_ASSERT_DEBUG=Defined in /etc/src.conf did not, of course, affect this part of the OpenZFS sources, lol. The call under the VERIFY0 macro still happens as usual.

Update: Setting -DNO_VERIFY0 via OPENZFS_CFLAGS and then rebuilding the kernel may not be sufficient. Maybe it needs a full buildworld with the modified OPENZFS_CFLAGS.

Maybe it's possible to build and install just the OpenZFS libs, as an incremental build? I'm looking at /usr/src/targets/pseudo/userland/{cddl,lib,misc}/Makefile.depend, following "Building FreeBSD in meta mode" and its "bootstrapping meta mode" section.
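
Another possible shortcut (untested here) might be to restrict the module build to the ZFS-related modules with MODULES_OVERRIDE, rather than rebuilding everything:

Code:
# rebuild the kernel but only the ZFS-related modules, then reinstall
cd /usr/src
make -j12 buildkernel KERNCONF=GENERIC MODULES_OVERRIDE="zfs opensolaris"
make installkernel KERNCONF=GENERIC MODULES_OVERRIDE="zfs opensolaris"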


The patch, at present:

Code:
diff --git a/sys/conf/kmod.mk b/sys/conf/kmod.mk
index 9310f1572..8fb985635 100644
--- a/sys/conf/kmod.mk
+++ b/sys/conf/kmod.mk
@@ -560,6 +560,10 @@ OPENZFS_CFLAGS=     \
     -I${SYSDIR}/cddl/contrib/opensolaris/uts/common \
     -include ${ZINCDIR}/os/freebsd/spl/sys/ccompile.h
 
+.if defined(WITHOUT_ZFS_VERIFY0)
+CFLAGS+=        -DNO_VERIFY0=1
+.endif
+
 .include <bsd.dep.mk>
 .include <bsd.clang-analyze.mk>
 .include <bsd.obj.mk>
diff --git a/sys/contrib/openzfs/lib/libspl/include/assert.h b/sys/contrib/openzfs/lib/libspl/include/assert.h
index 57f5719c1..f2cd3a085 100644
--- a/sys/contrib/openzfs/lib/libspl/include/assert.h
+++ b/sys/contrib/openzfs/lib/libspl/include/assert.h
@@ -114,6 +114,9 @@ do {                                    \
             (void *)__left, #OP, (void *)__right);        \
 } while (0)
 
+#ifdef NO_VERIFY0
+#define    VERIFY0(LEFT) true;
+#else
 #define    VERIFY0(LEFT)                            \
 do {                                    \
     const uint64_t __left = (uint64_t)(LEFT);            \
@@ -122,6 +125,7 @@ do {                                    \
             "%s == 0 (0x%llx == 0)", #LEFT,            \
             (u_longlong_t)__left);                \
 } while (0)
+#endif
 
 #define    VERIFY0P(LEFT)                            \
 do {                                    \
diff --git a/sys/modules/opensolaris/Makefile b/sys/modules/opensolaris/Makefile
index 42702d0c6..9b2da517c 100644
--- a/sys/modules/opensolaris/Makefile
+++ b/sys/modules/opensolaris/Makefile
@@ -25,6 +25,9 @@ SRCS+=        opensolaris_atomic.c
 .endif
 
 CFLAGS+=     ${OPENZFS_CFLAGS}
+.if defined(WITHOUT_VERIFY0)
+CFLAGS+=    -DNO_VERIFY0
+.endif
 
 EXPORT_SYMS=    YES

Building with the following in /etc/src.conf

Code:
WITHOUT_ZFS_VERIFY0=    Defined
WITH_CCACHE_BUILD=      Defined
WITHOUT_LLVM_TARGET_ALL=    Defined
WITH_LLVM_TARGET_X86=        Defined
WITHOUT_LLVM_TARGET_AARCH64=    Defined
WITHOUT_LLVM_TARGET_ARM=    Defined
WITHOUT_LLVM_TARGET_MIPS=    Defined
WITHOUT_LLVM_TARGET_POWERPC=    Defined
WITHOUT_LLVM_TARGET_RISCV=    Defined
WITHOUT_LLVM_TARGET_SPARC=    Defined
 
I've patched the sources as cracauer@ illustrated, then rebuilt and installed the kernel.

When importing the corrupted pool, a panic now occurs at a different point. I'll try to find the source of this and patch it out as well, for the purpose of this recovery build.

Code:
panic: VERIFY0(dmu_bonus_hold(os, vim->vim_object, vim, &vim->vim_dbuf)) failed (0 == 97)

cpuid = 4
time = 1748074111
KDB: stack backtrace:
#0 0xffffffff80b8b89d at kdb_backtrace+0x5d
#1 0xffffffff80b3dc01 at vpanic+0x131
#2 0xffffffff82e3fc1a at spl_panic+0x3a
#3 0xffffffff82f4e970 at vdev_indirect_mapping_open+0xc0
#4 0xffffffff82fc0924 at spa_remove_init+0x154
#5 0xffffffff82f1cb2c at spa_load+0x17c
#6 0xffffffff82f1c453 at spa_load_best+0x1d3
#7 0xffffffff82f1bd50 at spa_import+0x300
#8 0xffffffff82fe73b3 at zfs_ioc_pool_import+0xb3
#9 0xffffffff82fe07a8 at zfsdev_ioctl_common+0x578
#10 0xffffffff82e4522b at zfsdev_ioctl+0x11b
#11 0xffffffff809cc28b at devfs_ioctl+0xcb
#12 0xffffffff80c3ac1e at vn_ioctl+0xce
#13 0xffffffff809cc8ee at devfs_ioctl_f+0x1e
#14 0xffffffff80bac9b5 at kern_ioctl+0x255
#15 0xffffffff80bac6ff at sys_ioctl+0xff
#16 0xffffffff810262c5 at amd64_syscall+0x115
#17 0xffffffff80ffccab at fast_syscall_common+0xf8
vdev_indirect_mapping_open() is only compiled into the kernel if ZFS_DEBUG is true. This panic could be due to one of the ASSERT or EQUIV macro calls finding something amiss. It's suggested you open a PR at bugs.freebsd.org. Or post the panic on freebsd-current@.

It should list the assertion that caused it to panic though.

Comment out line 8 in sys/modules/zfs/static_ccompile.h and rebuild your kernel without any of the other changes you have discussed in this thread. Import the pool and scrub it.

While you're at it send a recursive snapshot to backup and try recreating the pool.

When trying something a little sketchy, like importing a Linux zpool into FreeBSD -- yeah, I know this should work but I doubt this has been exercised at all by upstream -- checkpoint the zpool. You can recover the pool back to the original Linux machine with --rewind-to-checkpoint. The big unknown here is what version of ZFS was running in the Linux kernel and what they altered to make it work there. It's likely the pool has been written by Linux in an inconsistent manner, and FreeBSD panics because it trusts what it sees in various control blocks and data structures. It's always best to give yourself a path out of any potential mess when trying something the OpenZFS folks probably haven't tested themselves.
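
A rough sketch of that checkpoint workflow (pool name as in this thread, just for illustration):

Code:
# before experimenting, take a checkpoint on the original host
zpool checkpoint mroot

# if the experiment goes badly, discard everything written since the checkpoint
zpool export mroot
zpool import --rewind-to-checkpoint mroot

# once satisfied, drop the checkpoint so it stops pinning space
zpool checkpoint -d mroot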
 
Comment out line 8 in sys/modules/zfs/static_ccompile.h and rebuild your kernel without any of the other changes you have discussed in this thread. Import the pool and scrub it.

I'll take a look at this, thanks!

For what it's worth, when proceeding with the initial approach above (commenting out the VERIFY0 calls), after also commenting out the second VERIFY0 call in the same function as the first, then rebuilding world and the kernel and reinstalling, a page fault showed up in the kernel when trying to import the pool under the patched ZFS stack. I suppose the VERIFY0 calls may serve some useful purpose, such as preventing the page fault in the call stack where it occurred.

I'm not certain how relevant the details could be for this hacked ZFS stack, but I do have the crash dump.

When trying something a little sketchy, like importing a Linux zpool into FreeBSD -- yeah, I know this should work but I doubt this has been exercised at all by upstream -- checkpoint the zpool.

Great advice! I'll roll back the present patch and apply the patch you've recommended. Using ccache, it should go reasonably quickly lol.

On the plus side, this SATA disk has a working FreeBSD installation again. It's now installed to a UFS2 filesystem on what was a spare partition on the physical SATA disk (previously used as a ZFS SLOG for the FreeBSD zpool; I was trying that out for a time lol).

Rebuilding the ZFS world shortly, on the machine. Thanks!
 
I've been able to produce a shell script that uses zdb to create a bunch of files suitable for use with zfs receive. With some further scripting to automate the zcat/zfs receive parts, I think that the pool is recoverable now.

The approximate shell script:

Bash:
#!/bin/sh

## The -t txg value was produced with manual analysis
## from the output of the following shell command
##
## # zdb -AAAXe <POOL>
##
## Beginning with the usable txg ID displayed
## in  that shell command ... here, the value
## 283842383 ... it's possible to produce a
## series of files compatible with 'zfs receive'
## under a given dump directory
##
## to receive the files to a usable pool, e.g
##
## # zcat dump/each/4671.gz | zfs receive -sev tank/opt/bak
## output:
## receiving full stream of mroot/usr/obj/xmin_FreeBSD-13.3-RELEASE_amd64_1303001@blankfs into tank/opt/bak/xmin_FreeBSD-13.3-RELEASE_amd64_1303001@blankfs
##
## this requires a TXG id as output from: zdb -AAAXe <POOL>

POOL=${POOL:-mroot}
TXG=${TXG:-283842383}
DUMPDIR=${DUMPDIR:-dump/each}

mkdir -p ${DUMPDIR}

for ID in $(zdb -d ${POOL} -AAAXe -t ${TXG}  |
        awk -v "FS=[ ,]" '/^Dataset/ { print $6 }'); do
    echo "#-- ${ID}" 1>&2;
    zdb -eAAA -t ${TXG} -B ${POOL}/${ID} |
        gzip > ${DUMPDIR}/${ID}.gz  ;
done

It takes a little while, sure, and the storage device for the DUMPDIR certainly needs to have sufficient capacity for the gzipped zfs-receive-compatible files.

I think it's solved at least for purpose of filesystem recovery.

Given the particular difficulties in reproducing the series of issues that may have led to this zpool becoming corrupted, I'm shy of creating a bug report for it.

I'm not certain whether it's related, but I don't think I'll be using the virtio-scsi storage controller under VirtualBox on Linux any further. During the recovery process, there were some peculiarities with a UFS2 filesystem under the same setup: the underlying partition on the SATA disk was not accepting changes from fsck_ufs until after the virtual machine with this virtio-scsi controller was fully stopped (if even then). The virtio-scsi controller was also in use when the pool became corrupted in its storage on the physical SATA disk. Of course, it may have been related to any number of other items in the whole data path for the zpool, so to speak.

Given that the data itself is on a physical SATA disk and there's a new FreeBSD installation on the same now, I've rebooted to that installation for further data recovery stuff.

The dumpdir for the recovery script is located on an external USB drive. It seems to be working out at this point; we'll see how it goes for the zcat and zfs receive parts.

I should certainly checkpoint the receiving pool before any further stage of the data recovery. :cool:

Update: Since the send-like stream data written to the dump dir does not appear to represent incremental snapshots, the dump dir may typically need more space than is in use on the original pool, maybe a lot more. To some extent, this might be mitigated by compressing the individual dump stream files as they're stored on the receiving filesystem.

I'm probably going to run out of space before this recovery process completes. It's going very slowly in terms of GiB/s, though, so there's time; I'll try some more scripting, like something for selective recovery and a persistent stat-like table matching the data ID in each stream file name.

In that shell script, the output from the following shell command can be used to determine which filesystem, volume, snapshot, or other object is represented by each object set ID:

Code:
zdb -d ${POOL} -AAAXe -t ${TXG}

An approximate grammar for the "Dataset" lines of interest, in the output from that zdb command:

Code:
FS := ", "
DS_LINE := "Dataset " NAME " [" KIND "]" FS "ID " OBJSET_ID FS "cr_txg " CR_TXG_ID FS HSIZE FS COUNT " objects"

where:
  • NAME: e.g dataset name, or snapshot name, or such, beginning with the original pool name
  • KIND: e.g ZPL
  • ID, CR_TXG_ID: numeric
  • HSIZE: Human-readable size of the object
  • COUNT: A counting number lol, or zero

An abbreviated way to determine the TXG value for use with the script above:

Code:
TXG=$(zdb -AAAXe ${POOL} | awk '/best uberblock/ { print $12 }' | tail -n1)

zdb(8) provides additional options, of course, depending on how it's called. For instance, the -x option can be used to write some additional data (a lot of data) to a destination directory. In my own setup, this produced a file seemingly truncated to the same size as the original zpool, 1.8 TB in this case, but with used storage space in the range of a few GB, approx 2.7 GB in this case, along with an empty zpool.cache file. With the data file being extremely sparse lol, it probably compresses pretty well though? (If the pool had not been corrupted and causing segfaults with zdb at some points, maybe the used space in the data file would have been larger.)
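
(A quick way to see how sparse such a file is, on FreeBSD, is to compare the apparent size with the space actually allocated; the file name here is just a placeholder:)

Code:
du -Ah dumpfile.bin    # apparent size
du -h dumpfile.bin     # space actually allocated on disk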

Update: I'm not quite sure how to weave together any snapshots from the dump files produced with the recovery script above. These don't seem to represent incremental snapshots, and I don't know how to use zfs receive to create a snapshot that would follow an existing snapshot of the same filesystem if the send stream was not produced as an incremental. I'm not sure whether it's even possible, from user-space scripting at least.

Maybe the dump script should be updated to skip any intermediate snapshots, basically skipping anything with "@" in the object set name, and then just use the latest filesystem or volume dump produced by the script. The dump script would need to match on the name of each object set, at least in the awk part.
 
The updated recovery/dump script:

Code:
#!/bin/sh
# zpool_recovery.sh
#
# usage: zpool_recovery.sh [pool [dumpdir [txg_id]]]
#
# default pool: tank
# default dumpdir: $PWD/dump/each
# default txg_id: Will be determined with zdb

## to echo script commands:
set -xe
## or to not echo:
# set -e

POOL=${1:-tank}
if ! shift; then true; fi

DUMPDIR=${1:-dump/each}
if ! shift; then true; fi

TXG=${1:-$(zdb -AAAXe ${POOL} | awk '/best uberblock/ { print $12 }' | tail -n1)}
if ! shift; then true; fi

if [ "x${TXG}" = "x" ]; then
  echo "no txg found" 1>&2
  exit 1
fi

if ! XZ=$(which pixz 2>/dev/null); then
    XZ=$(which xz)
fi

mkdir -p ${DUMPDIR}

zdb -d ${POOL} -AAAXe -t ${TXG}  |
                awk -v "FS=[ ,]" '/^Dataset/ && $2 !~ "@" { print $2 " " $6 }' |
  while read NAME ID; do
        echo "#-- ${ID}: ${NAME}" 1>&2;
        # creating the %.name file in all instances, as an update to version 1 ...
        echo "${NAME}" > ${DUMPDIR}/${ID}.name
        if [ -e "${DUMPDIR}/${ID}.xz" ]; then
                echo "#%  ${ID}: already recovered. skipping" 1>&2
                # try to make it interruptable ...
                sleep 1
                continue
        fi
        zdb -eAAA -t ${TXG} -B ${POOL}/${ID} |
                ${XZ} > ${DUMPDIR}/${ID}.xz  ;
        ## try to make it interruptable (no signal handling here)
        sleep 1
done

It seems that the datasets can be recovered individually such as with:

Code:
xzcat dumpdir/<id>.xz | zfs receive -dv <pool>

For batch scripting, the zfs receive calls should generally proceed in the reverse of the order in which zdb printed the object set information. This should help ensure that any parent datasets are added to the destination pool first.
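
A sketch of that batch step, as a hypothetical companion to the recovery script above (not part of it); it relies on the ${ID}.name files and approximates "reverse of the zdb order" by listing the dump files newest-first:

Code:
#!/bin/sh
# zpool_restore.sh -- hypothetical companion to zpool_recovery.sh
# usage: zpool_restore.sh [dumpdir [destination_dataset]]
set -e

DUMPDIR=${1:-dump/each}
DEST=${2:-tank/opt/bak}

# newest-first approximates the reverse of the order in which the recovery
# script (and hence zdb) emitted the datasets, so parents land before children
ls -t ${DUMPDIR}/*.xz | while read F; do
    ID=$(basename ${F} .xz)
    NAME=$(cat ${DUMPDIR}/${ID}.name 2>/dev/null || echo "?")
    echo "#-- receiving ${ID}: ${NAME}" 1>&2
    if ! xzcat ${F} | zfs receive -duv ${DEST}; then
        echo "#% receive failed for ${ID} (${NAME})" 1>&2
    fi
done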

Caveats, etc:

This assumes that the origin pool is not being updated across subsequent runs to zdb.

This version of the recovery script will not recover any snapshots from the origin pool.

Not all datasets are guaranteed to be recoverable. e.g

Code:
#-- 303: mroot/usr/home/myuser
dump_backup: dmu_send_obj: Input/output error

When such an error message occurs, the dump script may have produced a partial dump under dumpdir/<id>.xz. An illustration of how zfs receive might handle that:

Code:
root@bld10:/mnt/opt/bak # zcat 303.gz | zfs receive -dv opal.hd/opt/bak
receiving full stream of mroot/usr/home/myuser@--head-- into opal.hd/opt/bak/usr/home/myuser@--head--
cannot receive new filesystem stream: incomplete stream

Though some data was received, it will not result in a usable filesystem for this dataset. (The script could be updated to remove these partial dumps, on non-zero exit from zdb.)

Version 3:

Code:
#!/bin/sh
# zpool_recovery.sh
#
# usage: zpool_recovery.sh [pool [dumpdir [txg_id]]]
#
# default pool: tank
# default dumpdir: $PWD/dump/each
# default txg_id: Will be determined with zdb

## to echo script commands:
set -xe
## or to not echo:
# set -e

POOL=${1:-tank}
if ! shift; then true; fi

DUMPDIR=${1:-dump/each}
if ! shift; then true; fi

TXG=${1:-$(zdb -AAAXe ${POOL} | awk '/best uberblock/ { print $12 }' | tail -n1)}
if ! shift; then true; fi

if [ "x${TXG}" = "x" ]; then
  echo "no txg found" 1>&2
  exit 1
fi

if ! XZ=$(which pixz 2>/dev/null); then
    XZ=$(which xz)
fi

mkdir -p ${DUMPDIR}

zdb -d ${POOL} -AAAXe -t ${TXG}  |
                awk -v "FS=[ ,]" '/^Dataset/ && $2 !~ "@" { print $2 " " $6 }' |
  while read NAME ID; do
        echo "#-- ${ID}: ${NAME}" 1>&2;
        # creating the %.name file in all instances, as an update to version 1 ...
        echo "${NAME}" > ${DUMPDIR}/${ID}.name
        if [ -e ${DUMPDIR}/${ID}.xz ]; then
                echo "#%  ${ID}: already recovered. skipping" 1>&2
                # try to make it interruptable ...
                sleep 1
                continue
        fi
        if ! zdb -eAAA -t ${TXG} -B ${POOL}/${ID} |
             ${XZ} > ${DUMPDIR}/${ID}.xz ; then
                echo "#% error during recovery (${ID}) ${NAME}. Removing partial output" 1>&2
                rm -v ${DUMPDIR}/${ID}.xz
        fi
        ## try to make it interruptable (no signal handling here)
        sleep 1
done

Update: I've worked out an updated version of the recovery script; I'll try to put some docs together and publish it to GitHub, with a link here.

fwiw the 'best uberblock' parse from zdb -AAAXe ${POOL} may not be sufficient for determining a usable txg ID for some pools. There are ways to work around this.
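
(One workaround is to list the uberblocks in the vdev labels directly and pick a txg by hand; device path as in the zdb output earlier in the thread:)

Code:
# list all uberblocks stored in the vdev labels, with their txg and timestamps
zdb -ul /dev/gpt/WDR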

With some added snapshot support in the updated script, I was able to restore every filesystem except the files in my home directory that weren't on sub-filesystems. I hadn't made any snapshots of my homedir in that installation, and the homedir dataset itself became corrupted, along with the tmpdir filesystem and the ROOT/<active_be> filesystem.

Maybe it's better than a complete loss lol. I'll share a link to the GitHub repository once the updated script is published.
 