ZFS kernel panic loop after zpool remove

achanler · May 29, 2020

Is there any way to put ZFS into a 'safe mode' where it doesn't attempt to modify any state, so that I can read my data off of it? I am stuck in a kernel panic reboot loop. In single-user mode, the first zfs command I run loads ZFS and it panics again.

The long story:

I had two 2TB drives in a mirror configuration for the past 7 years (upgrading ZFS and FreeBSD as time went by). It was finally time to upgrade because I was running out of space. I added two 4TB drives as a second mirror vdev:

zpool add storage mirror /dev/ada3 /dev/ada4

zpool status looked like this:

Code:

  pool: storage
state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May  9 01:19:54 2020
config:

        NAME                             STATE     READ WRITE CKSUM
        storage                          ONLINE       0     0     0
          mirror-0                       ONLINE       0     0     0
            ada1                         ONLINE       0     0     0  block size: 512B configured, 4096B native
            ada2                         ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1                       ONLINE       0     0     0
            ada3                         ONLINE       0     0     0
            ada4                         ONLINE       0     0     0

errors: No known data errors

I should have stopped there, but I thought I would 'fix' the block size warning. zdb showed mirror-0 with ashift 9 and mirror-1 with ashift 12. Next I issued zpool remove storage mirror-0. I thought this would migrate the data off the misconfigured mirror-0 onto mirror-1 so that I could recreate mirror-0. However, the system panicked immediately. Now I can't find a way to safely mount the file system to get the data off. The panic is in code that verifies that an offset is aligned to the vdev's ashift.
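For reference, ashift is the base-2 logarithm of the sector size ZFS uses for a top-level vdev, so the "block size: 512B configured, 4096B native" warning and the zdb values say the same thing: ashift 9 is 512 bytes and ashift 12 is 4096 bytes. A tiny sh sketch of the conversion (the helper names are invented for illustration and are not part of any ZFS tool):

```sh
# Hypothetical helpers: ashift is log2 of the vdev's sector size.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Smallest ashift whose sector size is at least the given byte count.
bytes_to_ashift() {
    n=0
    while [ $((1 << n)) -lt "$1" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

ashift_to_bytes 9      # mirror-0: 512-byte sectors
ashift_to_bytes 12     # mirror-1: 4096-byte sectors
bytes_to_ashift 4096   # native sector size reported for ada1/ada2
```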

Now I just want to get it into a read-only 'safe mode' where I can copy the data off without any modifications to the current pool state.
I tried powering off the new drives; then the system does not panic, but I cannot access the data on the old drives. Is there any sysctl setting that will help me out here? I also tried zpool remove -s storage as the first command in single-user mode. I don't know whether that did anything, because it panicked again.

This system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'. Here is part of the /var/crash/core.txt.X:

Code:

panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0 (0x400 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3593
cpuid = 1
time = 1590769420
KDB: stack backtrace:
#0 0xffffffff80c1d307 at kdb_backtrace+0x67
#1 0xffffffff80bd063d at vpanic+0x19d
#2 0xffffffff80bd0493 at panic+0x43
#3 0xffffffff82a6922c at assfail3+0x2c
#4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
#5 0xffffffff828a4dd8 at metaslab_free+0x128
#6 0xffffffff8290217c at zio_dva_free+0x1c
#7 0xffffffff828feb7c at zio_execute+0xac
#8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
#9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
#10 0xffffffff80b90c53 at fork_exit+0x83
#11 0xffffffff81082c2e at fork_trampoline+0xe
Uptime: 7s
Dumping 433 out of 7980 MB:..4%..12%..23%..34%..41%..52%..63%..71%..82%..93%

__curthread () at /usr/src/sys/amd64/include/pcpu.h:234
234             __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
#1  doadump (textdump=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:371
#2  0xffffffff80bd0238 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#3  0xffffffff80bd0699 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:877
#4  0xffffffff80bd0493 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:804
#5  0xffffffff82a6922c in assfail3 (a=<unavailable>, lv=<unavailable>,
    op=<unavailable>, rv=<unavailable>, f=<unavailable>, l=<optimized out>)
    at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000,
    offset=137438954496, asize=<optimized out>, checkpoint=0)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
#7  0xffffffff828a4dd8 in metaslab_free_dva (spa=<optimized out>,
    checkpoint=0, dva=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
#8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0, txg=41924766,
    now=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
#9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
#10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
#11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800)
    at /usr/src/sys/kern/subr_taskqueue.c:467
#12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=<optimized out>)
    at /usr/src/sys/kern/subr_taskqueue.c:773
#13 0xffffffff80b90c53 in fork_exit (
    callout=0xffffffff80c30d80 <taskqueue_thread_loop>,
    arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0)
    at /usr/src/sys/kern/kern_fork.c:1065
#14 <signal handler called>

Poking around in kgdb, I could see that vdev_ashift is 12 for the operation that causes the panic. The offset (137438954496, i.e. 0x2000000400) is 1K-aligned, but ashift 12 requires 4K alignment. I just need it to stop attempting to 'free' anything while I read the data off.
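The assertion in the panic message can be reproduced by hand with the values from the trace: offset 137438954496 from frame #6 and the ashift of 12 seen in kgdb. This sh sketch (misalignment is a hypothetical helper) prints the low bits that the assertion requires to be zero. For ashift 12 it gives 0x400, the value in the panic string, while the same offset passes the check for ashift 9:

```sh
# Mirrors the check from metaslab.c:
#   ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0
# Prints the offending low bits; 0x0 means the offset is aligned.
misalignment() {
    offset=$1 ashift=$2
    printf '0x%x\n' $((offset & ((1 << ashift) - 1)))
}

misalignment 137438954496 9    # 512-byte sectors (old mirror-0)
misalignment 137438954496 12   # 4096-byte sectors (vdev in the panic)
```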

sebhtml · Jun 13, 2020

If it panicked immediately, your data is likely still on ada1 and ada2.
The new ada3 and ada4 probably still contain nothing.

If your data is important, first make an image of both disks, ada1 and ada2. Something like:

Code:

# raw copy of each original mirror member
dd if=/dev/ada1 of=/tank/Backups/VDEVs/2020-06-13/ada1.img bs=1m
dd if=/dev/ada2 of=/tank/Backups/VDEVs/2020-06-13/ada2.img bs=1m

In theory, both ada1 and ada2 contain a label with the information that ZFS needs to assemble the pool when you boot.

The command "zpool import" scans all disks and looks for disks that have a ZFS label.
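As I understand the on-disk format, there is not a single header: each device carries four 256 KiB copies of its label, two at the front and two at the end, and zpool import reads these. The sketch below (label_offsets is a made-up helper) computes where they sit for a given device size, assuming the end of the device is first rounded down to a 256 KiB boundary:

```sh
# Assumed layout: four 256 KiB labels, L0/L1 at the start of the device,
# L2/L3 in the last 512 KiB of the size rounded down to 256 KiB.
label_offsets() {
    size=$1                          # device or image size in bytes
    lsize=$((256 * 1024))            # size of one label
    psize=$((size / lsize * lsize))  # round down to a label boundary
    echo "L0 0"
    echo "L1 $lsize"
    echo "L2 $((psize - 2 * lsize))"
    echo "L3 $((psize - lsize))"
}

label_offsets 2000398934016
```

This also suggests why an image should go onto a partition of the same size: on a larger partition, the two trailing labels would no longer be where ZFS looks for them, although the two leading ones still would be.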

Can you run:

Code:

zpool status storage

achanler · Jun 18, 2020

Success! I have been able to recover the data!

The first step was to back up one of the original drives. I got an external USB 3 drive (but it only ran at USB 2 speed). I used gpart add -t freebsd-zfs -s <size in blocks> <device> to make a partition whose size matches /dev/ada1 in blocks. Then I used dd to make the copy.
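gpart add -s takes a plain number as a count of logical blocks of the provider being partitioned, so the number depends on the USB drive's sector size, not on ada1's. A sketch under that assumption (sectors_for is a made-up helper; the byte size would come from something like diskinfo -v /dev/ada1, which reports the media size in bytes). The example size below is a common 2 TB drive size, not a value from this thread:

```sh
# Hypothetical helper: convert a byte size into a sector count for
# 'gpart add -s', refusing sizes that do not divide evenly.
sectors_for() {
    bytes=$1 sectorsize=$2
    if [ $((bytes % sectorsize)) -ne 0 ]; then
        echo "size $bytes is not a multiple of $sectorsize" >&2
        return 1
    fi
    echo $((bytes / sectorsize))
}

sectors_for 2000398934016 512    # target with 512-byte logical sectors
sectors_for 2000398934016 4096   # target with 4096-byte logical sectors
```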

Next, while studying the zdb(8) man page, I learned about the /boot/zfs/zpool.cache file. I also had an extra one with a .tmp suffix that was slightly newer. I moved both of these files out of there so the pools would not auto-import and panic again. Before I learned this, I had also commented out the zfs_enable="YES" line in /etc/rc.conf, but the first zpool, zdb, or zfs command would load the ZFS kernel module and panic again.
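Moving the cache files aside is what stops the boot-time auto-import. A small sh sketch of that step; stash_zpool_cache and the default backup directory are hypothetical, and /boot/zfs is the FreeBSD 12 location mentioned above. It is only defined here, not called, since running it changes what imports at the next boot:

```sh
# Hypothetical helper: move zpool.cache (and the newer .tmp copy, if any)
# out of the cache directory so no pool is imported automatically.
stash_zpool_cache() {
    src=${1:-/boot/zfs}
    dst=${2:-/root/zpool-cache-backup}
    mkdir -p "$dst" || return 1
    for f in "$src/zpool.cache" "$src/zpool.cache.tmp"; do
        if [ -e "$f" ]; then
            mv "$f" "$dst/" || return 1
            echo "moved $f"
        fi
    done
}
```

Usage would be something like stash_zpool_cache /boot/zfs /root/zpool-cache-backup from single-user mode, before any zpool, zdb, or zfs command loads the module.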

Next I saved the output of zdb -l -u /dev/adaX for all 4 drives in the broken pool. Studying that output, I found that the current transaction group was txg: 41924764 and that the second mirror vdev was created at create_txg: 41924545. Looking at the uberblocks for one of the original drives, I found some older than the creation txg of the new mirror vdev:

Code:

Uberblock[52]
        magic = 0000000000bab10c
        version = 5000
        txg = 41924532
        guid_sum = 12324274639653163715
        timestamp = 1590769436 UTC = Fri May 29 12:23:56 2020
        checkpoint_txg = 0
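Picking the rewind target amounts to keeping the uberblock txgs below the new vdev's create_txg and taking the newest one. A sketch of that filter, assuming the zdb -l -u output format shown above (txgs_before is a made-up helper name):

```sh
# Reads 'zdb -l -u' output on stdin and prints uberblock txgs older than
# the given create_txg, oldest first. Uberblock entries print "txg = N";
# the label's own "txg: N" line and "checkpoint_txg = N" do not match.
txgs_before() {
    awk -v limit="$1" '$1 == "txg" && $2 == "=" && $3 + 0 < limit + 0 { print $3 }' |
        sort -n -u
}

# Usage on the real disk; the newest candidate is the last line:
#   zdb -l -u /dev/ada1 | txgs_before 41924545 | tail -n 1
```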

Next I shut down, powered off all four drives, and rebooted with only the USB drive holding the disk image of ada1 attached. I could see the disk image:

Code:

>zpool import
   pool: storage
     id: 8526493143718440146
  state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-3C
config:

        storage                                     FAULTED  corrupted data
          mirror-0                                  DEGRADED
            diskid/DISK-57583531443839304C415354p1  ONLINE
            13614060752569685886                    UNAVAIL  cannot open

But attempts to recover didn't seem to work:

Code:

>zpool import -o readonly=on -f -F -X storage
        Destroy and re-create the pool from
        a backup source.

I learned about an undocumented -T option from a YouTube video about ZFS on Linux recovery. Then I went to the zpool source code to look at it. It lets you specify which txg to import from. I tried the txg from the newest uberblock I found from before the creation of the second mirror vdev:

Code:

zpool import -o readonly=on -f -F -X -T 41924532  storage

But the command ran for a few hours without finishing. I could see a lot of disk read activity with iostat -x -w 10, but didn't know what it was doing, so I killed it. The same YouTube video mentioned disabling a metadata validation check. I checked sysctl vfs.zfs, found the same tunable, and then the import worked!

Code:

>sysctl vfs.zfs.spa_load_verify_metadata=0
vfs.zfs.spa_load_verify_metadata: 1 -> 0

>zpool import -o readonly=on -f -F -X -T 41924532  storage
>zpool status
  pool: storage
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May  9 01:19:54 2020
config:

        NAME                                        STATE     READ WRITE CKSUM
        storage                                     DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            diskid/DISK-57583531443839304C415354p1  ONLINE       0     0     0  block size: 512B configured, 4096B native
            13614060752569685886                    UNAVAIL      0     0     0  was /dev/diskid/DISK-WD-WCC4M1060686

errors: No known data errors

>df -h /mnt/storage
Filesystem     Size    Used   Avail Capacity  Mounted on
storage        1.8T    1.7T     26G    99%    /mnt/storage
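The two commands that finally worked can be wrapped so they are printed for review before anything touches the disk image. A sketch with invented helper names (run, recovery_import) and a DRY_RUN switch; the flags are exactly the ones used in this thread, and -T is undocumented on this FreeBSD release:

```sh
# run prints the command when DRY_RUN is set, otherwise executes it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the sequence from this thread.
recovery_import() {
    pool=$1 txg=$2
    # Turn off the metadata verification that was disabled before the
    # read-only rewind import succeeded.
    run sysctl vfs.zfs.spa_load_verify_metadata=0
    run zpool import -o readonly=on -f -F -X -T "$txg" "$pool"
}

DRY_RUN=1 recovery_import storage 41924532
```

A sysctl set this way lasts only until the next reboot; setting vfs.zfs.spa_load_verify_metadata back to 1 once the data is copied off restores the normal check.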

Next I used rsync -av /mnt/storage/ <destination> to copy the files out.
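After an rsync from a rescued pool, counting regular files on both sides is a cheap first check. This is an addition, not something from the thread, and only a rough sketch (count_files and same_file_count are hypothetical helpers). It does not compare contents; a second rsync pass with -c (checksum) would be the stronger test:

```sh
# Count regular files under a directory (counts find's output lines, so
# file names containing newlines would be miscounted).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Succeeds when both trees hold the same number of regular files.
same_file_count() {
    src=$(count_files "$1")
    dst=$(count_files "$2")
    echo "source: $src  destination: $dst"
    [ "$src" -eq "$dst" ]
}

# same_file_count /mnt/storage /path/to/destination
```

The check only makes sense if the destination started out empty; any files already there will show up as a mismatch.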

Lastly, I cleaned up the USB drive so it could be its own independent pool with a different GUID, name, and mount point in case I need it again. It's not in the notes below, but I did need to export and import again without the readonly option before fixing up the USB drive:

Code:

>zpool detach storage 13614060752569685886

>zpool status
  pool: storage
state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
action: Replace affected devices with devices that support the
        configured block size, or migrate data to a properly configured
        pool.
  scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May  9 01:19:54 2020
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          da0p1     ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

>zpool reguid storage

>zpool export storage

>zpool import
   pool: storage
     id: 1207276611928267778
  state: ONLINE
status: One or more devices were configured to use a non-native block size.
        Expect reduced performance.
action: The pool can be imported using its name or numeric identifier.
config:

        storage     ONLINE
          da0p1     ONLINE
>zpool import storage storage-recovery

>zfs set mountpoint=/mnt/storage-recovery storage-recovery

>zfs list -t all
NAME                          USED  AVAIL  REFER  MOUNTPOINT
storage-recovery             1.73T  26.0G  1.73T  /mnt/storage-recovery
storage-recovery@2020-05-29    37K      -  1.73T  -

>zpool export storage-recovery
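The clean-up steps can likewise be collected into one function that prints the commands first. A sketch with hypothetical names (run, make_independent_copy); the GUID and names in the example call are the ones from this thread. It starts after the read-write re-import, whose exact flags are not in the notes, so that step is left out:

```sh
# run prints the command when DRY_RUN is set, otherwise executes it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper around the clean-up commands shown above.
make_independent_copy() {
    old=$1 new=$2 missing=$3 mnt=$4
    run zpool detach "$old" "$missing"   # drop the absent mirror half
    run zpool reguid "$old"              # give the pool a new GUID
    run zpool export "$old"
    run zpool import "$old" "$new"       # re-import under a new name
    run zfs set mountpoint="$mnt" "$new"
    run zpool export "$new"
}

DRY_RUN=1 make_independent_copy storage storage-recovery \
    13614060752569685886 /mnt/storage-recovery
```

Importing by the name storage is unambiguous here only because the original disks were powered off; with both pools visible, the numeric id shown by zpool import would be needed instead.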

Then I made a fresh zpool with my 2 new drives, copied the files in, then made a second zpool with my old drives. I hope these notes are useful for anyone else who gets stuck in a kernel panic loop!
