ZFS System will not boot (ZFS errors)

How can I fix this?

Background: One drive had SMART errors. It was replaced and the raid rebuilt with "zpool replace". The pool seems happy now, but the system still will not boot.

Sorry for the pictures. I can't copy/paste from jViewer which is all I've got to access it at the moment.

1721891206843.png


1721891276706.png


1721891309156.png


I tried to reinstall bootcode with

Code:
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada0
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada1
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada2
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada3

(pmbr and gptzfsboot were taken from zroot/ROOT/default to ensure they match the OS version, since the rescue disk I'm stuck with is FreeBSD 11.2 and the production system is 13.2)

But it will still not boot.

1721892056120.png

1721892087474.png
 
Pool seems happy now but still will not boot.
Did you do something else you didn't mention here? How did the issue occur, and what was done to fix it?
Strange that the zdb output has Linux paths in it. Did you boot it to Linux?

At that boot prompt you can do ls / to see what is where, though judging from the MOS errors there are more things at play there.
 
I had to use a Linux rescue disk to do the zpool replace for reasons. The first screenshot I posted was the same before that so it's not the root cause, but may be part of why it's still not working. I may need to redo that now that I have access to a BSD boot, but I'm not sure how.
1721898201974.png


The initial cause of this is unknown. The server had crashed and rebooted and was like this. I had assumed the bad disk was the cause, but the server had not been rebooted in a long time.

Even if I messed up the rebuild partitions, I should still be able to somehow force it to boot from the working part of the mirror from the "OK" prompt, shouldn't I?

At that boot prompt you can do ls / to see what is where, though judging from the MOS errors there are more things at play there.
1721898650652.png
 
The "unsupported compression algorithm 67" sounds vaguely familiar, and I think I had that after a ZFS update. I'll try to find more detail and update this post.

Update: sorry, can't find it.
 
Even if I messed up the rebuild partitions, I should still be able to somehow force it to boot from the working part of the mirror from the "OK" prompt, shouldn't I?
You did mess up the partitions: ada1 doesn't have the proper partitions that the other disks in the pool have. My 2c: don't update the bootloader on all disks, only on the disk you have replaced.

In spite of the bad partition layout (and the missing legacy freebsd-boot partition) you did boot far enough to get to the boot prompt. That means the BIOS was certainly not loading the bootloader from ada1.
In your setup mirror-0 and mirror-1 together make up the pool (its data is spread across both), so you need both vdevs healthy. In theory this should work even with the messed-up partitions, because you are not booting from that disk. My gut feeling is that you damaged the pool when you tried to update the bootloader with gpart and hit this partition.

But as this disk is part of a mirror, I wonder whether the change was propagated (mirrored) to the other disk too.

I agree with VladiBG: the best thing to try now is to remove ada1 (check whether the disk numbers are still the same after the reboot). You want to remove the disk that has only 2 partitions (1 and 9, the latter being 8MB).
 
I destroyed partition table from ada1 and recreated by copying from another disk (gpart export/import)
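For anyone following along: copying a partition table between disks is usually done with gpart backup piped into gpart restore, which is presumably what "export/import" means here. A minimal sketch, assuming ada0 is a healthy pool member and ada1 is the replacement. The clone_table helper and the DRY_RUN switch are made up for illustration, and by default the script only prints the command instead of running it:

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: clone the GPT layout of one disk
# onto another. Disk names are assumptions; check them with `gpart show`.
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = really run it

clone_table() {
    src=$1
    dst=$2
    if [ "$DRY_RUN" = 1 ]; then
        echo "gpart backup $src | gpart restore -F $dst"
    else
        # -F destroys any existing partition table on the target first
        gpart backup "$src" | gpart restore -F "$dst"
    fi
}

clone_table ada0 ada1
```

gpart backup records only the partition layout, so bootcode on the new disk still has to be written separately with gpart bootcode.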

ada1 is no longer part of the zpool (and I can't do a replace because I can only mount the zpool readonly, as shown in a previous screenshot)

The system still will not boot, showing the same errors as in my first screenshot.

Currently no bootcode on ada1 (unless gpart export copies that) but I don't think it's using that anyway or I wouldn't get even that far.

In your setup mirror-0 and mirror-1 together make up the pool (its data is spread across both), so you need both vdevs healthy. In theory this should work even with the messed-up partitions, because you are not booting from that disk. My gut feeling is that you damaged the pool when you tried to update the bootloader with gpart and hit this partition.
I am able to mount the zpool readonly (due to feature flags) from the rescue disk with no problems, so the pool itself and its data seem fine.
 
I suspect he messed up the pool when he resilvered it under a different ZFS version on Linux. Now his gptzfsboot and zfs.ko are too old and can't import the pool. The best option is to import the pool from a 14.1-RELEASE LiveCD with altroot and make a full backup of its data to an external disk (UFS formatted), then reinstall and upgrade the entire system to 14.1 and restore the data.
 
Can you also browse the pool? Can you browse /boot/ files?

Yes, the pool seems fine

1721972695550.png


I suspect he messed up the pool when he resilvered it under a different ZFS version on Linux. Now his gptzfsboot and zfs.ko are too old and can't import the pool. The best option is to import the pool from a 14.1-RELEASE LiveCD with altroot and make a full backup of its data to an external disk (UFS formatted), then reinstall and upgrade the entire system to 14.1 and restore the data.
The very first screenshot in this thread (OK prompt) was the same before the resilvering so I don't think Linux is the cause.
 
Sometimes it's faster to make a full backup and rebuild everything instead of searching and repair. If you don't have a full backup of your current data now is the time to do it, just to be on the safe side. It's not so expensive to take a 4TB disk and backup everything on it.
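A rough sketch of what such a backup could look like from a live FreeBSD environment. Everything here is an assumption for illustration: the pool name zroot and the dataset zroot/ROOT/default come from earlier posts, /dev/da0p1 stands in for an already-formatted UFS partition on the external disk, and the mount points, the run wrapper and backup_pool are invented names. The pool is imported read-only and without auto-mounting so the root dataset can be mounted first. By default the script only prints the commands:

```shell
#!/bin/sh
# Hypothetical backup sketch; device names and paths are assumptions.
POOL=${POOL:-zroot}
ALTROOT=${ALTROOT:-/tmp/altroot}
BACKUP_DEV=${BACKUP_DEV:-/dev/da0p1}   # external UFS partition (assumed)
BACKUP_MNT=${BACKUP_MNT:-/tmp/backup}
DRY_RUN=${DRY_RUN:-1}                  # 1 = only print, 0 = really run

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

backup_pool() {
    # Import read-only under an alternate root, without mounting anything yet
    run zpool import -N -f -o readonly=on -R "$ALTROOT" "$POOL"
    # Mount the root dataset first, then the remaining datasets on top of it
    run zfs mount "$POOL/ROOT/default"
    run zfs mount -a
    # Mount the external disk and archive the whole tree onto it
    run mkdir -p "$BACKUP_MNT"
    run mount "$BACKUP_DEV" "$BACKUP_MNT"
    run tar -cpf "$BACKUP_MNT/$POOL.tar" -C "$ALTROOT" .
}

backup_pool
```

Review the printed commands first, then rerun with DRY_RUN=0 once the device names match the machine.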
 
I recreated the same setup in my VM and hit all 4 partitions with gpart bootcode. I was able to boot it anyway. I'm no ZFS expert, so I don't know whether that's pure luck because of how my pool happens to be laid out, or whether overwriting the start of the ZFS partition simply doesn't hit any metadata (MOS).

What worries me is the first picture you sent: the pool name is clearly garbled, and yet in recovery you are able to import and browse the pool. As a sanity check you could look at /boot/lua/loader.lua; it's an ASCII file.

The import you are showing us: what OS are you using to do it? Trying to boot this off a 14.1 is also a good step to try.
 
What worries me is the first picture you sent: the pool name is clearly garbled, and yet in recovery you are able to import and browse the pool.
Might that not be the result of an old(er) boot code (from the existing mirror set up) not being able to import the pool, as compared to the import from a live FreeBSD USB stick that is done by ZFS belonging to that running kernel?
 
To me it doesn't seem likely (only a guess) but OP stated he used current pmbr and gptzfsboot to avoid this issue. Personally I would not initiate resilvering on Linux on a pool used only on FreeBSD, especially on boot pool.
From service delivery perspective what VladiBG said makes sense - recreate, restore and don't waste more time, possibly destroying data in the investigation process.

To satisfy my own curiosity I'd continue with the investigation to find out what's wrong.

Importing this pool (not ro) on FreeBSD 14.1 is a good start. I'd do these checks there:

a) after importing the pool, browse /boot and cat some ASCII files to check that everything is OK
b) mount devfs on /tmp/altroot/dev, then chroot to /tmp/altroot
c) reapply the bootcode (on the proper partitions this time)
d) run zpool set bootfs=zroot/ROOT/default zroot

Test again.
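The checks above could be strung together roughly as follows. This is only a sketch under assumptions: the pool is zroot with the root dataset zroot/ROOT/default, /tmp/altroot is the alternate root, ada1 is the replaced disk, and partition index 1 really is its freebsd-boot partition (verify with gpart show first, since writing bootcode into a ZFS partition is exactly what should be avoided). The run wrapper and repair_boot are invented names, and by default nothing is executed, the commands are only printed:

```shell
#!/bin/sh
# Hypothetical sketch of steps a) to d); names and indexes are assumptions.
POOL=${POOL:-zroot}
ALTROOT=${ALTROOT:-/tmp/altroot}
DISKS=${DISKS:-ada1}            # disk(s) that need bootcode (assumed)
BOOT_INDEX=${BOOT_INDEX:-1}     # must be the freebsd-boot partition
DRY_RUN=${DRY_RUN:-1}           # 1 = only print, 0 = really run

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

repair_boot() {
    # Import read-write under an alternate root and mount the datasets
    run zpool import -N -f -R "$ALTROOT" "$POOL"
    run zfs mount "$POOL/ROOT/default"
    run zfs mount -a
    # a) look at a plain-text file under /boot
    run head -n 5 "$ALTROOT/boot/lua/loader.lua"
    # b) make device nodes visible inside the chroot
    run mount -t devfs devfs "$ALTROOT/dev"
    # c) write the installed system's own pmbr and gptzfsboot
    for d in $DISKS; do
        run chroot "$ALTROOT" gpart bootcode -b /boot/pmbr \
            -p /boot/gptzfsboot -i "$BOOT_INDEX" "$d"
    done
    # d) point the pool at the boot environment again
    run zpool set "bootfs=$POOL/ROOT/default" "$POOL"
}

repair_boot
```

Running gpart bootcode through chroot means the boot blocks come from the installed system rather than from whatever rescue image happens to be booted.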
 
Your zpool is a newer version and can't be booted by the old gptzfsboot loader. The error message is clear enough: a compression algorithm is enabled (or inherited) on the pool that the old ZFS loader you have doesn't support. You can try to upgrade the boot loader to a newer version, but once it passes control to zfs.ko it will fail again, because your zfs.ko is also old. So instead of trying to revert the compression and upgrade both the boot loader and zfs.ko, it's much better to back up the current data and upgrade the entire system to 14.1.
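One way to look at this from a live environment is to check the relevant feature flags and the compression property on the imported pool. A small sketch, assuming the pool is called zroot and that the live system's ZFS is recent enough to know the zstd_compress feature (an older rescue image will not recognise it). The run wrapper and check_pool_features are invented names, and by default the commands are only printed:

```shell
#!/bin/sh
# Hypothetical check; pool name is an assumption taken from earlier posts.
POOL=${POOL:-zroot}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print, 0 = really run

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

check_pool_features() {
    # State of the compression-related feature flags on the pool
    run zpool get feature@lz4_compress,feature@zstd_compress "$POOL"
    # Which compression setting each dataset uses and where it comes from
    run zfs get -r compression "$POOL"
}

check_pool_features
```

A feature shown as active is in use on disk, enabled means it is available but unused, and the source column of zfs get shows whether a dataset's setting is local or inherited.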
 
Personally I would not state that as a fact. While it may be the case, the failure to parse the pool name and the gibberish around it could very well be due to an overwritten MOS, with the unknown algorithm just a side effect of the overwritten data.

As a matter of fact, I took a 14.1 system and replaced gptzfsboot with the one from 11.4:
Code:
root@fbsd14zm:~ # zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:19 with 0 errors on Fri Jul 26 00:10:58 2024
config:

    NAME            STATE     READ WRITE CKSUM
    rpool           ONLINE       0     0     0
      mirror-0      ONLINE       0     0     0
        gpt/zdisk0  ONLINE       0     0     0
        gpt/zdisk2  ONLINE       0     0     0
      mirror-1      ONLINE       0     0     0
        gpt/zdisk1  ONLINE       0     0     0
        gpt/zdisk3  ONLINE       0     0     0

errors: No known data errors
root@fbsd14zm:~ #

root@fbsd14zm:~ # gpart show
=>      40  62914480  vtbd0  GPT  (30G)
        40      1024      1  freebsd-boot  (512K)
      1064   1048576      2  efi  (512M)
   1049640       984         - free -  (492K)
   1050624  61861888      3  freebsd-zfs  (29G)
  62912512      2008         - free -  (1.0M)
..
..

root@fbsd14zm:/usr/distfiles/11.0 # for i in `seq 0 3`; do gpart bootcode -b /usr/distfiles/11.0/boot/pmbr -p /usr/distfiles/11.0/boot/gptzfsboot -i 1 /dev/vtbd${i} ; done
partcode written to vtbd0p1
bootcode written to vtbd0
partcode written to vtbd1p1
bootcode written to vtbd1
partcode written to vtbd2p1
bootcode written to vtbd2
partcode written to vtbd3p1
bootcode written to vtbd3
root@fbsd14zm:/usr/distfiles/11.0 #
And was able to boot it just fine.
 
In my example it was 14.x pool with 11.x boot loader. OP had 13.x, rescue done in 11.x but with proper gptzfsboot according to his info.
Resilvering on Linux is fishy, could be a problem.

All I'm saying is that I don't personally believe that error message there is due to wrong bootloader, and I was able to boot new pool with old bootloader too.
To me it sounds like an overwritten MOS; but then how come he's able to import the pool and work with it in rescue? I'm thinking maybe bootfs touches the MOS in some way, hence my suggestion to boot 14.1 and redo the bootfs part.
 
As a matter of fact, I took a 14.1 system and replaced gptzfsboot with the one from 11.4:
[...]
And was able to boot it just fine.
Your 14.1 set up shows an EFI partition; that's different from OP's set up.
The FreeBSD Boot Process shows for your 14.1 setup:
The Road to the FreeBSD loader(8)
[...]
Before we go into the details of each booting mode, first let’s list a simplified breakdown of all of them:
Code:
[...]
UEFI/GPT/MBR/UFS/ZFS (13.0 and later)
  +-> GPT/MBR from 'Boot Device' BIOS disk      | GPT/MBR
    +-> UEFI                                    | STAGE 0
      +-> loader.efi (/efi/FreeBSD/loader.efi)  | STAGE 1-3
        +-> kernel                              | KERNEL
          +-> init                              | INIT
I don't see any gptzfsboot involved there.

I tried to reinstall bootcode with

Code:
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada0
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada1
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada2
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada3
(pmbr and gptzfsboot were taken from zroot/ROOT/default to ensure they match the OS version, since the rescue disk I'm stuck with is FreeBSD 11.2 and the production system is 13.2)
I'm not clear which bootcode & gptzfsboot from what FreeBSD version has been put onto ada0-ada3. If you execute gpart(8) of zroot/ROOT/default (=the mirror I presume) I'm really not sure that bootcode and gptzfsboot will be taken from the mirror as well.
 
Your 14.1 set up shows an EFI partition; that's different from OP's set up.
I installed FreeBSD manually, as I always do. I created both efi and bios boot partitions so I can boot the image in either configuration. As OP is legacy booting, I'm doing the same.

zstd compression is not available in 11.1
I used 11.4; I can try older versions too.

But my opinion didn't change.
 
(pmbr and gptzfsboot were taken from zroot/ROOT/default to ensure they match the OS version, since the rescue disk I'm stuck with is FreeBSD 11.2 and the production system is 13.2)
I will make a test install of 13.2 with zstd under legacy BIOS (freebsd-boot), then replace gptzfsboot with the one from 11.1, and let you know the exact error message I get when I try to boot from it.
 