ZFS System will not boot (ZFS errors)

I will try to make a test install of 13.2 with zstd under Legacy BIOS (freebsd-boot), then replace gptzfsboot with the one from 11.1, and will let you know the exact error message I get when I try to boot it.
The more tests the better.

I updated my 14.1 pool with 11.0 gptzfsboot (archive), it works just fine.

For me the important messages are the ones with the gibberish text and the null MOS of the pool.

That would suggest something in the MOS was overwritten, rather than the old bootloader trying to walk structures it doesn't know about. I'm not a ZFS dev, and I haven't looked at the MOS of bootfs and how it changed. Personally, though, I stand by what I stated above.
 
Erichans Not for the reason you stated, but you are correct. Meh! I used
Code:
qemu-system-x86_64 -M q35 -accel hvf -smp 4 -m 16384 \
-drive if=pflash,format=raw,read-only=on,file=QEMU_UEFI-x86_64.fd \
..
Without even thinking about it. And I had the guts to say here that I was legacy booting. Shame!

But again, partitions are there for this very reason - I can boot either way.
On 11.x boot I have this:
Screenshot 2024-07-26 at 12.44.14.png


Which is OK: a clear error and explanation are shown.

I had to rerun the tests:
a) overwrite the zfs partition with the partition bootcode by specifying an incorrect partition number and landing on the zfs partition: still booted in legacy
b) 11.4 gptzfsboot on a 14.1 pool: unable to boot, same error as above

Again though, gptzfsboot is clearly able to tell why it was not able to boot. OP's error message and the gibberish around the MOS would suggest he had overwritten it.
 
Here's the result:

Install 13.2-RELEASE with zstd compress enabled (default) on Legacy BIOS (freebsd-boot)
1721991612523.png


Boot from the 11.2-RELEASE LiveCD and replace (break) the boot loader with the old gptzfsboot, which doesn't know anything about the new ZFS features.
1721991755299.png


Try to boot again and you will receive this error:
1721991802840.png


Try to import zroot with the old zfs from 11.2:
1721991857105.png


Conclusion: You can't use the old gptzfsboot to boot the newer ZFS. To repair it, use the same or a newer bootloader.
 
As part of the test I removed one leg of the mirror in FreeBSD, booted into Ubuntu 23.10 and added the disk back to the pool. I was still able to legacy boot FreeBSD. I don't necessarily consider this a deep test, but rather a proof of concept that it may work.

A quick look at zio_compress shows that not many algorithms exist; certainly not 67 of them in any ZFS at this time, current or previous.

I'm still convinced the MOS was overwritten and that's why you see an error, not because gptzfsboot is older and unable to deal with the pool. But booting into 14.1 and reapplying the bootcode was also a step I suggested above (along with setting the bootfs property).

If the data is of the essence, restore it to a new VM to avoid data loss.
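For anyone finding this later, the two steps mentioned here (setting bootfs and reapplying the bootcode) could be sketched roughly as follows. This is only a sketch under assumptions: the pool is `zroot` with root dataset `zroot/ROOT/default` as elsewhere in this thread, the freebsd-boot partition is index 1 on every disk (verify that first), and the helper name `print_repair_plan` is made up for illustration. It only prints the commands so they can be reviewed before anything is written.

```shell
#!/bin/sh
# print_repair_plan POOL BOOTFS DISK...
# Hypothetical helper: prints (does NOT run) the commands that set the
# bootfs property and rewrite the boot code on each listed disk.
# Assumes partition index 1 is the freebsd-boot partition on every disk.
print_repair_plan() {
    pool=$1
    bootfs=$2
    shift 2
    echo "zpool set bootfs=$bootfs $pool"
    for disk in "$@"; do
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk"
    done
}

# Example (review the output, then run the lines by hand as root):
#   print_repair_plan zroot zroot/ROOT/default ada0 ada2 ada3
```

The pool has to be imported read-write for `zpool set` to work, so this belongs on a system whose ZFS is at least as new as the pool.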
 
The only access I have to the server is through the host's IPMI in the form of JViewer. Trying to use the remote media in that does not work: when I try to load a CD or disk image in it, it gives the error "invalid header" (tried 4 images across 2 computers), so I have no way of booting off 14.1. All I have is the netboot rescue image the hosting company provides.

I may be able to fetch the necessary 14.1 files from ftp and apply them with my 11.2 as long as backwards compatibility is there.

I was really hoping not to have to rebuild this. I've had so many problems with the hosting company's support that I'm planning on just canceling the server anyway, but it's paid for another month, so I was hoping I would be able to patch it together and make use of it for that time. The server seems to be rebooting on its own occasionally, so I may just need to abandon that plan due to possible additional hardware problems. It would still be good to find an answer here for the next person searching for this problem.

For a 'zicher' check you could have checked /boot/lua/loader.lua, it's an ascii file.
1722013646543.png


If you upgrade a zpool created on 11.1 from 14.1, you will be unable to boot. You will end up with error 45:
1722013764641.png


I'm not clear which bootcode & gptzfsboot from what FreeBSD version has been put onto ada0-ada3. If you execute gpart(8) of zroot/ROOT/default (=the mirror I presume) I'm really not sure that bootcode and gptzfsboot will be taken from the mirror as well.
The actual code I ran included full paths to the files. I simplified it for readability, because I had to copy the files from the pool's /boot to tmpfs /tmp and then export the pool before I could execute the gpart commands (because the pool is read-only).
 
I downloaded 14.1 base.txz and extracted pmbr and gptzfsboot from it and installed them. It did not help.

1722015119921.png
 
Trying to use the remote media in that does not work: when I try to load a CD or disk image in it, it gives the error "invalid header" (tried 4 images across 2 computers), so I have no way of booting off 14.1. All I have is the netboot rescue image the hosting company provides.

This may sound crazy, but:
If you have an empty disk in this server, you can install 11.2 on it via netboot, using UFS rather than ZFS so it doesn't interfere with the zroot pool that you already have on the other disks. Then you can upgrade it to 14.1 and see if you can import the pool there.

Just be careful which disk you install 11.2 on. If you select the wrong one, your data will be lost, as the pool is already in a degraded state.

Edit:
Or you can manually create the partitions, then fetch and extract base.txz and kernel.txz.
 
I downloaded 14.1 base.txz and extracted pmbr and gptzfsboot from it and installed them. It did not help.
Did you install it on all disks in the pool? Did you reapply bootfs ?

The BTX error you showed is hard to debug; it's probably a GPF during mov eax, [eax] with a wild pointer in eax. Not that it helps much here.

It's fascinating that you can work with pool just fine. Your compression algo id is 15 (lz4), so not 65.
We could double-check your work if you shared which version of gptzfsboot you applied (14.1 release?) and the output of dd if=/dev/ada2p1 of=ada2p1.blob. Preferably for all disks in the pool.

If you can UEFI boot that box (it's either a physical box where you can UEFI boot or a VM that you can switch to UEFI boot), you could use the one disk that is outside of the pool to create an EFI partition, copy the loader to it and boot from that. It may be worth having as plan B.
 
Did you install it on all disks in the pool? Did you reapply bootfs ?
All disks yes.

bootfs... not sure. I ran these commands with the files from 14.1:

gpart bootcode -b pmbr -p gptzfsboot -i 1 ada0
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada1
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada2
gpart bootcode -b pmbr -p gptzfsboot -i 1 ada3

I don't have time to check the dd command until later.
 
Careful with that -i 1. Last time your ada1 didn't have a proper partition layout and partition 1 was in the pool. We assume you have removed it already; in that case it doesn't make sense to touch it at all.
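A guard along these lines would avoid writing boot code onto a partition that is not actually freebsd-boot. It is a minimal sketch, assuming the usual column layout of `gpart show` (start, size, index, type); the function names `part_type` and `safe_bootcode` are made up for illustration, and pmbr and gptzfsboot are given as relative paths, as in the commands above.

```shell
#!/bin/sh
# part_type INDEX
# Reads `gpart show DISK` output on stdin and prints the type of the
# partition with the given index (prints nothing if there is none).
part_type() {
    awk -v idx="$1" '$1 != "=>" && $3 == idx { print $4 }'
}

# safe_bootcode DISK
# Writes the boot code only if partition index 1 is of type freebsd-boot.
safe_bootcode() {
    disk=$1
    ptype=$(gpart show "$disk" | part_type 1)
    if [ "$ptype" = "freebsd-boot" ]; then
        gpart bootcode -b pmbr -p gptzfsboot -i 1 "$disk"
    else
        echo "skipping $disk: index 1 is '${ptype:-missing}', not freebsd-boot" >&2
        return 1
    fi
}

# Example (as root):
#   for d in ada0 ada2 ada3; do safe_bootcode "$d"; done
```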
 
It's fascinating that you can work with pool just fine.
That had me puzzled.

I may be able to fetch the necessary 14.1 files from ftp and apply them with my 11.2 as long as backwards compatibility is there.
I'd be hesitant if that would lead to success; more likely more problems or at least unexpected results.

Based on the man pages and FreeBSD 11.3-RELEASE Release Notes - Boot Loader Boot Changes:
  1. 11.2-RELEASE - the last part of the boot chain before the kernel is loaded: gptzfsboot -> zfsloader
    zfsloader is part of the root dataset of the root pool.
  2. 11.3-RELEASE - the last part of the boot chain before the kernel is loaded: gptzfsboot -> loader
    loader is part of the root dataset of the root pool.
The functionality of zfsloader has been integrated in the next iteration of loader; before the integration you had both, for example: zfsloader(8)-11.2 and loader(8)-11.2.

This may well be the reason that the pool can be imported (albeit read-only) from an 11.2-R rescue disc/image: with the import you're past any boot process, and a ZFS (kernel) module of a fully operational OS imports the pool.

____
* the earlier referenced article The FreeBSD Boot Process is based on situation #1 or earlier:
BIOS/Legacy using GPT and ZFS
[...]
The pmbr reads from this boot partition and executes it. For the ZFS filesystem, the gptzfsboot file contains enough knowledge of ZFS and optionally GELI encryption to decrypt, load, and launch the zfsloader from the default ZFS dataset.

When zfsloader is loaded and running, we have reached Stage 3 and can select different ZFS Boot Environment and kernels to boot from.

P.S. as a noteworthy aside with respect to 11.2-RELEASE Notes - Boot Loader Changes:
The boot code and loader(8) have been updated to check for unsupported ZFS feature flags. If unsupported features are active, the pool is not considered as a bootable pool, and a diagnostic message is printed to the console. (r321519)
So, importing the pool in 11.0-R or 11.1-R would perhaps succeed (feature flags not fully vetted), but may lead to problems later.
 
zstd compression is not available in 11.1


View attachment 19690
There is no compression algorithm 67 at all.
So, this is just some bad / corrupted / misinterpreted on-disk data.
Which may not even come from the real zroot pool, but from some "phantom" pools that the boot code thinks it sees.
One reported pool name is obvious garbage, the other is empty (also garbage).
There are no complaints about 'zroot'.
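To make the "no algorithm 67" point concrete, here is a small lookup of compression ids. It is a sketch that assumes the ordering of `enum zio_compress` in OpenZFS as I remember it (lz4 = 15 matches what was reported above); check sys/zio.h in your own source tree before relying on it. The function name `zio_compress_name` is made up for illustration.

```shell
#!/bin/sh
# zio_compress_name ID
# Prints the compression name for an on-disk compression id, following
# the assumed order of enum zio_compress. Unknown ids print "invalid"
# and return a non-zero status.
zio_compress_name() {
    case "$1" in
        0) echo inherit ;;
        1) echo on ;;
        2) echo off ;;
        3) echo lzjb ;;
        4) echo empty ;;
        5|6|7|8|9|10|11|12|13) echo "gzip-$(( $1 - 4 ))" ;;
        14) echo zle ;;
        15) echo lz4 ;;
        16) echo zstd ;;
        *) echo invalid; return 1 ;;
    esac
}

# Example:
#   zio_compress_name 15
#   zio_compress_name 67
```

Under that assumption only ids 0 to 16 are defined, so a reported id of 67 cannot be a real algorithm in either the old or the new code.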
 
I ran the dd commands (mostly for others searching in the future).

I've attached the 14.1 one that caused the panic as panic.blob and the code written to disk from zroot/BOOT/default as shown in previous posts as ada2p1.blob

I ran dd command on all disks and diffed the output. All are identical.
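For reference, the comparison can be scripted roughly like this. It is a sketch only: the helper name `all_identical` and the /tmp paths are my own choices for illustration, and the device names are the ones used in this thread.

```shell
#!/bin/sh
# all_identical FILE...
# Succeeds if every listed file is byte-identical to the first one;
# otherwise prints the first file that differs and returns non-zero.
all_identical() {
    first=$1
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || { echo "differs: $f"; return 1; }
    done
    echo "all identical"
}

# Example (as root on the rescue system):
#   for d in ada0 ada1 ada2 ada3; do
#       dd if=/dev/${d}p1 of=/tmp/${d}p1.blob
#   done
#   all_identical /tmp/ada*p1.blob
```

This only compares the dumped partitions with each other. Comparing a dump against the gptzfsboot file itself needs more care, because the partition is larger than the file.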

Also attached are the gptzfsboot and pmbr from zroot/BOOT/default (which should be 13.2)
 
Attachments: files.zip (321.5 KB)

Can you please share your pool information again? gpart show for each device and zpool status, so we can verify how your pool looks now.

The information you stated is correct.
pmbr and gptzfsboot seem to be from 13.2. The ada2p1 has 14.1 gptzfsboot in.
ada2p1 and panic.blob are the same.

Recently pmbr got a small change that displays an error about how many bytes it loaded. There were some issues in the past where freebsd-boot had to be enlarged (before, 128k was enough, then it had to be larger). On the off chance that the pmbr on that disk is old (the zipped pmbr from 13.2 is OK), it could actually load too little, which would cause the GPF you shared before.

I'd expect that once you have removed the badly partitioned disk and installed the proper pmbr and gptzfsboot on the rest of the disks, at the proper partition location, you'd be able to boot.
 