Can't boot from ZFS after upgrade to FreeBSD 14

I made the mistake of not reading the release notes before upgrading to FreeBSD 14.

I'm posting this here to remind other people not to make the same mistake.

Upgrading from Previous Releases of FreeBSD​


Binary upgrades between RELEASE versions (and snapshots of the various security branches) are supported using the freebsd-update(8) utility. The binary upgrade procedure will update unmodified userland utilities, as well as unmodified GENERIC kernels distributed as a part of an official FreeBSD release. The freebsd-update(8) utility requires that the host being upgraded have Internet connectivity. Note that freebsd-update cannot be used to roll back to the previous release after updating to a new major version.
Source-based upgrades (those based on recompiling the FreeBSD base system from source code) from previous versions are supported, according to the instructions in /usr/src/UPDATING.
There have been a number of improvements in the boot loaders, and upgrading the boot loader on the boot partition is recommended in most cases, in particular if the system boots via EFI. If the root is on a ZFS file system, updating the boot loader is mandatory if the pool is to be upgraded, and the boot loader update must be done first. Note that ZFS pool upgrades are not recommended for root file systems in most cases, but updating the boot loader can avoid making the system unbootable if the pool is upgraded in the future. The bootstrap update procedure depends on the boot method (EFI or BIOS), and also on the disk partitioning scheme. The next several sections address each in turn.
Notes for systems that boot via EFI, using either binary or source upgrades: There are one or more copies of the boot loader on the MS-DOS EFI System Partition (ESP), used by the firmware to boot the kernel. The location of the boot loader in use can be determined using the command efibootmgr -v. The value displayed for BootCurrent should be the number of the current boot configuration used to boot the system. The corresponding entry of the output should begin with a + sign, such as
+Boot0000* FreeBSD HD(1,GPT,f859c46d-19ee-4e40-8975-3ad1ab00ac09,0x800,0x82000)/File(\EFI\freebsd\loader.efi)
nda0p1:/EFI/freebsd/loader.efi (null)
The ESP may already be mounted on /boot/efi. Otherwise, the partition may be mounted manually, using the partition listed in the efibootmgr output (nda0p1 in this case): mount_msdosfs /dev/nda0p1 /boot/efi. See loader.efi(8) for another example.
The value in the File field in the efibootmgr -v output, \EFI\freebsd\loader.efi in this case, is the MS-DOS name for the boot loader in use on the ESP. If the mount point is /boot/efi, this file will translate to /boot/efi/efi/freebsd/loader.efi. (Case does not matter on MS-DOSFS file systems; FreeBSD uses lower case.) Another common value for File would be \EFI\boot\bootXXX.efi, where XXX is x64 for amd64, aa64 for aarch64, or riscv64 for riscv64; this is the default bootstrap if none is configured. Both the configured and default boot loaders should be updated by copying from /boot/loader.efi to the correct path in /boot/efi.
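For reference, on a typical amd64 UEFI system the procedure described above boils down to something like the following sketch (device and path names are examples only; check efibootmgr -v and gpart show on your own machine first):

efibootmgr -v                                          # find the ESP and the loader path in use
mount_msdosfs /dev/nda0p1 /boot/efi                    # mount the ESP if it is not already mounted
cp /boot/loader.efi /boot/efi/efi/freebsd/loader.efi   # update the configured loader
cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi     # update the default loader (amd64)
umount /boot/efi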

So I upgraded the zpool as usual, also updated the bootcode, and then rebooted.
Now the system is not able to find a bootable partition.

1.jpg



Disk info is as follows:

2.jpg


As the server has limited bandwidth, it's hard to mount an ISO remotely and boot from it.
Is there any way to update `/efi/freebsd/loader.efi` and boot the system?

Thanks
 
and also updated bootcode
I suspect you only updated the boot code in the freebsd-boot partition (gpart bootcode ...). But that's for a CSM-booting system. UEFI works differently.

Is there any way to update `/efi/freebsd/loader.efi` and boot the system?
If you can boot the install media, yes. Something else (Linux?) might work too, the efi partition is a FAT32 formatted filesystem, so should be readable/writable with just about anything. All you have to do is copy /boot/loader.efi to that /efi/freebsd/loader.efi on the efi partition.

Make sure you do all your disks.
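A minimal sketch of that copy, assuming the ESPs are the first partition on ada0 and ada1 (adjust to your real layout as shown by gpart show):

mount_msdosfs /dev/ada0p1 /mnt
cp /boot/loader.efi /mnt/efi/freebsd/loader.efi
umount /mnt
mount_msdosfs /dev/ada1p1 /mnt      # repeat for every disk that carries an ESP
cp /boot/loader.efi /mnt/efi/freebsd/loader.efi
umount /mnt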
 
Oh, can you change the boot method? If you can temporarily enable CSM (aka BIOS boot), the system should boot normally (you probably only updated that bootcode). From there you can easily update the UEFI boot code. Reboot and switch back to UEFI.
 
Oh, can you change the boot method? If you can temporarily enable CSM (aka BIOS boot), the system should boot normally (you probably only updated that bootcode). From there you can easily update the UEFI boot code. Reboot and switch back to UEFI.
Thanks for your reply.

But unfortunately, changing from UEFI to BIOS didn't work.

QQ截图20231229093728.jpg
 
Alright. Then I think you probably used the example from the man page (gpart bootcode -p /boot/gptzfsboot -i 1 ada0). That installs the bootcode to index 1. On your system index 1 contains the EFI partition. I was hoping you had noticed and used index 2 for writing the bootcode. But alas. The only way now to fix the boot is by booting some sort of install media or rescue system.
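For the record, on a layout where index 1 is the ESP and index 2 is the freebsd-boot partition, the BIOS/CSM bootcode would be written like this (verify the indices on your own disks first):

gpart show ada0                                   # confirm which index is the freebsd-boot partition
gpart bootcode -p /boot/gptzfsboot -i 2 ada0      # write gptzfsboot to that partition, not to the ESP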
 
I suspect you only updated the boot code in the freebsd-boot partition (gpart bootcode ...). But that's for a CSM-booting system. UEFI works differently.


If you can boot the install media, yes. Something else (Linux?) might work too, the efi partition is a FAT32 formatted filesystem, so should be readable/writable with just about anything. All you have to do is copy /boot/loader.efi to that /efi/freebsd/loader.efi on the efi partition.

Make sure you do all your disks.
I used the livecd to boot the system, and then copied loader.efi to two locations using the commands below:

efibootmgr -v    --> to identify that I need to update mfisyspd0p1

mount_msdosfs /dev/mfisyspd0p1 /boot/efi
cp /boot/loader.efi /boot/efi/efi/freebsd/loader.efi
cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi

I also checked the sha256 of the copied efi files to make sure that they are good.
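(The check was just comparing hashes of the source and the copies, something like:

sha256 /boot/loader.efi /boot/efi/efi/freebsd/loader.efi /boot/efi/efi/boot/bootx64.efi

and making sure all three values match.)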

Then I rebooted the system.
And then the system just complained: Boot Failed: FreeBSD
QQ截图20231229124943.jpg


Looks like the system could not boot from the EFI file.
Is there any other file I forgot to update?
 
Update:

I found out why I'm not able to boot.

It looks like, because I'm using an HBA card (Dell H330), the default mfi driver in the livecd is not stable enough.
When I tried to copy the EFI file, there was no error after the copy and no error when I ran sha256, but if I ran `sync`, the system would throw lots of errors. It seems that sometimes the file written to the disk is corrupted. (I tried to load the right driver at boot via `set hw.mfi.mrsas_enable="1"`, but I have no idea why it's not working.)
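(For anyone hitting the same thing: this tunable has to be set before the kernel probes the controller, e.g. by escaping to the loader prompt of the livecd and typing something like:

set hw.mfi.mrsas_enable=1
boot

but, as noted above, that did not help in my case.)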

So I tried multiple times to make sure the EFI file was written to disk with the right checksum. Now I'm able to boot.

And then I ran into another issue: every time, the system boots into single-user mode due to a filesystem mismatch.

14.jpg


And then I ran fsck and rebooted; all good now.

The server can boot into normal mode successfully.
 
Should it be checking the msdosfs boot partition? Any way you can disable that to see if it gets any further?

Or maybe the msdosfs EFI hasn’t been written properly (as you say the H330 is better with the mrsas driver).

Like SirDice suggested you could try booting a Linux image and see if you can get a better result copying the EFI loader using Linux.
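A rough sketch of the Linux route, assuming the ESP shows up as /dev/sda1 and a copy of the FreeBSD 14.0 /boot/loader.efi is at hand (e.g. taken from the install media):

mount -t vfat /dev/sda1 /mnt
cp loader.efi /mnt/efi/freebsd/loader.efi
cp loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt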

Hard work doing this remotely - good luck.
 
Assuming a typical installation, you should do the following:
  1. Download the minimal image (or boot the iso / img you downloaded)
  2. Go to the emergency shell (or just boot single user).
  3. mount -t msdos /dev/da0pX /mnt (where pX is the ESP and da0 is your root disk)
  4. cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
  5. cp /boot/loader.efi /mnt/efi/freebsd/loader.efi
  6. reboot
So you basically need to copy the right loader.efi to your ESP (ignore the gptboot stuff, it's wrong given the debug output you shared). I'm not sure which setup you have, so I suggested you copy to both of the typical locations.

I'm not sure you could boot with CSM. That boot loader, if you have it, is AFU too since you upgraded your zpool.

But having typed all that, I see that you are having trouble syncing the blocks to the ESP....

You can try the other suggestions for writing it with Linux, etc... But if you can't write to the ESP and have it work, then you are likely in for much bigger problems. Sure, you can disable the fsck in /etc/fstab by changing the last number in the ESP line to a 0, which wouldn't detect the corruption, but then something else later in the boot would likely fail.

So, if all is well after you fsck the disk, then unmount the esp, set the fstab entry to noauto, change the fsck pass to 0 and you'll be fine... Except I'd be surprised if this was the last of the problems you see...
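As a sketch, the resulting /etc/fstab line for the ESP might look something like this (device name is just an example):

/dev/nda0p1    /boot/efi    msdosfs    rw,noauto    0    0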
 
If I recall correctly, some recent motherboards have UEFI firmware which drops support for CSM, unfortunately. In these cases, UEFI boot is mandatory.
I hope Secure Boot never becomes mandatory...
 
Assuming a typical installation, you should do the following:
  1. Download the minimal image (or boot the iso / img you downloaded)
  2. Go to the emergency shell (or just boot single user).
  3. mount -t msdos /dev/da0pX /mnt (where pX is the ESP and da0 is your root disk)
  4. cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
  5. cp /boot/loader.efi /mnt/efi/freebsd/loader.efi
  6. reboot
So you basically need to copy the right loader.efi to your ESP (ignore the gptboot stuff, it's wrong given the debug output you shared). I'm not sure which setup you have, so I suggested you copy to both of the typical locations.

I'm not sure you could boot with CSM. That boot loader, if you have it, is AFU too since you upgraded your zpool.

But having typed all that, I see that you are having trouble syncing the blocks to the ESP....

You can try the other suggestions for writing it with Linux, etc... But if you can't write to the ESP and have it work, then you are likely in for much bigger problems. Sure, you can disable the fsck in /etc/fstab by changing the last number in the ESP line to a 0, which wouldn't detect the corruption, but then something else later in the boot would likely fail.

So, if all is well after you fsck the disk, then unmount the esp, set the fstab entry to noauto, change the fsck pass to 0 and you'll be fine... Except I'd be surprised if this was the last of the problems you see...

I have had two out of three upgrades from FreeBSD 13.2 to 14.0 fail.
All three machines boot from EFI into the operating system, but the two failed machines cannot mount the root zfs filesystem for some reason. If I revert to a boot environment to boot 13.2, no problem, but the 14.0 boot will fail.
I have even tried updating the efi boot file to the one on the 14.0 install media, but the 14.0 boot process still fails to mount the root zfs filesystem (see below):
Screenshot_20231231_192738.png

On my desktop machine, it seemed the only way to get FreeBSD 14.0 to boot was to reinstall from scratch. The upgrade from 13.2 to 14.0 on my laptop worked, but the same upgrade on my NAS failed.
I am trying to avoid a complete reinstall on my NAS, but I am lost on where to go from here.
 
First of all, DO NOT ATTEMPT TO UPGRADE THE ZFS POOL BEFORE YOU CONFIRM EVERYTHING ELSE IS PERFECTLY OK. Then update the boot code and restart to confirm that the boot code is not broken (properly copied from a healthy one), carefully read the freebsd-current, freebsd-stable and freebsd-hackers mailing lists (reading the archives is OK) for the last several months to confirm no complaints are seen, and if all is OK, upgrade the pool if you REALLY want to.

Was the photo taken while attempting to boot 14.0?
If so, it seems that it attempts to boot from 13.2, as your vfs.root.mountfrom points to zfs:zroot/ROOT/13.2-RELEASE-p9_2023-12-29_122203.
If it's hard-coded in /boot/loader.conf, change it to the BE you want to boot from; otherwise choose the proper BE by activating it.
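Something like this, with a purely hypothetical BE name (bectl list shows what you actually have):

bectl list                                        # list available boot environments
bectl activate 14.0-RELEASE_2023-12-29_122203     # make the chosen BE the default for the next boot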

And are you sure you updated loader.efi (as is and/or as bootx64.efi) for all ESP (EFI type partitions) on all your disks? Maybe disk[0-7]-p1 as seen in your top post.
 
First of all, DO NOT ATTEMPT TO UPGRADE THE ZFS POOL BEFORE YOU CONFIRM EVERYTHING ELSE IS PERFECTLY OK. Then update the boot code and restart to confirm that the boot code is not broken (properly copied from a healthy one), carefully read the freebsd-current, freebsd-stable and freebsd-hackers mailing lists (reading the archives is OK) for the last several months to confirm no complaints are seen, and if all is OK, upgrade the pool if you REALLY want to.

Was the photo taken while attempting to boot 14.0?
If so, it seems that it attempts to boot from 13.2, as your vfs.root.mountfrom points to zfs:zroot/ROOT/13.2-RELEASE-p9_2023-12-29_122203.
If it's hard-coded in /boot/loader.conf, change it to the BE you want to boot from; otherwise choose the proper BE by activating it.

And are you sure you updated loader.efi (as is and/or as bootx64.efi) for all ESP (EFI type partitions) on all your disks? Maybe disk[0-7]-p1 as seen in your top post.
The ZFS pool has not been updated yet. I cannot even get 14.0 to boot successfully since the upgrade, so a ZFS upgrade cannot be performed. But it does boot from the EFI partition successfully.
The screenshot is FreeBSD 14.0 trying to boot; 13.2-RELEASE-p9_2023-12-29_122203 is the name of the current boot environment that is active. There are two, maybe even three, boot environments with failed FreeBSD 14.0 upgrades.
I am very sure the updated loader has been copied to the correct EFI boot partition, as it resides on /dev/ada0p1. Both /efi/boot/bootx64.efi and /efi/freebsd/loader.efi on the EFI partition have been updated to the ones from the 14.0 install media. The other disk you see in the screenshot is one of the six drives in a RAID-Z2 pool (zdata), which is a separate pool from zroot and on different drives from the EFI boot partition.
FreeBSD 14.0 does not recognise the zroot partition as having a valid file format, yet 13.2 has no problems. I do not believe the EFI boot files are the problem, as FreeBSD 13.2 can mount the zroot partition successfully.
 
So the next question.
Are there any kernel modules (*.ko, other than the kernel) which are loaded BEFORE zfs.ko and are large?
Possibly the kernel and some modules became larger and there is not enough room for zfs.ko (and maybe opensolaris.ko as its prerequisite) in the staging area which the loader allocates with a fixed size (defined at build time).
In this case, you should remove modules not essential for booting the kernel from /boot/loader.conf and add them to the kld_list variable of /etc/rc.conf[.local]. Someone else was bitten by this.
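A sketch of that change, with hypothetical module names; only what is truly needed to mount root stays in loader.conf:

# /boot/loader.conf: keep only what is needed to boot
zfs_load="YES"
#ipmi_load="YES"        # moved to rc.conf
#amdtemp_load="YES"     # moved to rc.conf

# /etc/rc.conf: load the rest after boot instead
kld_list="ipmi amdtemp"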
 
I have worked out what the issue was with the NAS.
The issue with the NAS was that the OpenZFS kernel module (from ports) was installed and in use. After the upgrade, that OpenZFS kernel module was still built for FreeBSD 13, so the kernel would not load it and then could not load the ZFS pools.
To fix the issue: using a known-good boot environment that was still FreeBSD 13.2, boot into the system, mount the broken FreeBSD 14.0 boot environment with bectl, and edit its loader.conf like so:
security.bsd.allow_destructive_dtrace=0
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES" <= Re-enable the builtin ZFS kernel module
#openzfs_load="YES" <= This is the line causing the issue, now commented out
ipmi_load="YES"
amdtemp_load="YES"
kern.ipc.semmni=256
kern.ipc.semmns=512
kern.ipc.semmnu=256

kern.racct.enable=1
After rebooting, I was able to boot into FreeBSD 14.0 successfully and update all my installed packages. I have not re-enabled the OpenZFS kernel module yet, and I may not either.
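For reference, the bectl part of that procedure is roughly the following (the BE name here is only an example; bectl list shows the real ones):

bectl list                                    # find the broken 14.0 boot environment
bectl mount 14.0-RELEASE_2023-12-29 /mnt      # mount it somewhere convenient
vi /mnt/boot/loader.conf                      # comment out openzfs_load, re-enable zfs_load
bectl umount 14.0-RELEASE_2023-12-29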
Interestingly enough, this was the issue with the NAS, but the desktop had the same problem even though it was not using the OpenZFS kernel module. The laptop also experienced a similar issue when upgrading from 13.1 to 13.2. Weird, and I never pursued the issue thoroughly on the desktop or the laptop.

Thanks for the help and suggestions!

Screenshot_20240101_111304.png
 
Kernel modules should be kept in sync with the kernel. This means that if you have modules from ports, not limited to openzfs.ko, you need to rebuild them.
I cannot recommend the ports openzfs, as it was introduced for testing on 12.x and older, before base switched from legacy ZFS to OpenZFS in 13.0.
Now 13.0 and later have OpenZFS in-tree, so there should be no need to use the ports one unless you are an OpenZFS developer.

Note that if you really need to use sysutils/openzfs-kmod and sysutils/openzfs in a Root-on-ZFS environment, you should add sysutils/openzfs-kmod to the PORTS_MODULES variable in /etc/src.conf.
With the usual ports building, the already-installed world is used, which causes openzfs.ko to be out of sync with the kernel on the first reboot after installkernel.
Using the PORTS_MODULES method, the listed kmods are built in conjunction with the kernel, if I understand correctly.
An out-of-sync [open]zfs.ko should cause an unbootable environment, at least when a related (used) KBI has changed.
For kmods which are loaded from rc.conf, I don't recommend adding them to the PORTS_MODULES variable. Sometimes adding them causes unintended kernel build failures. Rebuilding them once the single-user boot for installworld is confirmed successful is sufficient and safe.
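As a sketch, the /etc/src.conf entry mentioned above would look something like this:

# /etc/src.conf
PORTS_MODULES+=sysutils/openzfs-kmod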
 