No boot device after zpool upgrade

On a FreeBSD 12.2 machine, I noticed the following message when running zpool status:
Code:
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
After reading the relevant documentation, I concluded that running these two commands would be the way to go (review, then apply the changes):
Code:
zpool upgrade -v
zpool upgrade -a
The upgrade appeared to work successfully.

The host has two zpools:
- zroot: 2x 256GB NVMe mirror
- storage: 4x 10TB raidz2

After upgrading, zpool upgrade -a informed me that I should update the boot code on all bootable devices. In my case this would be the two NVMe drives in the zroot pool (nvd0 and nvd1).
In an attempt to be helpful, zpool upgrade -a also mentioned an example of using gpart bootcode to update the boot code in a scenario where GPT is used.

Unfortunately, I was foolish and forgot to check how this particular host actually boots, and ran that command on both disks. However, this is an EFI system.
Now I am stuck with a machine that doesn't boot. After the BIOS, I am informed that I should select an appropriate boot device and try again.

Before I start further messing with the system, I would like to gather some information about what exactly needs to happen to recover from this issue. I did some research and found a few forum topics, but given the delicate nature of the operation I'd like to get some more input for my particular case.

Right now I have booted FreeBSD in live mode from a USB drive on the affected host. This allows me to access the disks and make the necessary modifications.
Here's the output of gpart show for both disks in the zroot pool.

[Attachment: screenshot of gpart show output for nvd0 and nvd1]

Where should I go from here? Are there certain diagnostics I should run before I start re-writing the EFI boot code?

As far as I understand, I would need to do this to restore the EFI boot code:
Code:
gpart bootcode -p /boot/boot1.efi -i 1 nvd0
gpart bootcode -p /boot/boot1.efi -i 1 nvd1
Is this correct? Is there something else I should watch out for?

Is there a chance that this is beyond repair?
I'd be thankful for any kind of input on this.
 
Where should I go from here? Are there certain diagnostics I should run before I start re-writing the EFI boot code?
Should not be a big problem. Reinstall the EFI partition. EFI boot can even coexist with legacy boot in separate partitions, but if your BIOS' EFI boot works OK, there is no need for legacy boot. Your data is most probably safe.

This should do the work:
Code:
mount -t msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
 
On a FreeBSD 12.2 machine, I noticed the following message when running zpool status:
Code:
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
... but I have another question here - how did you get that message? The pool features come with kernel updates. You are using the base ZFS, right? I hope you did not upgrade the pool using OpenZFS. If that is the case, reinstalling the boot code does not help. But even in this case, there is still hope...
 
... but I have another question here - how did you get that message? The pool features come with kernel updates. You are using the base ZFS, right? I hope you did not upgrade the pool using OpenZFS. If that is the case, reinstalling the boot code does not help. But even in this case, there is still hope...
This is a machine that was previously running FreeBSD 12.1 and was at one point upgraded to FreeBSD 12.2.
I am using the built-in ZFS - nothing special there.
Upgrading from 12.1 to 12.2 also brought a kernel update, so this should match expectations?

Happy to hear that there is still hope even if this would be a problem.



This should do the work:
Code:
mount -t msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt

I am unable to mount the partition:
Code:
# mount -t msdosfs /dev/nvd0p1 /mnt
mount_msdosfs: /dev/nvdp1: No such file or directory
What's the problem here? Corrupt partition table? I assume then gpart show would not list them properly?
 
What's the problem here? Corrupt partition table?
No, you overwrote that partition with code from boot1.efi. So now it's not a FAT filesystem anymore.

This isn't the correct way (this efifat file is going to disappear in 13.0) but it'll do for now.

Code:
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
gpart bootcode -p /boot/boot1.efifat -i 1 nvd1
 
This isn't the correct way (this efifat file is going to disappear in 13.0) but it'll do for now.

Code:
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
gpart bootcode -p /boot/boot1.efifat -i 1 nvd1
Both commands executed successfully. However, I am still unable to boot: The system doesn't find a bootable device.

Unfortunately, this exceeds my current experience with FreeBSD. Given the good documentation and stability, I have never run into an issue like this before :p

Where to go from here? How to figure out what's broken and eventually how to repair it?

Thank you for your help guys - It's greatly appreciated.
 
I am unable to mount the partition:
Then you should format it first and create a FAT filesystem.
newfs_msdos -F 32 -c 1 /dev/da0p1

Then mount it and create the EFI folder.

Code:
mount -t msdosfs /dev/da0p1 /mnt
mkdir -p /mnt/EFI/BOOT

After that, copy the loader.
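For example, continuing from the steps above (same copy as in the earlier post; the partition is assumed to still be mounted at /mnt):

Code:
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt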

And please show us your /boot/loader.conf. Are you able to import 'zroot' on the rescue system (booted from the stick)?
The whole story looks very much like you had OpenZFS and upgraded the pool under FreeBSD 12.2. If this is the case, you have at least 2 options. Everything can be repaired, but it is important to find the root of the problem.
 
Here are the steps I performed:
Code:
newfs_msdos -F 32 -c 1 /dev/nvd0p1
mount -t msdosfs /dev/nvd0p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
I repeated the same for the second drive in the pool: nvd1
Unfortunately, I am still unable to boot.

I booted back into the live system and imported the pool:
Code:
# zpool import -f -R /mnt zroot
# zfs mount zroot/ROOT/default
This allowed me to successfully import the pool and mount the filesystem.
I had to use -f as this pool was, of course, last used on a different system. I hope this will not lead to issues down the road when attempting to boot from it later?

Here's the contents of /boot/loader.conf as requested:
Code:
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
opensolaris_load="YES"
zfs_load="YES"
pf_load="YES"
kern.racct.enable=1

As far as my notes/documentation on this host go, there is no reason to believe that non-stock ZFS was ever used. This host was installed with FreeBSD 11.2 and then gradually upgraded to 12.1 and 12.2.
However, I am not sure what opensolaris_load="YES" is doing in the boot loader config file...

Where to go from here? :/
 
Redo what you have done, but without gpart bootcode -p /boot/boot1.efifat -i 1 nvd0 and gpart bootcode -p /boot/boot1.efifat -i 1 nvd1.

By the way, I didn't update the content of the EFI partition when I upgraded to 12.2-RELEASE and it worked (on a system which boots via EFI).
 
Where to go from here? :/
You can try: zpool get bootfs after pool import.
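For example - a quick check, assuming the dataset name zroot/ROOT/default from your earlier zfs mount command (adjust if yours differs):

Code:
zpool import -f -R /mnt zroot
zpool get bootfs zroot
# if the property is missing or wrong, point it at the boot environment:
zpool set bootfs=zroot/ROOT/default zroot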
What is the motherboard/BIOS model? Some brands have difficulties (non-standard behavior) with UEFI boot. You can install the legacy boot in this case.

You can also try to replace /boot/loader.efi with the latest loader from 13.0-RC2.

I can see that your NVMe drives are not huge. The next option is to transfer the whole pool to a new disk. For example, take a cheap 1TB (or smaller) rotating drive, manually partition it and install a fresh FreeBSD rescue system on it. See if it boots. When installing the rescue system, use another pool name instead of 'zroot'. Now you can create a third, empty pool on that disk.
Import your original pool into that rescue system, create a recursive snapshot, and transfer your old pool to the freshly created empty pool using zfs send and zfs receive. Set it bootable with zpool set bootfs.

Code:
zfs snapshot -r source_pool@replica

zfs send -R source_pool@replica | zfs receive -F dest_pool

zpool set bootfs=dest_pool/ROOT/default dest_pool

This procedure works, and if you suspect the pool version, you can create dest_pool with a lower version and transfer. I have even downgraded an OpenZFS-upgraded (and non-bootable) pool to a 12.2 base pool and made it bootable again.
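If you suspect the pool version, one way to create dest_pool without any of the newer feature flags is zpool create -d (a sketch only; da0p2 is a placeholder for the freebsd-zfs partition on the new disk):

Code:
# -d creates the pool with all feature flags disabled
zpool create -d dest_pool da0p2
# individual features can then be enabled one by one, e.g.:
zpool set feature@lz4_compress=enabled dest_pool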
 
Here are the steps I performed:
Code:
newfs_msdos -F 32 -c 1 /dev/nvd0p1
mount -t msdosfs /dev/nvd0p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
The last command here overwrites the EFI partition; after you unmount, you are done. boot1.efifat is deprecated today - you should only copy the loader to that folder on the EFI partition. The whole EFI partition is not tightly coupled to the system; it is relatively easy and safe to handle and modify. The loader.efi you copy should be good - maybe it is corrupted for some reason, so you may try taking it from another source. The loader implements a simple, standalone ZFS functionality to access bootable files on ZFS pools; if this file is old, it may not recognize the new pool version. As a good practice, I always compile the loader from source. It can even be built independently of the system. As I wrote before, you may even try a new loader from 13.0.
 
By the way, I didn't update the content of the EFI partition when I upgraded to 12.2-RELEASE and it worked (on a system which boots via EFI).
But nobody knows what version of /boot/loader.efi he has. The logic behind that upgrade message is clear - an old loader may not be able to recognize the latest pool. That is exactly the case with 12.2 and OpenZFS: it can be used, but if you upgrade the pool, the loader does not recognize it any more.

I did some experiments, rebuilt the loader and inserted some debug printouts. This is all understandable - the loader has a simple, stand-alone ZFS implementation in it, and if the pool version is newer than the loader understands, it simply fails.

The process is relatively simple and straightforward - the BIOS starts the loader, and it scans for bootable storage pools and datasets. There are only two possibilities - either the loader does not start for some reason, or it does not find the bootable dataset. As I wrote already, the loader can be rebuilt with debug printouts. In that case, at least, one can ensure that the loader in fact starts.
 
I also upgraded the pool. The upgrade message isn't clear and should at least be adapted to the current boot method. Or better, the upgrade process could execute a script that does the right thing with the approval of the user.
 
I also upgraded the pool. The upgrade message isn't clear and should at least be adapted to the current boot method. Or better, the upgrade process could execute a script that does the right thing with the approval of the user.
I think it comes from ZFS upstream. But I agree, it could be better. The logic is still understandable - the loader must be able to handle the pool, and it cannot be forward compatible.
 
Thank you for the details and information provided.

I think that at this point it might indeed be easiest to just push the pool to a different host using zfs send, re-install the OS and pull back the relevant files.

If this is still helpful for you guys though: I copied the loader from a FreeBSD 12.2 live instance.
 
Can you set your BIOS to EFI only, and not legacy/EFI? I'm wondering if it's getting tripped up by the protective MBR into looking for a freebsd-boot partition.
 
If this is still helpful for you guys though: I copied the loader from a FreeBSD 12.2 live instance.
If you have FreeBSD source, you can build a new loader with debug messages and try it.

To build the loader:

Code:
cd /usr/src/stand/
make install clean

Just tried on my desktop - it is not time consuming. Building the whole stand took 1.8 minutes.

The loader source is in /usr/src/stand/efi/
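If I understand the build correctly, make install places the freshly built loader under /boot; from there it still has to be copied to the EFI partition by hand, e.g. (assuming the ESP is nvd0p1, as above):

Code:
mount -t msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt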
 
If this is still helpful for you guys though: I copied the loader from a FreeBSD 12.2 live instance.
More good news for you - I just did an experiment on a 12.2 ZFS machine, installing the 13.0 loader on a spare drive where I created an EFI partition for it. As expected, it works!
The loader/EFI partition does not need to be on the same drive where your ZFS pool is located; you can have multiple EFI partitions. The loader is backward compatible, so with the 13.0 loader you can start the 12.2 system.
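If you want to try the same, a rough sketch of preparing such a spare drive (da1 is only a placeholder - double-check the device name, as this wipes it):

Code:
gpart create -s gpt da1
gpart add -t efi -s 260M da1
newfs_msdos -F 32 /dev/da1p1
mount -t msdosfs /dev/da1p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt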
 
Can you set your BIOS to EFI only, and not legacy/EFI? I'm wondering if it's getting tripped up by the protective MBR into looking for a freebsd-boot partition.

I still think this is worth checking; assuming you (as you say, incorrectly for EFI) ran gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 nvd0 after the upgrade, you've still got the /boot/pmbr master boot record installed even after you restore the EFI boot code partition (gpart bootcode -p /boot/boot1.efifat -i 1 nvd0).

When booting on a "legacy" (non-EFI) system, this boot record will advertise that it knows how to boot, but when it actually attempts to, it will look for a traditional freebsd-boot (see gpart(8) bootstrapping section) partition which, using EFI, you don't have. A BIOS that is in legacy/UEFI mode may very well attempt legacy boot first, think it has succeeded (with "I found a drive with a master boot record; my job here is done") and leave you in the unable-to-boot state.

You can either disable legacy boot (only EFI boot), or try wiping out the protective MBR: dd if=/dev/zero of=/dev/nvd0 bs=512 count=1 (and also for nvd1).

NB: Be very careful whenever using dd(1); make sure you understand what the command is doing before proceeding.
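For reference, a minimal sketch of the wipe plus a sanity check afterwards (device names as used in this thread; double-check them on your system first):

Code:
dd if=/dev/zero of=/dev/nvd0 bs=512 count=1
dd if=/dev/zero of=/dev/nvd1 bs=512 count=1
gpart show nvd0 nvd1   # check that the GPT partitions are still listed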
 
The boot mode was set to EFI only so far. This morning I attempted to boot after changing it to EFI/Legacy and afterwards to just Legacy. Overall, no difference: the system still doesn't boot.
I changed the setting back to EFI only boot as I don't intend to do anything else after fixing this :p

I'll attempt wiping the MBR later today - nothing I want to do while being in a rush.

More good news for you - I just did an experiment on a 12.2 ZFS machine, installing the 13.0 loader on a spare drive where I created an EFI partition for it. As expected, it works!
The loader/EFI partition does not need to be on the same drive where your ZFS pool is located; you can have multiple EFI partitions. The loader is backward compatible, so with the 13.0 loader you can start the 12.2 system.
Hmm... this is indeed interesting to know.
How would this be different (in essence) from what I've been doing so far? I take it that my system should be able to boot from the ZFS pool both on the 12.2 loader and the 13.0 one, right? Do you have any reason to believe that performing the previously done steps (copying the loader) from a FreeBSD 13.0 system is going to result in anything other than what we have experienced when doing it with a 12.2 loader?

Is there anything special that I have to do/consider when trying to do this on a separate drive? Or would I just create the EFI partition, copy the loader and tell my system to boot from that drive instead? No magic involved?
Where/how would I tell the system to boot from the zroot pool located on the currently present drives when setting up an EFI partition on a different drive?

Thank you very much for your efforts!
 
Hmm... this is indeed interesting to know.
How would this be different (in essence) from what I've been doing so far? I take it that my system should be able to boot from the ZFS pool both on the 12.2 loader and the 13.0 one, right? Do you have any reason to believe that performing the previously done steps (copying the loader) from a FreeBSD 13.0 system is going to result in anything other than what we have experienced when doing it with a 12.2 loader?
The 13.0 loader is a major upgrade. You can give it a try. My personal takeaway here is that when upgrading, it is a good idea to upgrade the loader first and the system after that. So, I am writing this message on a 12.2 machine, but I booted it with the new 13.0 loader this morning. When I upgrade one day, I can be sure that the loader is already good.
 
You can either disable legacy boot (only EFI boot), or try wiping out the protective MBR: dd if=/dev/zero of=/dev/nvd0 bs=512 count=1 (and also for nvd1).

NB: Be very careful whenever using dd(1); make sure you understand what the command is doing before proceeding.

This would not only delete the MBR but also the partition table, right?
Is that a non-issue here because the EFI partition would provide its own means to find/locate the partition(s) I want to boot from or am I missing something obvious here?
 
This would not only delete the MBR but also the partition table, right?
Is that a non-issue here because the EFI partition would provide its own means to find/locate the partition(s) I want to boot from or am I missing something obvious here?
This seems not to destroy the partition table, just the MBR. But it is always a good idea to write down your partition table with gpart backup nvd0. The output is just text; write it down. You can always restore the partition table if you know the block boundaries.
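A minimal sketch of backing up and, should it ever be needed, restoring the table (the file name is just an example):

Code:
gpart backup nvd0 > /root/nvd0.gpart    # plain-text dump of the partition table
# ... later, only if the table ever has to be recreated:
gpart restore -F nvd0 < /root/nvd0.gpart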
 
Alright, after backing up the partition table I nuked the MBR of both nvd0 and nvd1, but the system still refuses to boot.

Given that I am able to import the zroot pool on a live system on the same host I agree with you guys that this should be fixable and I'd certainly prefer fixing it over just re-installing and migrating.

I am wondering why the system would not boot after copying the boot loader from a running FreeBSD 12.2 live system. Just to be sure, the location of the loader is /EFI/BOOT/BOOTX64.efi - that is correct, right?
Is there anything else that would need to happen for the system to recognize this? The system itself is set up to boot only from EFI (and this worked well for over a year on that exact system prior to the pool upgrade). Does the EFI partition need any special flags or other magic to be considered by the system when booting?

Taking into account everything you guys have mentioned so far, the following measures are available to continue from here:
- Compile the boot loader with debug output enabled and run that to potentially figure out where things go wrong.
- Boot a FreeBSD 13.0 live system instead and copy that boot loader
- Lastly, zfs send|recv the pool to a different device, re-install the host and migrate back

If it's of any help, here is the output of gpart backup prior to overwriting the MBR with zero:

[Attachment: photo of the gpart backup output]
 
I just tried the FreeBSD 13.0 RC3 loader and the issue remains - still unable to boot from the pool.

Steps I performed:
1. Boot into FreeBSD 13.0 RC3 on the affected host
2. Mount the EFI partition of the first device in the zroot mirror pool to /mnt
3. cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
4. Repeat for the second device in the mirrored pool
5. Reboot and hope for the best

No tears were shed.
 