No boot device after zfs-upgrade

What is the exact message when the system doesn’t boot?
The message reads:
Code:
Reboot and Select proper Boot device
Please press the <DEL> key into BIOS setup menu.
or Press the <CTRL-ALT-DEL> key to reboot the system.


Is the zpool’s bootfs property set?
I booted into FreeBSD 13.0 RC3 on the affected host, imported the zroot pool, and inspected the properties using zpool get. Yes, the bootfs property is set.
 
It needs to be set to the actual boot (root) filesystem:

$ zpool get bootfs newsys
NAME    PROPERTY  VALUE                   SOURCE
newsys  bootfs    newsys/ROOT/13.0-RC2.2  local

bootfs=(unset)|pool/dataset
Identifies the default bootable dataset for the root pool. This
property is expected to be set mainly by the installation and
upgrade programs. Not all Linux distribution boot processes use
the bootfs property.
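
If bootfs ever turns out to be unset or pointing at the wrong dataset, it can be corrected from the live environment. A minimal sketch, assuming the pool is named zroot and the boot environment is zroot/ROOT/default (substitute your actual pool and dataset names):
Code:
# Inspect the current value, then point bootfs at the root dataset
zpool get bootfs zroot
zpool set bootfs=zroot/ROOT/default zroot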

 
Given the information we have on the situation so far, where would you recommend I go from here?
 
The message reads:
Code:
Reboot and Select proper Boot device
Please press the <DEL> key into BIOS setup menu.
or Press the <CTRL-ALT-DEL> key to reboot the system.
This does not look like a loader message; it looks more like a BIOS message. That may mean the loader has not been started.

In the BIOS settings you should see all the UEFI devices (depending on the BIOS, of course). Try changing the boot order in the BIOS. Can you try booting on another motherboard?

On the desktop machine I am writing this message from, I have 4 drives, and all have UEFI partitions (with 2 different loaders). When I go into the BIOS, I can see all of them and select the boot device.

Also, when you have booted from another drive, you can try efibootmgr -v. See efibootmgr(8)
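
For example (purely illustrative; the entry numbers on your system will differ), listing the firmware entries and re-activating one that exists but is disabled might look like:
Code:
efibootmgr -v            # list the firmware boot entries and the boot order
efibootmgr -a -b 0003    # activate entry Boot0003 (placeholder number)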
 
This does not look like a loader message; it looks more like a BIOS message. That may mean the loader has not been started.
Yes, that is correct. Hence the topic title :p

Interesting... I have no idea why I didn't check this earlier, but the BIOS doesn't list any bootable devices (other than the USB stick, if I have it inserted, to boot the FreeBSD live system).

Can you try booting on another motherboard?
Technically I can - just gonna take some time for me to get there.
When pursuing this, do I need to hook up both drives to the "spare host" I will try to boot from or is just one drive enough?
I assume that I would want to have both disks there so that, in case the boot works, the contents of the two disks remain the same - but is it technically possible to boot from a mirrored ZFS pool with just one device present, without any user intervention?

Also, when you have booted from another drive, you can try efibootmgr -v.
As far as I can tell the boot drives (the two NVMe drives that make up the mirrored zroot zpool) do not show up there.
It lists:
- Built-in EFI Shell
- Network card
- Hard drive (which I believe are the 4x 10TB storage disks)
- The USB key containing the live system

What possible reasons could lead to the system not detecting the NVMe drives? The hardware remained unchanged between the prior working state of the system and the current state after the zfs upgrade.
 
I concur with Argentum — we’re not getting to the loader at all; I would go through your BIOS settings, or perhaps reset BIOS settings to default and see if that shakes things loose. How are the nvme devices attached? On the motherboard? PCIe card?

Is there a “Select Boot Device” or similar boot option you can activate (typically a function key) during the boot process? This will usually still list other devices even if they aren’t in the “active” boot priority list.

As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.
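
For anyone wanting to try the same, a rough sketch of building such a stick from the live system follows; it assumes the USB stick shows up as da1 (check with gpart show or dmesg first, since this wipes the device):
Code:
gpart destroy -F da1          # only needed if the stick already has a partition table
gpart create -s gpt da1
gpart add -t efi -s 200M da1
newfs_msdos -F 32 -c 1 /dev/da1p1
mount -t msdosfs /dev/da1p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt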
 
Check your UEFI/BIOS and the docs of your MB. Some systems cannot boot off an NVMe device. Then you can perform any voodoo, it just won't boot.
EDIT: E.g. zeRusski ran into that issue. Search for his thread and the solution.
 
How are the nvme devices attached? On the motherboard? PCIe card?
Those are two m.2 form-factor NVMe drives attached to PCIe adapters. These are pure physical adapters and don't contain any sort of controller chips. Each of the two NVMe drives has its own PCIe adapter.

Is there a “Select Boot Device” or similar boot option you can activate (typically a function key) during the boot process? This will usually still list other devices even if they aren’t in the “active” boot priority list.
Yes there is. In this case it's F11. The boot list is empty unless the USB stick with the FreeBSD system is plugged in.

In the meantime I updated the BIOS of the mainboard. It's a Supermicro X10SRW-F. Up until now it was running BIOS 3.0a; it is now upgraded to 3.3.
However, the symptoms remain: the two NVMe drives still do not show up as boot drives.

Check your UEFI/BIOS and the docs of your MB. Some systems cannot boot off an NVMe device. Then you can perform any voodoo, it just won't boot.
This system has been running for more than two years without any issues. It is perfectly capable of booting from NVMe drives. At least it was until I ran the zfs upgrade.

As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.
I think that this is gonna be my next step.
 
As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.
I've just done this. I am able to boot from the created USB drive. From there the system complains that it cannot find a bootable partition: ERROR: cannot open /boot/lua/loader.lua: no such file or directory.
[attached screenshot: foo (1).jpg]
Sorry for the blurriness.
Where would I go from here?
 
The error you're getting (failed to find a bootable partition) still points to the NVMe devices not being visible during the boot process; the fact that they are accessible after a full live-image boot suggests this is a boot-time initialization issue rather than a hardware failure.

Those are two m.2 form-factor NVMe drives attached to PCIe adapters. These are pure physical adapters and don't contain any sort of controller chips. Each of the two NVMe drives has its own PCIe adapter.

Hrmm. Perhaps the PCI devices need option rom scan turned on?

Perhaps this: https://www.supermicro.com/support/faqs/faq.cfm?faq=25543 ?
 
Hrmm. Perhaps the PCI devices need option rom scan turned on?

Perhaps this: https://www.supermicro.com/support/faqs/faq.cfm?faq=25543 ?
This has already been set to EFI.
The machine was able to boot in this configuration for more than two years. Literally nothing changed in terms of hardware or BIOS configuration between the working system and the broken system after zfs upgrade.

I looked through the BIOS settings several times - I even performed the suggested reset to default settings, tried booting, and modified the settings one by one back to the original configuration - the NVMe drives are still not listed as boot devices.
But as you suggested, this is unlikely to be a hardware failure because I can access the disks without any problems from the booted live environment.

In the meantime I've also booted into the EFI shell and it shows all devices: The USB key(s), the 4x 10TB drives and the two NVMe drives. So the system certainly recognizes them. It just doesn't consider booting from them!

I am open for more suggestions.
 
My next step will be moving the two NVMe drives to a different host and trying to boot from there.

I am open for more suggestions.
If this is too risky, or too much trouble, you can clone the system using the method I have practiced many times.

If you have a spare SATA port (or even a SATA drive with a USB adapter will do), just take whatever cheap SSD or rotating drive and connect it to the computer. Partition it manually in a similar way to how your present drives are partitioned. The ZFS partition can be bigger, but not smaller. After that, install the loader in the EFI partition and, using zpool attach, connect this new partition to the existing pool. Now you have a 3-way mirror. Let it resilver and shut down after that. Then just physically remove the new drive and bring it to another computer. It should boot right away. This is how I have cloned my FreeBSD laptop.

Later, just zpool detach the now non-existent drives from the mirror. If the new partition is bigger, you automatically get extra space in the pool. This is how I changed the drive in my laptop and also how I moved a clone of my laptop system to a test system.

Be careful not to use zpool add instead of zpool attach or you are in trouble! Read the manual zpool(8).
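
A condensed sketch of that procedure, assuming the spare drive shows up as ada1 and the existing mirror member partition is nvd0p3 (adjust the device names, partition numbers and sizes to your actual layout):
Code:
# Partition the spare drive similarly to the existing ones; the ZFS
# partition may be bigger than the original, but not smaller
gpart create -s gpt ada1
gpart add -t efi -s 260M -a 4k ada1
gpart add -t freebsd-zfs -a 4k ada1
# Put the loader on the new EFI partition
newfs_msdos -F 32 -c 1 /dev/ada1p1
mount -t msdosfs /dev/ada1p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
# Attach (not add!) the new ZFS partition to the existing mirror member
zpool attach zroot nvd0p3 ada1p2
zpool status zroot            # wait for the resilver to finish, then shut down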
 
In the BIOS, if you select “Add boot option”, do the NVMe drives show as an option to add?

From the EFI shell (where it does see the NVMe devices), are you able to select them and boot?
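
For reference, launching the loader from the UEFI shell typically goes something like this (device names purely illustrative; note that only devices with a firmware-readable filesystem get an fsN: alias, so a raw ZFS partition will only appear as blkN: and cannot be browsed):
Code:
map -r                    # refresh and list the device mappings
fs0:                      # switch to the first recognized filesystem (an ESP)
ls EFI\BOOT               # the loader should be here as BOOTX64.efi
EFI\BOOT\BOOTX64.efi      # run the loader directly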
 
In the meantime I removed the two NVMe drives and added them to a different host. Same symptoms: they show up as devices, I can boot a live OS, import the pool and browse files, but they don't show up as boot devices. I have done the same on yet another system, ending up with the same symptoms.
Both other systems I tried to boot those two NVMe drives from are similar in design: Supermicro mainboards, Intel CPUs, and both of them boot from NVMe drives themselves in regular operation (I just took them down for this test). The mainboards and CPUs are slightly different between the systems, but the overall architecture is the same and everything used to be able to boot from NVMe drives.
I am really not sure what's going on here. I have more than one host that uses this exact configuration: two separate NVMe drives (Samsung 970 Pro), each on an M.2 PCIe adapter, in a ZFS mirror pool. And again: the system in question used to work flawlessly for over two years so... wtf?

@Argentum: Thank you for outlining that procedure. I have plenty of spare drives of all varieties to give this a go. I'll just take a regular old 512GB SATA SSD (my NVMe drives are both 256GB devices).

Eric A. Borisch: I am currently trying to figure out how I can select a drive and boot from it in the EFI shell. The drives show up as blk1 and blk2. I have selected (?) one by typing blk1: in the shell. Then I tried to ls but the EFI shell reported: ls/dir: Cannot open current directory - Not found.
At the moment I am unsure whether this is simply because it's a device from a ZFS pool and the EFI shell doesn't know how to handle that.
 
I have followed the procedure presented by Argentum (adding a third device, a SATA SSD, to the mirror, resilvering the pool, shutting down, attaching the SSD to a different machine) and, as expected, the system boots just fine.

So... let's summarize:
- We have a host that was happy to boot from an NVMe ZFS pool for over two years
- The system received a zfs upgrade
- The system can no longer find the boot device(s).
- Booting a live system on the affected host allows importing the zroot pool, mounting the filesystem(s) and browsing them.
- Adding the NVMe devices to other hosts (which also all boot from 2x NVMe ZFS mirror pools) does not make them show up in the list of bootable devices
- Adding a SATA SSD to the pool that was imported while running a live system on the original host, resilvering the pool, removing the SATA SSD and adding it to a random old desktop computer allows booting the system as if nothing ever happened to it.

What do you guys make out of this?
 
In the meantime I removed the two NVMe drives and added them to a different host. Same symptoms: they show up as devices, I can boot a live OS, import the pool and browse files, but they don't show up as boot devices. I have done the same on yet another system, ending up with the same symptoms.
Both other systems I tried to boot those two NVMe drives from are similar in design: Supermicro mainboards, Intel CPUs, and both of them boot from NVMe drives themselves in regular operation (I just took them down for this test). The mainboards and CPUs are slightly different between the systems, but the overall architecture is the same and everything used to be able to boot from NVMe drives.
I am really not sure what's going on here. I have more than one host that uses this exact configuration: two separate NVMe drives (Samsung 970 Pro), each on an M.2 PCIe adapter, in a ZFS mirror pool. And again: the system in question used to work flawlessly for over two years so... wtf?

@Argentum: Thank you for outlining that procedure. I have plenty of spare drives of all varieties to give this a go. I'll just take a regular old 512GB SATA SSD (my NVMe drives are both 256GB devices).

Eric A. Borisch: I am currently trying to figure out how I can select a drive and boot from it in the EFI shell. The drives show up as blk1 and blk2. I have selected (?) one by typing blk1: in the shell. Then I tried to ls but the EFI shell reported: ls/dir: Cannot open current directory - Not found.
At the moment I am unsure whether this is simply because it's a device from a ZFS pool and the EFI shell doesn't know how to handle that.
So, just to recap:

nvme*p1 are msdosfs filesystems, with the file from your /boot/loader.efi (or from the live USB, etc.) copied into <fsroot>/EFI/BOOT/bootx64.efi, such that if nvme*p1 is mounted at /boot/efi (not a necessity, but this is what the installer is going to start doing by default), you can see this (excepting ada vs. nvme); your sizes are likely different from what I have (I'm running 13-RC3 built from source):

$ mount | grep efi
/dev/ada0p1 on /boot/efi (msdosfs, local)
$ ls -l /boot/efi/EFI/BOOT/bootx64.efi /boot/loader.efi
-rwxr-xr-x 1 root wheel 896512 Mar 11 22:37 /boot/efi/EFI/BOOT/bootx64.efi
-r-xr-xr-x 2 root wheel 896512 Mar 10 08:53 /boot/loader.efi


And the partition type for nvme*p1 is the EFI boot type:
$ gpart show -rp | grep p1
40 4096 ada0p1 c12a7328-f81f-11d2-ba4b-00a0c93ec93b (2.0M)


I see your post about the new SSD you created and could get booting on another system; clearly the zpool has a usable system installation on it. But like I said earlier, on this system we don't even seem to be getting to the loader (and the fact that your BIOS doesn't show the devices is certainly the issue to keep attacking, in my mind). That's why double-checking the partition type, filesystem type, and bootx64.efi placement is what I'm asking for here.
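
From the live environment, those checks could look roughly like this, assuming the NVMe disks appear as nvd0/nvd1 and the ESP is the first partition (adjust to whatever gpart actually reports):
Code:
gpart show -p nvd0                 # p1 should be of type "efi"
mount -t msdosfs /dev/nvd0p1 /mnt
ls -l /mnt/EFI/BOOT/               # BOOTX64.efi (the copied loader.efi) should be here
umount /mnt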
 
Is the first picture you shared really the state just after you did the zfs upgrades and nothing else? Didn't you try to do something else before posting the image?
Because in the layout you shared you're missing the freebsd-boot partition. EFI is just a fancy partition you can jump from to the OS boot loader.
 
Is the first picture you shared really the state just after you did the zfs upgrades and nothing else? Didn't you try to do something else before posting the image?
Because in the layout you shared you're missing the freebsd-boot partition. EFI is just a fancy partition you can jump from to the OS boot loader.
Yeah - I believe so, but I can't promise it 100%.

Given that I am able to boot from a SATA SSD after performing the following steps:
1. gpart backup nvd0 | gpart restore -F ada0
2. Copying /boot/loader.efi from a FreeBSD 13.0 RC3 live system to the EFI partition
3. Adding the new SATA SSD to the mirrored zpool zroot and resilvering it
4. Removing the new SATA SSD and attaching it to a random desktop machine
5. Booting happily

I'd say that things should be in order there.

I am currently performing a dd clone of the SATA SSD so I have a spare in case things go wrong (I do have an off-site backup, but re-installing and pulling in the backups is exactly what I wanted to avoid). After that I will investigate further as per Eric A. Borisch's last post.
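
For reference, the raw clone itself is just something like the following (device names are examples; triple-check them, as dd will happily overwrite the wrong disk, and the target must be at least as large as the source):
Code:
# ada1 = source SSD, da2 = spare target; press Ctrl+T to see progress
dd if=/dev/ada1 of=/dev/da2 bs=1m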
 
I always create the freebsd-boot partition, even on UEFI-only systems. I was doing this manually and it may not actually be needed. I spawned a UEFI-only VM and did a generic bsdinstall with a UEFI-only setup. Indeed, there's no freebsd-boot partition.
I'm doing the upgrade in the VM myself now, as I'm also curious to see why you hit the problem (I have some ideas but I want to confirm them there).
 
Enter your BIOS, navigate to PCIe/PCI/PnP Configuration, enable Option ROM support and set it to EFI. Save and exit, then enter the BIOS again.
In your BIOS setup, under Boot, verify that "boot mode select" is set to UEFI; if it isn't, set it, save and exit, then enter the BIOS again.
Under the boot menu, verify that you see the names of your NVMe disks. If you don't see them as boot devices, you can try to create a new boot record manually or use efibootmgr. Also check the content of your startup.nsh in the ESP; it must point to bootx64.efi.
 
Enter your BIOS, navigate to PCIe/PCI/PnP Configuration, enable Option ROM support and set it to EFI. Save and exit, then enter the BIOS again.
In your BIOS setup, under Boot, verify that "boot mode select" is set to UEFI; if it isn't, set it, save and exit, then enter the BIOS again.
Under the boot menu, verify that you see the names of your NVMe disks. If you don't see them as boot devices, you can try to create a new boot record manually or use efibootmgr. Also check the content of your startup.nsh in the ESP; it must point to bootx64.efi.
This; but it's still unclear why it would have been working and then stop working after a zpool upgrade. If the drives don't show up in the bios boot list, I'm not sure how anything else on the disk matters.

I've never used efibootmgr, but I see that the install scripts do; perhaps there is some magic that needs to be re-set.
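
If it does come to re-creating the entry by hand, a hedged sketch from the live system (assuming the ESP of the first NVMe drive is nvd0p1 and the loader sits at EFI/BOOT/BOOTX64.efi on it) would be:
Code:
mount -t msdosfs /dev/nvd0p1 /mnt
# Create and activate a UEFI boot entry pointing at the FreeBSD loader
efibootmgr -c -a -L "FreeBSD (nvd0)" -l /mnt/EFI/BOOT/BOOTX64.efi
efibootmgr -v                      # verify the new entry and the boot order
umount /mnt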
 