ZFS gptzfsboot: failed to mount default pool zroot

I updated a server running 11.1 p9 to p10 today, rebooted, and it won't come up again. It's a Hetzner dedicated server. Looking at it through their vKVM console, I got the following error:

[Attachment: Screen Shot 2018-06-17 at 13.13.30.png]
It's a two disk mirrored pool with ZFS-on-root.

I have searched the forums and tried many of the suggested fixes, including

https://forums.freebsd.org/threads/...-unavailable-invalid-format.55227/post-342203
https://forums.freebsd.org/threads/...-unavailable-invalid-format.55227/post-342288
https://forums.freebsd.org/threads/...om-zroot-after-applying-p25.54422/post-308661
https://forums.freebsd.org/threads/where-is-all-about-boot.42980/post-345571

to no avail. The server will still not come up. It's a production machine and I'm all out of ideas so any suggestions are more than welcome.

Here's a Gist of gpart output in case that helps: https://gist.github.com/herrbischoff/1d42fdd484568f2b422b91aafa73e670
 
Sorry but I can't be bothered to go over all those individual threads. Also because several of them mention several solutions, leaving me to guess what you did or didn't try.

So, for starters: when you get into the boot menu, press escape and enter the lsdev command, does that actually detect your ZFS pool?

Second... this is a minor OS upgrade but have you ever performed a major upgrade (from 10 to 11 for example)? If so, did you also upgrade the ZFS pool (so: # zpool upgrade, see also zpool(8)) and also very important sometimes: did you install the new boot code? After a major OS upgrade it's usually a good idea to re-install the bootcode again (see gpart(8), the bootcode option) because several things can have changed, this often applies to ZFS.

Most important (so I assume) is the state of the server and the data. Have you tried a rescue CD yet to see if you can access the ZFS pool from there? (edit): This should also give you a good indication as to the current state of the ZFS pool.
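For what it's worth, a rough sketch of what such a rescue-system check could look like. The pool name zroot comes from your error message; the /mnt altroot and the run / check_pool helpers are just assumptions for illustration. It only prints the commands unless you set RUN=1, and the import is read-only so nothing on the pool gets touched.

```sh
#!/bin/sh
# Dry-run helper (hypothetical): print the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Import the pool read-only under an alternate root, report its health,
# then export it again so the next boot finds it in a clean state.
check_pool() {
    pool="$1"
    altroot="$2"
    run zpool import -f -o readonly=on -R "$altroot" "$pool"
    run zpool status -x "$pool"
    run zpool export "$pool"
}

# Assumed values: pool "zroot", altroot "/mnt".
check_pool zroot /mnt
```

If zpool status -x reports the pool as healthy, the data itself is most likely fine and the problem sits somewhere in the boot chain.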
 
Thanks for your reply. I linked the individual posts so there wouldn't be too much to read. I figured that linking whole threads would make it completely unclear what I had done, whereas each linked post describes one approach, which I followed.

What I failed to mention is that the pool mounts fine in a rescue system: it can be read and written to, and no errors are present. Sorry for that, I'm under quite some stress right now.

Regarding your questions: no, it has always been an 11 machine, no major upgrade. I've never had to upgrade the pool. From the rescue system I have already tried updating the bootcode, but sadly this did not change anything.
 
Regarding the boot menu: I cannot seem to get to the boot menu. This is indeed unusual. It may either have something to do with the Hetzner boot process for vkvm or some boot information is missing. Unfortunately I'm not too familiar with ZFS during the boot process.

It starts into a PXE boot system, waits a couple of seconds and then tries to boot the local disk (FreeBSD) and fails with the above message. I checked that there are no instructions preventing the display. I will try and add beastie_disable="YES", reboot and see what happens.
 
"From the rescue system I have already tried updating the bootcode, but sadly this did not change anything."
What bootcode did you use? The one on the rescue system or the one on the server? If the answer is the first then try again and this time make sure to point to the boot code which is actually installed on the server itself. That's the only way to be 100% sure that you got the full matching versions.

As to the bootmenu not being there: that is most peculiar. You might want to check /boot/loader.conf to see if it contains something which blocks this and/or make double sure that you're using the right bootstrap method.

Lessee... based on the information you provided in the op...

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0. And you should do the same for ada1.

WARNING: I've been very careful to check the parameters but do not just blindly copy & paste without verifying yourself. Also: if you mounted your server OS under /mnt then you'd obviously use /mnt/boot/* instead.
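To make the "verify yourself" part a bit more concrete, here is a small sketch for finding the index that belongs after -i. It assumes the usual gpart show layout (start, size, index, type) and the boot_index helper name is made up. The sample table below is illustrative only, it is not taken from your Gist.

```sh
#!/bin/sh
# Print the index of every freebsd-boot partition found in
# `gpart show` output read from stdin.
boot_index() {
    awk '$4 == "freebsd-boot" { print $3 }'
}

# On the real machine you would run:  gpart show ada0 | boot_index
# Illustrative sample (assumed layout, NOT the poster's actual disks):
sample='=>        40  5860533088  ada0  GPT  (2.7T)
          40        1024     1  freebsd-boot  (512K)
        1064         984        - free -  (492K)
        2048     4194304     2  freebsd-swap  (2.0G)
     4196352  5856335872     3  freebsd-zfs  (2.7T)'

printf '%s\n' "$sample" | boot_index
```

Whatever number comes back for each disk is the one to pass to gpart bootcode -i; if nothing comes back, the disk has no freebsd-boot partition and the command above does not apply as written.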
 
No dice. Here's a small screen recording of the whole process:

[Attachment: Untitled.gif]
There are no entries preventing the display of the boot menu. I triple-checked. For good measure I even added

Code:
boot_mute="NO"
autoboot_delay=10

at the very end of /boot/loader.conf (on the zroot pool of course).
 
The thought about the bootcode crossed my mind as well when I did it. So here's what I did step by step:

Code:
mkdir /tmp/mnt
zpool import -f -R /tmp/mnt zroot
gpart bootcode -b /tmp/mnt/boot/pmbr -p /tmp/mnt/boot/gptzfsboot -i 1 ada0
gpart bootcode -b /tmp/mnt/boot/pmbr -p /tmp/mnt/boot/gptzfsboot -i 1 ada1
zpool export zroot
reboot

That should have taken care of everything, correct? Unfortunately that did not work.
 
Watching that screen recording makes me believe that your problem isn't with FreeBSD at all but with the way that environment of yours is starting the system. I get the impression that it's trying to directly start / boot the freebsd-boot slice yet without any useful parameters which obviously leaves FreeBSD completely in the dark.

I mean; not even the kernel gets loaded (this is usually done by the bootloader, which gets started through the HD's bootsector).

I'm pretty convinced that you should re-check the boot process on that environment of yours. Verify what exactly it's trying to boot. I mean, the start itself looks highly suspicious to me. You start from a local disk and the first thing it mentions is "No more network devices"? What's up with that?
 
Thanks for your assistance and thoughts about this. I have now requested hardware KVM access to the server which should hopefully happen within an hour. The vKVM is apparently booting the system in a virtual machine which likely interferes with the FreeBSD boot process. Freaking Linux-centric server world...

I'll be updating this thread as I find out more.
 
Alright. KVM access did resolve the missing boot menu. The ZFS pool is recognized via lsdev, but now I'm stuck on the following beauty:

[Attachment: Screen Shot 2018-06-17 at 17.01.35.png]
Any ideas?
 
Figured it out. Will update this post with more information once I've got the server completely up and running again.

To sum the whole ordeal up: it was indeed an issue with the Hetzner infrastructure. The vKVM rescue system they offer is virtual in the truest sense of the word: it boots the physical server in a virtual machine. I have never heard of such technology and was utterly confused once a technician explained it to me. While it appears clever on the surface, this deeply conflicts with FreeBSD's boot process, especially when using ZFS-on-root. So that's why the thing wouldn't boot while troubleshooting. After getting a real hardware KVM connected to the server, troubleshooting became workable. vKVM is evil.

I wasn't able to pin down the exact reason it wouldn't boot in the first place, but my strong suspicion is an external mountpoint that wouldn't come up and blocked the boot process. So there was probably nothing wrong with FreeBSD or the upgrade, just some annoying little issue blowing everything out of the water.

The last bit was my own doing. As best as I can retrace my steps, I apparently changed the mountpoint for the zroot volume to be the same as that of zroot/ROOT/default. This obviously couldn't work. As soon as I unmounted it, changed the mountpoint back to where it should have been in the first place, and rebooted, everything fell into place and the machine came up again.
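For anyone landing here with the same symptom, a minimal sketch of how such a collision could be spotted from a rescue system. The dup_mountpoints helper is hypothetical, and the sample listing is made up to mirror my mistake. The /zroot value in the comment is an assumption based on the stock installer layout, so check what your own setup expects.

```sh
#!/bin/sh
# Read `zfs list -H -o name,mountpoint` output (tab-separated) from stdin
# and report every mountpoint that is claimed by more than one dataset.
dup_mountpoints() {
    awk -F '\t' '
        $2 == "none" || $2 == "legacy" || $2 == "-" { next }
        {
            if ($2 in names) names[$2] = names[$2] " " $1
            else names[$2] = $1
            count[$2]++
        }
        END {
            for (m in count)
                if (count[m] > 1) print m ": " names[m]
        }'
}

# On the real machine:  zfs list -H -o name,mountpoint | dup_mountpoints
# Made-up sample reproducing the collision described above:
printf 'zroot\t/\nzroot/ROOT\tnone\nzroot/ROOT/default\t/\nzroot/tmp\t/tmp\n' |
    dup_mountpoints

# Assumed fix for a stock layout (verify before running):
#   zfs set mountpoint=/zroot zroot
```

Datasets with canmount=off can legitimately share a mountpoint, so treat a hit as something to look into, not as proof of a problem.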

Thanks again, ShelLuser, for the assistance and thoughts about this issue. Sometimes you just need someone to bounce around some ideas. I've learned plenty today.
 