Solved broken ZFS root

Hi all,

Several days ago, I ran freebsd-update for what I thought was a routine minor upgrade of my VPS, to bring it from v13.0 to p1.

Something broke. The VPS would not boot:

ZFS: out of temporary buffer space

A quick search brought me to this page:

FreeBSD 13.0 upgrade zfs: out of temporary buffer space
It would appear that, as part of the zpool upgrade, the boot code is not updated. There may have been a message about this, but I didn't see it. I downloaded the FreeBSD 13.0-RELEASE memstick image to a USB stick and booted that. You can repair the boot code with the following command:

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i<gpart index of freebsd-boot> <block device>
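In case it helps later readers, here is a non-destructive sketch that only prints that command with this thread's concrete values filled in (disk vtbd0; the freebsd-boot partition at index 1 is an assumption — verify both with `gpart show` before running anything for real):

```shell
#!/bin/sh
# Dry-run sketch: compose the repair command with this VPS's values.
# vtbd0 matches this thread; index 1 is assumed. Check yours with `gpart show`.
DISK="vtbd0"
BOOT_IDX="1"
echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i ${BOOT_IDX} ${DISK}"
```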

I used the emergency boot facility at the hosting company, Transip, to boot and issue that gpart command. It seems to have worked, but when I reboot I end up at the FreeBSD boot prompt.

Someone on Reddit suggested booting again from the emergency environment and issuing these commands:
  1. zpool import -N zroot
  2. zpool set bootfs=zroot/ROOT/13.0-RELEASE zroot
But this doesn't work, because Transip's emergency FreeBSD boot is based on a very old version of FreeBSD (10.3) and doesn't support all the features of the current version of ZFS. Screen cap:

[Screenshot at 2021-05-30 13-55-53.png]


The disk on my VPS is vtbd0.

The ZFS root partition is: zroot/ROOT/13.0-RELEASE

How do I get my VPS to boot again? As I said above, I can't get any further than the boot prompt right now.

Thank you
 
Right now I have only a single VM on 13, so my experience with 13 is limited. It seems your Google search was good; just out of curiosity, I found this rant while looking for possible issues with that error message.
Judging from that rant, it seems you found the root cause of your problem: a bootloader that was not updated. The rant also suggests this only bites legacy (BIOS) boot, but I wouldn't put a wager on it.

Note that on FreeBSD 13 you have the OpenZFS implementation of ZFS (as opposed to the one stemming from OpenSolaris). So you had better use a rescue system based on 13 or higher.
 
are you sure you installed the 13.x bootcode?
if you copied the line as is, you most likely installed the 10.x stuff
 
are you sure you installed the 13.x bootcode?
if you copied the line as is, you most likely installed the 10.x stuff
What I did was the following:

About a month ago I ordered this VPS and installed the latest version of FreeBSD that the hosting company had available: v12.2.

After installing that, I upgraded it to v13.0-RELEASE using freebsd-update. It seemed to be running fine.

A few days ago, I used freebsd-update again to install p1, which was just released.

Upon rebooting, I got the error message: ZFS: out of temporary buffer space, and the system would not boot.

If I understand you correctly, running gpart bootcode [...] from the emergency console would have installed the 10.x bootcode. Obviously that is not going to help.

Last night I mailed Transip support and pointed out that it would be really helpful if the emergency console ran a more recent version that more fully supported ZFS. They said they would get back to me.

If they make v12.2 or ideally v13 available, should running the gpart bootcode command from the emergency console then make my VPS bootable again?
 
Right now I have only a single VM on 13, so my experience with 13 is limited. It seems your Google search was good; just out of curiosity, I found this rant while looking for possible issues with that error message.
Judging from that rant, it seems you found the root cause of your problem: a bootloader that was not updated. The rant also suggests this only bites legacy (BIOS) boot, but I wouldn't put a wager on it.

Thanks for sharing that link. This:
So this sounds like a missing step in the automated upgrade flow. Normally, using new features in zpool is deferred until you choose to upgrade the pool after the reboot, so you get to see the warnings. At a guess (because I'm on FreeBSD 11.4 still), the OpenZFS migration forces the issue to do the zpool upgrade early and they missed the gpart requirement.

This seems to be what screwed me. At some point in the v12.2 --> v13.0 update path, the root zpool must have been silently upgraded.

My previous experience has been that you have to upgrade ZFS pools manually. As you probably know, if you run zpool status, it tells you if a pool can be upgraded, and if you do upgrade it, it advises you to run the gpart command afterwards.

Note that on FreeBSD 13 you have the OpenZFS implementation of ZFS (as opposed to the one stemming from OpenSolaris). So you had better use a rescue system based on 13 or higher.

Thanks for the heads-up. I will mention this to the hosting support.
 
A support person from Transip replies:
Onze Rescue image updaten lijkt me inderdaad een goed plan als de verschillende versies zo incompatibel zijn. Jammer genoeg gaat dat op korte termijn lastig worden.
auto-trans:
Updating our Rescue image seems like a good plan indeed when the different versions are so incompatible. Unfortunately, that's going to be difficult in the short term.
Unless someone here has any brilliant ideas, I guess I'm screwed :( :( :(
 
I guess I'm screwed :( :( :(
boot from the current rescue disk
bring ifaces up
scp a current/13 zfsbootcode file
install that
if you can't bring interfaces up for whatever reason but still have a full live fs, just cat a uuencoded file into the console and then uudecode it
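The steps above could look roughly like the sketch below, run from the rescue shell. It is a dry run (DRYRUN=1 only prints the commands), and the interface name vtnet0, the remote host name, and partition index 1 are assumptions you would need to adapt:

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence from the 10.3 rescue system.
# DRYRUN=1 prints each command instead of executing it.
DRYRUN=1
run() { [ "$DRYRUN" = 1 ] && echo "$*" || "$@"; }

run dhclient vtnet0                                      # bring an iface up
run scp user@other-vps:/boot/gptzfsboot /tmp/gptzfsboot  # fetch 13.x boot code
run gpart bootcode -p /tmp/gptzfsboot -i 1 vtbd0         # install it
```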
 
One of the things I find so bizarre about this situation is that I have a second VPS at a different company that is also running FreeBSD. It's a small VPS that I keep mainly as a backup mailserver. When I see that an OS update is available, I run freebsd-update on it first, as a safety precaution.

I did this time as well. I updated it to v13 about a month ago. A few days ago I updated it to p1. No problems. So I then went to upgrade my main server, which failed. What was the difference? I have no idea.
 
i don't know if freebsd-update tries to update the bootcode. clearly there are cases when it can't be done (like if you are pxebooting, booting from another disk than the one your root is mounted from, etc., etc.)
your other vps may have a 12.2-timeframe bootcode, which may still work where 12.1 might not
 
boot from the current rescue disk
bring ifaces up
scp a current/13 zfsbootcode file
install that
if you can't bring interfaces up for whatever reason but still have a full live fs, just cat a uuencoded file into the console and then uudecode it
Yes! OK, I just copied the gptzfsboot file from my backup VPS at a different host, which is also running v13.0-RELEASE-p1. I now see this file in the home directory of the rescue console. Is this the right file? I don't see a file called zfsbootcode in /boot on my other VPS.

So how exactly do I copy it to the /boot dir of my VPS?

gpart list shows the disk, vtbd0

vtbd0p1 is labelled gptboot0

so presumably /boot is there. But this device is not mounted, right?
 
gpart bootcode -p /root/Downloads/gptzfsboot -i<gpart index of freebsd-boot> <block device>
with that, just use the correct path for gptzfsboot
or just dd if=/root/Downloads/gptzfsboot of=/dev/vtbd0p1 if you are brave :)
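If you go the dd route, one sanity check worth doing first is that the file actually fits inside the 512K freebsd-boot partition. The sizes below are illustrative, not measured; get the real ones with `stat -f %z` and `gpart show`:

```shell
#!/bin/sh
# Check that gptzfsboot fits the freebsd-boot partition before dd'ing it.
# Sizes here are illustrative; use `stat -f %z /root/Downloads/gptzfsboot`
# and `gpart show` for the real numbers.
PART_BYTES=$((512 * 1024))   # 512K freebsd-boot partition
FILE_BYTES=$((120 * 1024))   # assumed size of gptzfsboot
if [ "$FILE_BYTES" -le "$PART_BYTES" ]; then
    echo "fits"
else
    echo "too big: enlarge the freebsd-boot partition first"
fi
```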
 
May I ask for a little more explanation please?
I upgraded a FreeBSD 12 to 13.0-RELEASE-p4 - everything fine.
Then I moved to OpenZFS using this guide: https://openzfs.github.io/openzfs-docs/Getting%20Started/FreeBSD.html
which amongst other things tells you to:
For example, make changes to ~/.profile ~/.bashrc ~/.cshrc from this:
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:~/bin
To this:
PATH=/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin:~/bin


Everything seems to be fine. However I now understand that before I upgrade any pools I should re-install the bootloader.
Is that correct, and how should I do that with my single + backup disk system? (I'm not sure I fully understand covacat's advice above.)
And lastly, if anyone has the time, why is this necessary?
Many thanks.
 
13.x is already using OpenZFS in base (probably an earlier version than the port, though)
if the port supplies bootcode, install that; if not,
just install /boot/gptzfsboot to your freebsd-boot partition with gpart
if you use EFI boot, the above does not apply
 
Many thanks to covacat for the fastest reply I've ever seen.
I researched for further understanding, but all the documentation and guides assume a start from scratch install, so please accept my apologies for my lack of knowledge. But as people at my level know, boot issues can cause serious grief.
gpart shows the drive as:
=>        40  468862055  ada1  GPT  (224G)
          40       1024     1  freebsd-boot  (512K)
        1064        984        - free -  (492K)
        2048    4194304     2  freebsd-swap  (2.0G)
     4196352  464664576     3  freebsd-zfs  (222G)
   468860928       1167        - free -  (584K)

From your advice and https://www.unix.com/man-page/freebsd/8/gptzfsboot/
Will the following command do the trick?
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

Many thanks again for your time.
LATER EDIT
Found this : https://forums.freebsd.org/threads/update-of-the-bootcodes-for-a-gpt-scheme.80163/
where it advises:
Update of the BIOS bootcode on a GPT scheme if the root partition is a freebsd-zfs type (root user):
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i partition-number disk-name
Which for me should translate to:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

Which seems to make sense - is it OK?
And of course thanks again.
 
Magic - thanks very much.
Many years ago I wasted more than a day recovering from some boot problem or other - so I appreciated the help.
 
Covacat: Last question, I hope. When should this bootcode command be used?
Various documents seem to advise "when you have a major upgrade", so I presume that means one where there is a change of OpenZFS version. Or would you advise doing it on all upgrades, before you shut down?
Again, many thanks.
 
In theory, after a ZFS code upgrade and after you upgrade the zfs pool. It's not a 'major' problem as long as you can boot from an external device and have console access.
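So the order of operations, sketched as a print-only dry run with this thread's pool, disk, and partition index (adapt before running for real):

```shell
#!/bin/sh
# Print-only sketch of the order of operations after a ZFS code upgrade.
# zroot, vtbd0, and index 1 are this thread's values; adapt to your system.
printf '%s\n' \
    "zpool upgrade zroot" \
    "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 vtbd0"
```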
 