Solved broken ZFS root

cbrace

Well-Known Member

Reaction score: 15
Messages: 316

Hi all,

Several days ago, I used freebsd-update for what I thought was a routine minor update of my VPS, taking it from 13.0-RELEASE to 13.0-RELEASE-p1.

Something broke. The VPS would not boot:

ZFS: out of temporary buffer space

A quick search brought me to this page:

FreeBSD 13.0 upgrade zfs: out of temporary buffer space
It would appear that as part of the zpool upgrade that the boot code is not updated. There may have been a message about this but I didn't see it. I downloaded the FreeBSD 13-RELEASE mem stick image to a USB stick and booted that. You can repair the boot code with the following command

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i<gpart index of freebsd-boot> <block device>
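
For anyone following along, here is a minimal dry-run sketch of that repair. It assumes the disk is vtbd0 and that the freebsd-boot partition has index 1 (vtbd0p1), which matches this VPS but should be confirmed with gpart show. The run helper and the RUN, DISK, and INDEX variables are hypothetical names used only for this sketch. Note that /boot/pmbr and /boot/gptzfsboot are read from whichever system is currently booted, so the rescue environment has to ship boot code new enough for the pool.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command, and only executes it when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

DISK="${DISK:-vtbd0}"   # disk name given in this thread
INDEX="${INDEX:-1}"     # assumption: freebsd-boot is partition 1 (vtbd0p1)

# 1. Show the partition table; the freebsd-boot row carries the index for -i.
run gpart show "$DISK"

# 2. Rewrite the protective MBR and the ZFS-aware GPT boot code.
run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$INDEX" "$DISK"
```

Run as-is, it only prints the two commands. Setting RUN=1 executes them, which overwrites the boot partition, so it is worth reading the printed output first.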

I used the emergency boot facility at the hosting company, Transip, to boot and issue that gpart command. It seems to have worked, but when I reboot I end up at the FreeBSD boot prompt.

Someone on Reddit suggested booting again from the emergency environment and issuing these commands:
  1. zpool import -N zroot
  2. zpool set bootfs=zroot/ROOT/13.0-RELEASE zroot
But this doesn't work because Transip's emergency FreeBSD boot is based on a very old version of FreeBSD, 10.3, and doesn't support all the features of the current version of ZFS. Screen cap:

Screenshot at 2021-05-30 13-55-53.png


The disk on my VPS is vtbd0.

The ZFS root partition is: zroot/ROOT/13.0-RELEASE

How do I get my VPS to boot again? As I said above, I can't get any further than the boot prompt right now.

Thank you
 

_martin

Daemon

Reaction score: 308
Messages: 1,119

Right now I have only a single VM on 13, so my experience with 13 is limited. It seems your Google search was good; just out of curiosity, I found this rant while looking for possible issues with that error message.
Judging from that rant, it seems you found the root cause of your problem: a bootloader that was not updated. It also suggests that legacy boot is the culprit (but I wouldn't wager on it).

Note that FreeBSD 13 uses the OpenZFS implementation of ZFS (as opposed to the one stemming from OpenSolaris), so you had better use a rescue system based on 13 or higher.
 

covacat

Well-Known Member

Reaction score: 227
Messages: 475

are you sure you installed the 13.x bootcode ?
if you copied the line as is you most likely installed the 10.x stuff
 
cbrace

covacat said:
are you sure you installed the 13.x bootcode ?
if you copied the line as is you most likely installed the 10.x stuff

What I did was the following:

About a month ago I ordered this VPS and installed the latest version of FreeBSD that the hosting company had available: v12.2.

After installing that, I upgraded it to v13.0-RELEASE using freebsd-update. It seemed to be running fine.

A few days ago, I used freebsd-update again to install p1, which was just released.

Upon rebooting, I got the error message: ZFS: out of temporary buffer space, and the system would not boot.

If I understand you correctly, running gpart bootcode [...] from the emergency console would have installed the 10.x bootcode. Obviously that is not going to help.

Last night I mailed Transip support and pointed out that it would be really helpful if the emergency console ran a more recent version that more fully supported ZFS. They said they would get back to me.

If they make v12.2 or ideally v13 available, should running the gpart bootcode command from the emergency console then make my VPS bootable again?
 
cbrace

_martin said:
Right now I have only a single VM on 13, so my experience with 13 is limited. It seems your Google search was good; just out of curiosity, I found this rant while looking for possible issues with that error message.
Judging from that rant, it seems you found the root cause of your problem: a bootloader that was not updated. It also suggests that legacy boot is the culprit (but I wouldn't wager on it).

Thanks for sharing that link. This:
So this sounds like a missing step in the automated upgrade flow. Normally, using new features in zpool is deferred until you choose to upgrade the pool after the reboot, so you get to see the warnings. At a guess (because I'm on FreeBSD 11.4 still), the OpenZFS migration forces the issue to do the zpool upgrade early and they missed the gpart requirement.

This seems to be what screwed me. At some point in the v12.2 --> v13.0 update path, the root zpool must have been silently upgraded.

My previous experience has been that you have to upgrade ZFS pools manually. As you probably know, if you run zpool status, it tells you if a pool can be upgraded, and if you do so it advises you to run the gpart command afterwards.
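
A hedged sketch of that manual check on a running system, written as a dry run. The pool name zroot comes from this thread; the run helper and the RUN and POOL variables are hypothetical names for this sketch. None of the three commands changes the pool: zpool upgrade without arguments only lists pools that do not have all supported features enabled.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command, and only executes it when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

POOL="${POOL:-zroot}"   # pool name used in this thread

# Health report; also mentions when supported features are not enabled.
run zpool status "$POOL"

# No arguments: lists pools that could be upgraded, without upgrading them.
run zpool upgrade

# The dataset the boot code will try to boot from.
run zpool get bootfs "$POOL"
```

If a pool is ever upgraded on purpose, the boot code on every boot disk should be refreshed with gpart bootcode before the next reboot.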

_martin said:
Note that FreeBSD 13 uses the OpenZFS implementation of ZFS (as opposed to the one stemming from OpenSolaris), so you had better use a rescue system based on 13 or higher.

Thanks for the heads-up. I will mention this to the hosting support.
 
cbrace

A support person from Transip replies:
Onze Rescue image updaten lijkt me inderdaad een goed plan als de verschillende versies zo incompatibel zijn. Jammer genoeg gaat dat op korte termijn lastig worden.
auto-trans:
Updating our Rescue image seems like a good plan indeed when the different versions are so incompatible. Unfortunately, that's going to be difficult in the short term.
Unless someone here has any brilliant ideas, I guess I'm screwed :( :( :(
 

covacat

cbrace said:
I guess I'm screwed :( :( :(

boot from the current rescue disk
bring ifaces up
scp a current/13 zfsbootcode file
install that
if you can't bring interfaces up for whatever reason but still have a full live fs, just cat a uuencoded file into the console and then uudecode it
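
The steps above can be sketched roughly as follows, again as a dry run. Everything environment-specific here is an assumption: the interface name vtnet0 (typical for virtio, but not confirmed in this thread), DHCP being available in the rescue network, and the hypothetical source host user@backup.example.org, which stands in for any machine running 13.0. The run helper and the RUN, IFACE, and SRC variables are likewise names made up for this sketch.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command, and only executes it when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

IFACE="${IFACE:-vtnet0}"                # assumption: virtio NIC name
SRC="${SRC:-user@backup.example.org}"   # hypothetical host running 13.0

# Bring the interface up and request an address (assumes DHCP is offered).
run ifconfig "$IFACE" up
run dhclient "$IFACE"

# Fetch the 13.x boot code from the other machine.
run scp "$SRC:/boot/gptzfsboot" /root/gptzfsboot

# Console-only fallback when networking is not an option:
#   on the source machine:  uuencode /boot/gptzfsboot gptzfsboot > gptzfsboot.uu
#   in the rescue console:  cat > gptzfsboot.uu   (paste the text, then Ctrl-D)
#                           uudecode gptzfsboot.uu
```

The uudecode step recreates a file named gptzfsboot in the current directory, which can then be handed to gpart bootcode with -p.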
 
cbrace

One of the things I find so bizarre about this situation is that I have a second VPS at a different company that is also running FreeBSD. It's a small VPS that I keep mainly as a backup mailserver. When I see that an OS update is available, I run freebsd-update on it first, as a safety precaution.

I did so this time as well. I updated it to v13 about a month ago. A few days ago I updated it to p1. No problems. So I then went to upgrade my main server, which failed. What was the difference? I have no idea.
 

covacat

i don't know if freebsd-update tries to update the bootcode. clearly there are cases when it can't be done (like if you are pxebooting, or booting from a different disk than the one your root is mounted from, etc.)
your other vps may have a 12.2 timeframe bootcode which may still work, while a 12.1 one might not
 
cbrace

covacat said:
boot from the current rescue disk
bring ifaces up
scp a current/13 zfsbootcode file
install that
if you can't bring interfaces up for whatever reason but still have a full live fs, just cat a uuencoded file into the console and then uudecode it

Yes! OK, I just copied the gptzfsboot file from my backup VPS on a different host, which is also running v13-RELEASE-p1. I now see this file in the home directory of the rescue console. Is this the right file? I don't see a file called zfsbootcode in /boot on my other VPS

So how exactly do I copy it to the /boot dir of my VPS?

gpart list shows the disk, vtbd0

vtbd0p1 is labelled gptboot0

so presumably /boot is there. But this device is not mounted, right?
 

covacat

gpart bootcode -p /root/Downloads/gptzfsboot -i<gpart index of freebsd-boot> <block device>
with that, just use the correct path to gptzfsboot
or just dd if=/root/Downloads/gptzfsboot of=/dev/vtbd0p1 if you are brave :)
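
Before writing anything to the boot partition, a quick sanity check can help. This is only a sketch: fits_in is a hypothetical helper, the file path is the one from the post above, and the 512 KiB partition size is an assumption (a common size for freebsd-boot, but not stated in this thread). The real size can be read from gpart show vtbd0.

```shell
#!/bin/sh
# Succeeds when the file at $1 is no larger than $2 bytes.
fits_in() {
    size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips wc's padding
    [ "$size" -le "$2" ]
}

BOOTFILE="${BOOTFILE:-/root/Downloads/gptzfsboot}"  # path used in the post above
PART_BYTES="${PART_BYTES:-524288}"  # assumption: 512 KiB freebsd-boot partition

if [ -f "$BOOTFILE" ]; then
    if fits_in "$BOOTFILE" "$PART_BYTES"; then
        echo "ok: $BOOTFILE fits in $PART_BYTES bytes"
    else
        echo "too large: $BOOTFILE exceeds $PART_BYTES bytes"
    fi
else
    echo "missing: $BOOTFILE"
fi
```

Comparing the output of sha256 for the file on both machines is another cheap way to confirm that the copy arrived intact.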
 
cbrace

It worked! Thank you so much covacat !! I really appreciate your help.👍👍👍
 