Solved Secondary GPT header not in the last LBA

jbo@

Developer
Let there be a bhyve VM. The VM was doing great. One day during a cold pre-winter night it ran out of disk space (the VM, not the host). I shut down the VM, resized the virtual disk and tried to boot the VM again, but it fails to do so (for some reason it can't boot from the HDD image).

I booted up the live environment on the VM to see what's up:
144104475-9ba7d552-93e8-4dc4-8cf7-43cd249ae9f0.png


As the VM was using ZFS, I tried to import the pool:
144104537-670631ee-6396-4811-aff5-9c9a89639809.png


Then I tried to recover the disk:
144107565-475c9e4d-29ab-46f9-afb2-6a83a3f4a25f.png


I'd like to understand what's going on here and how to fix it.
Presumably running out of disk space caused the GPT header to be overwritten at the end of the disk?
 
The error in the thread title is a consequence of enlarging the disk.
I presume the secondary GPT header sits outside the last partition, so it can't be overwritten by the filesystem filling up.
 
I have a handful of VMs running ZFS on GPT disks where I expanded the disk, and I never had any issue booting them (I'm not using bhyve, but that should not matter).
A GPT disk has two GPT headers. The primary one, at LBA 1, holds a pointer to the backup one, which is expected at the last LBA of the disk (disk size in sectors minus 1). Because you expanded the disk, this information is stale. Once booted, you would run gpart recover vtbd0 to fix these entries.
You ran into some other issue too. It's hard to say what happened. What was the system reporting when you attempted to boot (at what stage of boot did it fail)?
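
To illustrate the header layout described above, here is a minimal sketch that reads the primary GPT header at LBA 1 and checks whether the backup LBA it records is still the last LBA of the disk. It assumes 512-byte sectors and a plain image file or any seekable binary stream. The function names and the fake_disk helper are made up for this illustration and are not part of gpart or any other tool.

```python
import io
import struct

SECTOR_SIZE = 512            # assumption: 512-byte logical sectors
GPT_SIGNATURE = b"EFI PART"  # signature at the start of every GPT header


def read_primary_header(disk):
    """Return (current_lba, backup_lba) from the primary GPT header at LBA 1."""
    disk.seek(1 * SECTOR_SIZE)
    raw = disk.read(92)  # the fixed part of the GPT header is 92 bytes
    if len(raw) < 92 or raw[:8] != GPT_SIGNATURE:
        raise ValueError("no GPT signature at LBA 1")
    # Offsets 24 and 32 hold the header's own LBA and the backup header's LBA.
    current_lba, backup_lba = struct.unpack_from("<QQ", raw, 24)
    return current_lba, backup_lba


def backup_header_is_stale(disk):
    """True if the recorded backup LBA is not the last LBA of the disk."""
    disk.seek(0, io.SEEK_END)
    last_lba = disk.tell() // SECTOR_SIZE - 1
    _, backup_lba = read_primary_header(disk)
    return backup_lba != last_lba


def fake_disk(total_sectors, backup_lba):
    """Hypothetical helper: in-memory image with only a minimal primary header."""
    image = bytearray(total_sectors * SECTOR_SIZE)
    header = bytearray(92)
    header[:8] = GPT_SIGNATURE
    struct.pack_into("<QQ", header, 24, 1, backup_lba)
    image[SECTOR_SIZE:SECTOR_SIZE + 92] = header
    return io.BytesIO(bytes(image))


# A 100-sector disk with the backup header at LBA 99, then "grown" to 200 sectors.
print(backup_header_is_stale(fake_disk(100, 99)))
print(backup_header_is_stale(fake_disk(200, 99)))
```

The second call models the situation in this thread: the image got bigger, but the primary header still points at the old end of the disk.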
 
I have a handful of VMs running ZFS on GPT disks where I expanded the disk, and I never had any issue booting them (I'm not using bhyve, but that should not matter).
Same here - first time for everything I guess :D

You ran into some other issue too. It's hard to say what happened. What was the system reporting when you attempted to boot (at what stage of boot did it fail)?
When the VM is turned on, I get this (via VNC):
Code:
BdsDxe: failed to load Boot0001 "UEFI Misc Device" from PciRoot(0x0)/Pci(0x4,0x0): Not Found

I did already encounter this issue while setting up the VM initially.
For reference: https://github.com/cbsd/cbsd/issues/669
 
Yeah, I guess. :)

Can you show the output of gpart show vtbd0? It seems like UEFI had problems even finding the loader. Maybe some weird things happened when the system got full. If you could reproduce this, we could compare the GPT header (or even a few raw LBAs from that disk) before and after the issue.
 
Can you show the output of gpart show vtbd0?
Sorry, I could have included that in my original post. I didn't because (at least to me) it looks the way I expected it to, with 128G of trailing space:

1638308994488.png


Maybe some weird things happened when the system got full. If you could reproduce this, we could compare the GPT header (or even a few raw LBAs from that disk) before and after the issue.
The setup procedure for trying to reproduce would be as follows:
  1. Set up a bhyve VM using the FreeBSD 13.0-STABLE snapshot from 2021-11-28
  2. Get poudriere up & running in the VM
  3. Ensure that the disk is too small to fit all the data that poudriere fetches & produces
  4. Wait for the VM to run out of disk space
  5. Shut down the VM
  6. Resize the disk
  7. Boot the VM
  8. profit
I'm not sure how viable this will be, though, as I ran into the exact same issue when I first set up the VM (see the GitHub issue linked above).

I guess I could share the VM disk image if that is of any help. It's a 128GB image tho. The only sensitive data on there would be the private key that poudriere uses for package signing. As I'm not yet using this poudriere instance I guess there's not much harm done.
Would of course still prefer to be able to fix this without doing that :p
 
I personally can't test it as I don't have any free physical servers where I can run bhyve (all my physical servers are using VirtualBox, which can't be mixed with vmm). Maybe I could nest it.

The issue is interesting, though, as you had problems even getting to the bootloader. Even if, hypothetically, ZFS went crazy within its partition boundaries, it would not affect anything else. That's why I mentioned dumping a few LBAs from the start of the disk: it would be interesting to see and compare them.

EDIT: the only stable images of 13 I see are from Nov 25, here. Where did you get those from Nov 28?
 
EDIT: the only stable images of 13 I see are from Nov 25, here. Where did you get those from Nov 28?
That was a typo on my side - sorry. The snapshot I used was from 2021-11-18, not 2021-11-28.

The author of sysutils/cbsd recommended to run this on the host:
Code:
gpart commit zvol/zroot/ROOT/default/poudriere1/dsk1.vhd

After that, the VM booted successfully and I was able to resize the partition & expand the ZFS pool successfully. The VM survived several reboots since then.
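
The thread does not spell out the resize and pool expansion steps, so the following is only a rough sketch of what they could look like from inside the guest. It builds the command lines without running them. The disk name vtbd0 comes from earlier in the thread, while the partition index 3 and the pool name zroot are placeholders that may not match this VM. The pool may also reference its device by a label rather than by vtbd0p3.

```python
import shlex


def grow_commands(disk, part_index, pool):
    """Build (but do not run) the commands to grow a ZFS partition and its pool."""
    provider = f"{disk}p{part_index}"  # assumption: pool uses the raw GPT provider
    return [
        # Rewrite the GPT headers so the backup header sits at the new last LBA.
        ["gpart", "recover", disk],
        # Without -s, gpart resize grows the partition into the available free space.
        ["gpart", "resize", "-i", str(part_index), disk],
        # Ask ZFS to expand the vdev to use the enlarged partition.
        ["zpool", "online", "-e", pool, provider],
    ]


# Hypothetical values: partition index 3 and pool name "zroot" are placeholders.
for argv in grow_commands("vtbd0", 3, "zroot"):
    print(shlex.join(argv))
```

Check the printed commands against the output of gpart show and zpool status before running anything by hand.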

I'm currently awaiting some information regarding the reasoning behind this gpart commit "idea" as I'd like to understand what was going on.
For people following along: https://github.com/cbsd/cbsd/issues/669#issuecomment-983139311
 