Solved Invalid partition table error

Hi all,

It seems that I've successfully managed to mess up the startup process of my system. I'm in no way an expert on FreeBSD (just a regular user of it for the past 20 years or so, but was not too adventurous), so please bear with me if I'm using some technical terms incorrectly.

The chronology of events (some of them maybe irrelevant, but for the sake of providing the whole picture I'll just write what I remember) is as follows: I was running 12.2-RELEASE and decided to upgrade to 13.0-RELEASE. Tried to accomplish that through a binary update, but that didn't work. I then downloaded the source tree and rebuilt / reinstalled the world / kernel. That went fine and I was able to boot into 13.0-RELEASE.

The next step for adventurous me was to upgrade ZFS to the latest version (5000). After upgrading the pools I shut down the system. I guess in this step something went wrong or I overlooked a step because after that I couldn't boot into the system. The only thing I could see is the boot prompt with a single strange character right after it (kind of smiley, don't have a screenshot of that unfortunately).

I tried to boot from a live CD and after reading some posts on this forum I executed a command similar to the one below from the live USB stick (probably by reading too fast through this forum) to try and restore the boot loader:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0


To make a long story short, I now see the following screen when I try to boot:

20211220_202759_HDR.jpg


When booting from a USB-stick (option LiveCD) I can see the following:

20211220_203305_HDR.jpg


What I remember of the disk configuration is that there was a bootpool with a size of 2 GB, 2 GB of swap, and the remaining space was taken by the regular ZFS zroot pool. I don't remember if it was MBR or GPT.

The disk was ZFS + geli, where the prompt to provide the geli password appeared towards the middle of the boot process (just before mounting zroot pool).

So now several questions emerge:
  1. How do I get to the situation where I can see the original layout of the disk (with 2Gb bootpool, 2Gb swap space and the remaining space as the zroot pool)?
  2. If point 1 is successful, how do I boot into the original system?
Basically I'm lost now, so any help is appreciated. And of course if additional details are needed, just let me know.

TIA.
 
This will be tricky to restore. You can boot using the LiveCD, create an in-memory filesystem and install sysutils/testdisk, or remove the hard disk from the computer and attach it to another computer where you have testdisk installed. Then scan the disk and rebuild your MBR information with it.



To restore your disklabel slices you can use:

Another option is to use another LiveCD that has a partition editor which recognizes ZFS.
 
Restoring the partition information will be tricky. You can boot using the LiveCD, create an in-memory filesystem and install sysutils/testdisk, or remove the hard disk from the computer and attach it to another computer where you have testdisk installed. Then scan the disk and rebuild your MBR information with it.



Another option is to use another LiveCD that has a partition editor which recognizes ZFS.

Thanks for the excellent guide on LiveCD. I managed to get through all the steps and start testdisk, however the utility only sees the USB drive from which the liveCD was started and not the HDD. As shown above, gpart show recognizes the HDD.

You also mention some other LiveCD that has a partition editor which recognizes ZFS - do you have an example of such an editor?
 
MBR partition info seems ok (gpart bootcode doesn't overwrite the full 512B of sector 0), judging also from the output you provided. gpart shows ada0 with one legacy FreeBSD partition. Fixing the bootcode there is not a problem. The problem is, as mentioned above, that you overwrote the first 130kB or so of this freebsd partition with the partition bootcode. On top of that you said it's a crypto partition. This can be a real problem.

Did this disk have only one crypto partition? Did you try geli attach when booted from the live version (not likely, as you don't see ada0s1X there)? I'm hoping the partition was padded enough that the actual data was not overwritten.
 

Is it always s not p when it's … MBR? (I get confused.)



With a working machine, UEFI boot:

Code:
% geom part show ada0
=>        40  1953525088  ada0  GPT  (932G)
          40      532480     1  efi  (260M)
      532520        2008        - free -  (1.0M)
      534528    33554432     2  freebsd-swap  (16G)
    34088960  1919434752     3  freebsd-zfs  (915G)
  1953523712        1416        - free -  (708K)

% lsblk ada0
DEVICE         MAJ:MIN SIZE TYPE                                          LABEL MOUNT
ada0             0:123 932G GPT                                               - -
  ada0p1         0:125 260M efi                                    gpt/efiboot0 -
  <FREE>         -:-   1.0M -                                                 - -
  ada0p2         0:127  16G freebsd-swap                              gpt/swap0 SWAP
  ada0p2.eli     2:44   16G freebsd-swap                                      - SWAP
  ada0p3         0:129 915G freebsd-zfs                                gpt/zfs0 <ZFS>
  ada0p3.eli     0:137 915G -                                                 - -
  <FREE>         -:-   708K -                                                 - -
%
 
the fdisk/mbr partitions were called slices in BSD speak
hence the 's'
the a-n inside the s were called partitions
wd0s3e
wd device name
0 first wd device
s3 third fdisk partition/slice
e bsd (disklabel) partition inside the slice
you can have a bsdlabel on the raw device and then you have
wd0e
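
The naming scheme above can be sketched in code. This is only an illustration, assuming the simple forms discussed in this thread (a disk, an MBR slice with an optional BSD partition letter, a GPT partition, or a BSD label on the raw device). The function name parse_geom_name is made up for the example, and suffixes such as .eli are not handled.

```python
import re

# Forms covered (an assumption of this sketch, not a full GEOM grammar):
#   ada0      whole disk
#   ada0s1    MBR slice 1
#   ada0s1a   BSD partition "a" inside slice 1
#   ada0p3    GPT partition 3
#   wd0e      BSD partition "e" on the raw device
NAME_RE = re.compile(
    r"^(?P<driver>[a-z]+)(?P<unit>\d+)"
    r"(?:s(?P<slice>\d+)(?P<bsd>[a-z])?"
    r"|p(?P<gpt>\d+)"
    r"|(?P<rawbsd>[a-z]))?$"
)


def parse_geom_name(name):
    """Split a device name into driver, unit, slice and partition parts."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    return {
        "driver": m.group("driver"),
        "unit": int(m.group("unit")),
        "slice": int(m.group("slice")) if m.group("slice") else None,
        "gpt_partition": int(m.group("gpt")) if m.group("gpt") else None,
        # The letter may follow a slice (wd0s3e) or the raw device (wd0e).
        "bsd_partition": m.group("bsd") or m.group("rawbsd"),
    }


print(parse_geom_name("wd0s3e"))
print(parse_geom_name("ada0p3"))
```

So wd0s3e yields slice 3 with BSD partition e, while ada0p3 yields GPT partition 3 and no slice.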
 
_martin:

As far as I remember the disk has only one crypto partition. I tried to do geli attach /dev/ada0s1 but get:
Code:
geli: Cannot read metadata from /dev/ada0s1: Invalid argument.
geli: There was an error with at least one provider.
 
While testing your scenario I think I found a bug in the 13.0-RELEASE installer. When I select ZFS on MBR (BIOS) with GELI, after the installation and reboot it overwrites sector 1, where the bsdlabel stores its information (sector 64). But I can still mount the bootpool and see my geli key using zpool import with an altroot via the LiveCD. The same test in 12.3-RELEASE works fine. The difference I see between 12.3-RELEASE and 13.0-RELEASE is that in 12.3-RELEASE the bootpool points to da0s1a, but in 13.0-RELEASE the bootpool points to da0s1, which causes the overwrite of the bsdlabel information.
1640429393055.png


1640429081973.png


After reboot
Missing Operating system.

View from LiveCD
1640429177325.png


Bootpool from both 12.3-RELEASE and 13.0-RELEASE
1640430185392.png
1640430228970.png
 
VladiBG I can see the following after trying zpool import (ran from LiveCD):

20211225-screenshot.jpg


Which at least shows that the bootpool is recognized. So what should be the next step - importing it?
 
Is it always s not p when it's … MBR? (I get confused.)
p stands for partition and it's used in GPT scheme. MBR scheme (the same name as for the code in lba 0 - mbr (master boot record)) uses slices under FreeBSD terminology. Those slices are then subdivided into partitions (as per covacat's example above).

Back to the OPs problem though. We can safely say we had only one slice, ada0s1. Problem is how was this subdivided.

potzilov Few questions to clarify the situation:

o) Did you install it yourself or was the OS installed by installer ? What version was the installer (assuming 12.2) ? Did you interact with the installer or did you leave all on default? This way we can attempt to guess what were the partitions inside that slice (one big slice for ZFS or some others).
o) Was this a root on ZFS ? Or was the zfs pool only for data.
o) Were you using the key and passphrase to decode the partition? If you used the key where was it stored ?

As I mentioned in my first post -- we need to count on the luck that the geli partition starts further than ~140kB from the start of the slice. Maybe that testdisk is able to locate the metadata and recreate the bsdlabel that way. If it doesn't I'd say you can't recover it as this is a fully crypto partition. I call myself crypto-stupid, I don't know much about crypto stuff. But I'm assuming that even if you have a key you can't start decoding anywhere as the n+1 block depends on the result of the n-block (you lost the beginning you lost it all).
 
2Gb bootpool, 2Gb swap space and the remaining space is zroot pool
If you are sure that this was your layout you can try to recreate it. If it's not exact you will destroy your data so it's up to you.

The following commands may destroy your data:

gpart bootcode -b "/boot/mbr" ada0
gpart set -a active -i 1 "ada0"
gpart create -s BSD ada0s1
gpart add -i 1 -t freebsd-zfs -s 2147483648b "ada0s1"
gpart add -a 4k -i 2 -t freebsd-swap -s 2147483648b "ada0s1"
gpart add -a 4k -i 4 -t freebsd-zfs "ada0s1"

dd if="/boot/zfsboot" of="/dev/ada0s1" count=1
dd if="/boot/zfsboot" of="/dev/ada0s1a" skip=1 seek=1024
 
Before you start modifying the disk I suggest you do a copy of the disk if you value the data.

Can you share the dump of the current status of the partition ? Please run this command under live cd: hd -n 159232 /dev/ada0s1 | gzip |base64 and share the output. This will dump the data of the slice. I'm assuming 13.0 /boot/gptzfsboot which is 158858B on my system, rounded up.
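
The "rounded up" figure can be checked with a little arithmetic. A minimal sketch, assuming 512-byte sectors (the helper name round_up_to_sectors is only for illustration):

```python
SECTOR_SIZE = 512  # assumption: the disk reports 512-byte sectors


def round_up_to_sectors(nbytes, sector_size=SECTOR_SIZE):
    """Return (sector count, byte count) covering nbytes in whole sectors."""
    sectors = -(-nbytes // sector_size)  # ceiling division
    return sectors, sectors * sector_size


# 158858 is the gptzfsboot size quoted above.
print(round_up_to_sectors(158858))  # (311, 159232)
```

That is where the 159232 passed to hd -n comes from: 311 whole sectors of 512 bytes.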
 
VladiBG thanks for the suggestion. I'll keep these commands in mind as a "last resort" solution. I think the suggestion of _martin to make a copy of the disk before recreating the structure is the first step to take in such a scenario.
 
_martin to answer the questions:
  • Did you install it yourself or was the OS installed by installer ? What version was the installer (assuming 12.2) ? Did you interact with the installer or did you leave all on default? This way we can attempt to guess what were the partitions inside that slice (one big slice for ZFS or some others).
    The version of the installer was way before 12.2. I think I installed this machine first with FreeBSD 9. After that upgrades were done. As far as I can remember I used the defaults provided (except encrypting swap partition - turned that on).
  • Was this a root on ZFS ? Or was the zfs pool only for data. It was root on ZFS.
  • Were you using the key and passphrase to decode the partition? If you used the key where was it stored ? I think it was only a passphrase as I don't remember using any arguments besides passphrase.
 
the GELI metadata is on the last sector of its provider
in your case that might have been ada0s1d
if you dd the last sectors of the disk you will find the metadata and can find out the provider size
then you can create a label that will have a partition starting at the desired point and can geli attach it
 

Attachments

  • Screenshot 2021-12-25 at 15.45.32.png
Before you start modifying the disk I suggest you do a copy of the disk if you value the data.

Can you share the dump of the current status of the partition ? Please run this command under live cd: hd -n 159232 /dev/ada0s1 | gzip |base64 and share the output. This will dump the data of the slice. I'm assuming 13.0 /boot/gptzfsboot which is 158858B on my system, rounded up.

Size is 348421B (see attached the output)
 

Attachments

  • hd.txt
    340.3 KB
So FreeBSD 9 installer is what created that layout. Is it i386 or amd64 ? As both covacat and VladiBG mentioned it's reasonable to believe that ada0s1d was your crypto volume with the layout VladiBG provided.

Yes, you can import it with zpool import -R /mnt bootpool. Use -f if needed. What is in that boot pool ? I'm hoping it will give us more information about the disk layout and maybe there's a key that geli uses (if you used the key, as you don't know 100%).

Asking for the whole contents of the bootloader in that area was maybe overkill from my side but ok. It would be interesting to seek after this mark and check the contents there. But first let's see what that pool is saying.
 
afaik the geli keys are on the last sectors of the encrypted block device (disk, partition) and they are decrypted at run time with the passphrase
so if you dump the metadata, you can recover the size of the geli provider and by simple arithmetic deduce its start
then create a label and attach it
look in dmesg for the number of sectors of ada0
subtract 32 of that number and let the result be N
dd if=/dev/ada0 iseek=N of=somefile.out bs=512
hexdump -Cv somefile.out and look for the geli magic string GEOM::ELI
at offset 30 you have the size of the provider (8 bytes little endian) in bytes
convert it to sectors
subtract it from ada0s1 size and you get its original offset, so you can recreate the label
 
A key is a file that needs to be available during decryption. It's a separate file. You may be talking about the Master Keys (hence no key file, only a passphrase).

But the idea you presented for finding the size of the partition is a good one. We know the slice size, so we could check it with this: dd if=/dev/ada0s1 bs=512 skip=976773095 | hd, then look for the start of GEOM::ELI and the offset 0x1e from it. I'm assuming it's 8B for the i386 arch too, if that is the case here.
With qemu-nbd you can actually expose it as a device with an offset (i.e. create a device special file that you can use geli attach on).
 
Since my posts are moderated for now (new forum user) your suggestions are coming faster than my posts are being approved. I've attached the screenshot of ls of the mounted bootpool few posts above, so you can see now what's in that pool.
 
your suggestions are coming faster than my posts are being approved.
I started questioning myself when I saw those posts reappeared after other posts. :)

Anyway, that /boot/encryption.key seems suspicious. What is the size of that file (64)? Don't share the key but do make a copy of it somewhere else. It's very likely that key is what is needed for decrypting the rpool.

If this is the case you only lost the bsdlabel and all the data is intact. We don't know if swap was first or bootpool was first. Historically it made sense to put swap as early on the disk as possible because of how HDDs work. But that's just speculation.

Can you share the zpool status output? The next step I'd do is to determine the actual offset (LBA) of this bootpool partition within the slice, either by covacat's method (I like that) or by a trial-and-error approach: dump the first 1024 bytes from the bootpool device (hd -n 1024 /dev/$disk_part > start_of_bootpool) and try to locate them with dd. It will start soon after the slice start or about +2G from the start.
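
The trial-and-error matching could look roughly like this. It is only a sketch: it works on bytes already read into memory (for a real slice you would read a chunk around each candidate position rather than the whole disk), it assumes 512-byte sectors, and the name find_aligned is made up. A sample consisting only of zero bytes would match any empty region, so the sketch refuses it.

```python
def find_aligned(image, sample, step=512):
    """Return the sector numbers in image where sample starts on a sector boundary."""
    if not any(sample):
        raise ValueError("sample is all zero bytes and would match empty regions")
    hits = []
    pos = image.find(sample)
    while pos != -1:
        if pos % step == 0:          # keep only sector-aligned matches
            hits.append(pos // step)
        pos = image.find(sample, pos + 1)
    return hits


# Synthetic example: the sample sits at sector 2 and once at an unaligned spot.
sample = b"\x01\x02\x03\x04"
image = bytearray(4096)
image[1024:1028] = sample
image[1500:1504] = sample
print(find_aligned(bytes(image), sample))  # [2]
```

A single aligned hit gives the offset of the partition from the start of the data that was read.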

These types of problems are easier to troubleshoot (at least for me) first hand or interactively. Let's see first if the key is what we think it is.
 
i think you can detect zpools start points by writing a simple tool that uses
zpool_read_label in libzutil
did not test but looks doable
open ada0s1 and seek a page at a time and run zpool_read_label
see if succeeds and print the current position
but again the first 2 pools are not that important
(obviously it wont work to detect the geli encrypted pool)
 