[Solved] Encrypted & Mirrored ZFS vdevs Stopped Booting After Resilvering

I had a setup with four 4 TB disks arranged as two two-disk mirrors, giving me about 8 TB of usable storage space. One mirror consists of ada0 & ada2, the other of ada1 & ada3.

I needed more storage space, so I decided to upgrade one mirror by replacing its disks one at a time and resilvering twice. I removed ada2, replaced it with a bigger disk, booted the system, and let the resilver do its job while still being able to use the storage in a degraded state. Everything went well so far.

Then I wanted to do the same for ada0. First I discovered that the hardware is so old that it just boots from the C: drive, and hot-swapping is not possible on this machine. So I downloaded the FreeBSD-13.0-RELEASE-amd64-bootonly.iso image, booted from the network, attached the encrypted partitions, manually imported the pool, and started a resilver from the rescue system. The resilver finished completely, only for me to discover that I'm no longer able to boot the system.
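
Roughly, the steps in the rescue system were something like this (a sketch from memory, since I can't copy & paste from the rescue system; device names and the replace arguments are examples):

Code:
geli attach /dev/ada1p3       # attach the existing encrypted partitions
geli attach /dev/ada2p2
geli attach /dev/ada3p3
zpool import -f zroot         # manually import the degraded pool
geli init /dev/ada0p2         # set up encryption on the new data partition
geli attach /dev/ada0p2
zpool replace zroot ada0p3.eli ada0p2.eli    # start the resilver onto the new disk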

This is where I'm currently stuck. Apparently no kernel can be found, but I fail to understand why. This is what the boot process currently looks like:

Code:
Attempting Boot From Hard Drive (C:)
BIOS drive C: is disk0
BIOS drive D: is disk1
BIOS drive E: is disk2
BIOS drive F: is disk3

GELI Passphrase for disk3p3:

Calculating GELI Decryption Key for disk3p3: …
-
Can't find /boot/zfsloader
Can't find /boot/loader
Can't find /boot/kernel/kernel

FreeBSD/x86 boot
Default: disk-1:/boot/kernel/kernel
boot: _

From what I understand, the boot process is either looking in the wrong place, or the loader / kernel are missing from the right place (which might really be the same configuration issue).

Both ada0 & ada2 consist of two partitions, the first one being boot, the second one data. This is how I initialize a new disk in the pool:

Code:
gpart create -s gpt ada0
gpart add -a 4k -s 512K -t freebsd-boot -l gptboot2 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart add -a 1m -t freebsd-zfs -l zfs2 ada0


I get a confirmation that the bootcode & partcode were both successfully installed. Just to be sure, I booted into the rescue system a second time and recreated both bootcode & partcode for ada0 & ada2 with the gpart bootcode command from above, adjusted for each disk.

I should also add that initially each disk contained three partitions, with a swap partition in between (ada?p1 = boot, ada?p2 = swap, ada?p3 = data). I decided that four swap partitions of 32 GB each is a little excessive, so I left them out on the newer disks. I imagine the slightly changed partition layout could also be the cause of the issue, but I'm not sure.

When I put the old 4 TB drive back into its bay, everything is back to normal again. At this point I wonder what I've missed. The partitions are there, the bootcode & partcode were installed, and the ZFS resilvering finished. There must be something else I need to do, right?

P.S. I'm sorry I couldn't paste the command output into my post, which would've made things clearer; I'm currently unable to copy & paste from the rescue system.
 
The zfs/zpool cachefile is (or might be) wrong, then.
Try to recreate it (/boot/zfs/zpool.cache); see zpoolprops(8).
Or split the freebsd-boot partitions in two.
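
For example, from a rescue system, something along these lines (a sketch; the pool name and altroot path are examples):

Code:
zpool import -R /tmp/mnt zroot                           # import with a temporary altroot
zpool set cachefile=/tmp/mnt/boot/zfs/zpool.cache zroot  # write a fresh cachefile into the pool's /boot/zfs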
 
Then I wanted to do the same for ada0. First I discovered that the hardware is so old that it just boots from the C:
If the BIOS is old enough, there may be a size limit that affects 4 TB drives.
I'm not sure if that's the case here, but it could be.
 
Thanks guys for helping me out.
If the BIOS is old enough, there may be a size limit that affects 4 TB drives.
I'm not sure if that's the case here, but it could be.
I think there was a 2 TB limit with 32-bit LBA addressing due to limited address space (2^32 sectors × 512 bytes = 2 TiB); I'm not aware of a 4 TB limit. The disks are recognized, and the boot partition is always the first one on every disk, so I think this should be OK.

The zfs/zpool cachefile is (or might be) wrong, then.
Try to recreate it (/boot/zfs/zpool.cache); see zpoolprops(8).
Or split the freebsd-boot partitions in two.
While trying to do that, something interesting happened when I executed zpool import -R /tmp/mnt zroot inside the rescue system: some folders are missing, others are present. When I do a zfs list I get the following output (excerpt):

Code:
NAME                MOUNTPOINT
zroot               /tmp/mnt/zroot
zroot/ROOT          none
zroot/ROOT/default  /tmp/mnt
…
zroot/usr/home      /tmp/mnt/usr/home

For example, some folders in the root are missing (like /srv), but the user folders in /tmp/mnt/usr/home are there.

The used space looks alright everywhere though.

The other interesting thing I observed is that geli attach /dev/ada?p? gives me errors for ada1p1 (boot on one older disk) and for ada3p1 & ada3p2 (boot & swap on the other older disk), but not for the boot partitions on ada0 & ada2. I might have screwed something up here?
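
I suppose I could check whether a partition unexpectedly carries GELI metadata with something like this (a sketch; a freebsd-boot partition should normally have none):

Code:
geli dump /dev/ada0p1    # dump any GELI metadata stored on the provider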

I guess I'm screwed now and should hope that I have a working backup, or should I just put the old ada0 disk back into its bay?
 
I faced these messages yesterday:

Code:
Can't find /boot/zfsloader
Can't find /boot/loader
Can't find /boot/kernel/kernel

When it boots and presents the menu of options, I chose 6 (Kernel), then selected the old kernel, and that got me into the system. After that I used bectl to create a new boot environment and updated the system to 13.0-RELEASE-p10. But for some reason I can only get into the new environment using the root account. I am stuck there right now.
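
The boot environment part was roughly this (a sketch; the BE name is just an example):

Code:
bectl create 13.0-p10      # snapshot the current root as a new boot environment
bectl activate 13.0-p10    # make it the default for the next reboot
bectl list                 # verify which environment is active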
 
I put the old C: disk into its bay, and I’m now doing a second full backup.

The funny, or rather scary, thing is that the system boots from this drive while the very same drive shows up as degraded in the pool. This is most probably related to my second resilvering, but it looks pretty terrible. I think I have a GELI misconfiguration that I need to sort out first.
 
It turns out that some dumb tired idiot entered some stupid commands into the computer in the evening after a day full of meetings.

After checking the state of the system again, I discovered some rather bad things. Two freebsd-boot partitions out of four were accidentally encrypted; I had to fix that with geli kill.
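
In other words, something like this (the partition names are examples; geli kill irreversibly destroys the GELI metadata on the given providers):

Code:
geli kill ada0p1    # wipe the stray GELI metadata from the boot partitions
geli kill ada2p1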

During boot some partitions weren't decrypted from the GELI prompt. It turned out that the partitions weren't initialized with the required flags. geli configure -d -b -g fixed this for me and also brought back the pretty asterisks. (I read the geli(8) man page, but I still don't get the difference between -b & -g, when each of them is needed, and when one of them can be left out.)
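
Concretely, something like this on the encrypted data partitions (a sketch; partition names are examples):

Code:
# -b: set the BOOT flag (attach the provider early during boot)
# -g: set the GELIBOOT flag (let the loader boot from the encrypted partition)
# -d: set the GELIDISPLAYPASS flag (echo '*' at the boot-time passphrase prompt)
geli configure -b -g -d ada1p3
geli configure -b -g -d ada3p3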

Just to be extra safe I also recreated the boot code on each of the four devices with gpart bootcode.
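
That is, the same command as in my first post, once per disk:

Code:
for d in ada0 ada1 ada2 ada3; do
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $d
done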

As I had temporarily put the old drive into the first bay, ZFS automatically detected that something was off after I installed the new one again and started another resilver.

In the last step I just had to expand the ZFS pool, as I have the autoexpand flag turned off.
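
With autoexpand off, that means telling ZFS to grow each replaced vdev by hand, roughly like this (vdev names are examples):

Code:
zpool online -e zroot ada0p2.eli    # -e expands the vdev to use all
zpool online -e zroot ada2p2.eli    # available space on the larger disk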
 