Solved: ZFS MOS boot error after geli-encrypting the OS drives

I have a server on which I installed FreeBSD 12.1 with a ZFS mirror root configuration.

Code:
  pool: zroot
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zroot       ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        da0p3   ONLINE       0     0     0
        da1p3   ONLINE       0     0     0

After installing, I wanted to geli-encrypt these drives as a test and be prompted for the passphrase at boot time. These are the commands I ran for the geli setup.

zpool offline zroot da0p3
geli init -b -d -g -s 4096 /dev/da0p3
geli attach /dev/da0p3
zpool replace zroot 3908907440691126591 /dev/da0p3.eli

Let the server resilver.
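
I let it finish before touching the second disk; a plain status check is enough to watch the resilver:

zpool status zroot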

zpool offline zroot da1p3
geli init -b -d -g -s 4096 /dev/da1p3
geli attach /dev/da1p3
zpool replace zroot 13242873678876809506 /dev/da1p3.eli

Stamped the boot code.

gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1
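
The -i 1 index assumes freebsd-boot is the first partition on each disk (the default installer layout, with da0p3/da1p3 as the freebsd-zfs partitions); gpart show on each disk confirms the layout:

gpart show da0
gpart show da1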

zpool looked happy.

Code:
  pool: zroot
 state: ONLINE
  scan: resilvered 479M in 0 days 00:00:03 with 0 errors on Thu Jul 23 20:14:49 2020
config:

    NAME           STATE     READ WRITE CKSUM
    zroot          ONLINE       0     0     0
      mirror-0     ONLINE       0     0     0
        da0p3.eli  ONLINE       0     0     0
        da1p3.eli  ONLINE       0     0     0

geli looked good too, and zdb showed the geli device paths. I also put geli in the boot loader (a minimal loader.conf sketch follows the geli list output below).

Code:
geli list
Geom name: da0p3.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 128
Crypto: software
Version: 7
UsedKey: 0
Flags: BOOT, GELIBOOT, GELIDISPLAYPASS
KeysAllocated: 373
KeysTotal: 373
Providers:
1. Name: da0p3.eli
   Mediasize: 1598172426240 (1.5T)
   Sectorsize: 4096
   Mode: r1w1e1
Consumers:
1. Name: da0p3
   Mediasize: 1598172430336 (1.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1

Geom name: da1p3.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 128
Crypto: software
Version: 7
UsedKey: 0
Flags: BOOT, GELIBOOT, GELIDISPLAYPASS
KeysAllocated: 373
KeysTotal: 373
Providers:
1. Name: da1p3.eli
   Mediasize: 1598172426240 (1.5T)
   Sectorsize: 4096
   Mode: r1w1e1
Consumers:
1. Name: da1p3
   Mediasize: 1598172430336 (1.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
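
For completeness, "put geli in boot loader" here just means loading the GELI module from loader.conf; the line below is a minimal sketch, and the passphrase prompt at the loader itself comes from the -g (GELIBOOT) flag used in geli init above.

Code:
# /boot/loader.conf
geom_eli_load="YES"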

Rebooted the server and got stuck here at boot time.

[Screenshot: Screen Shot 2020-07-24 at 8.20.06 AM.png]


Booted into a live CD and attached geli to the da drives. zpool import worked and the pool was healthy with the geli providers. I imported it with an altroot and could see the data in /tmp/mypool/. Exported the pool, rebooted, and the server still gets stuck on the screen above.
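
Roughly, the live CD check looked like this (same altroot path as above; enter the geli passphrase when prompted):

geli attach /dev/da0p3
geli attach /dev/da1p3
zpool import -a -f -o altroot=/tmp/mypool
zpool status zroot
ls /tmp/mypool
zpool export zroot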

Did I miss a step?
 
Adding 2 more screenshots.

The first is booting the FreeBSD 12.1 ISO so I can get into a live CD. The ISO boot loader sees my geli disks and I enter the passphrase to continue.

[Screenshot: Screen Shot 2020-07-24 at 9.24.41 AM.png]


The second is gpart show of the two disks in the OS mirror, taken from the live CD.

[Screenshot: Screen Shot 2020-07-24 at 9.28.05 AM.png]
 
I booted the server into the live CD and ran some zdb commands. I didn't attach geli to any of the providers or run any other commands first.


This is the zdb -l da0:

Code:
------------------------------------
LABEL 0
------------------------------------
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 475
    pool_guid: 18258763919932459508
    hostid: 3928337989
    hostname: 'xxx'
    top_guid: 3233035683819686845
    guid: 3908907440691126591
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 3233035683819686845
        metaslab_array: 68
        metaslab_shift: 33
        ashift: 12
        asize: 1598167711744
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 3908907440691126591
            path: '/dev/da0p3'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 13242873678876809506
            path: '/dev/da1p3'
            whole_disk: 1
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
------------------------------------
LABEL 3
------------------------------------
failed to unpack label 3

This is the zdb -l da0p3:

Code:
------------------------------------
LABEL 0
------------------------------------
failed to unpack label 0
------------------------------------
LABEL 1
------------------------------------
failed to unpack label 1
------------------------------------
LABEL 2
------------------------------------
failed to unpack label 2
------------------------------------
LABEL 3
------------------------------------
    version: 5000
    name: 'zroot'
    state: 0
    txg: 475
    pool_guid: 18258763919932459508
    hostid: 3928337989
    hostname: 'xxx'
    top_guid: 3233035683819686845
    guid: 3908907440691126591
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 3233035683819686845
        metaslab_array: 68
        metaslab_shift: 33
        ashift: 12
        asize: 1598167711744
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 3908907440691126591
            path: '/dev/da0p3'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 13242873678876809506
            path: '/dev/da1p3'
            whole_disk: 1
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
 
I was able to fix the boot error by getting into a live CD and running the following commands.

The cause was stale ZFS metadata on the raw da0p3/da1p3 partitions that still referenced the old, unencrypted disks, so the boot code was trying those labels before the geli providers.

Code:
geli attach da0p3 (enter password for the disk)
geli attach da1p3 (enter password for the disk)
zpool import -a -f -o altroot=/tmp/mypool
zpool status (make sure mirror looks healthy)
zpool offline zroot da0p3.eli
zpool detach zroot 11018227870204114179 (GUID of the offlined da0p3.eli vdev)
zpool labelclear -f /dev/da0p3.eli
geli backup /dev/da0p3 /tmp/da0back
geli detach da0p3
zpool labelclear -f /dev/da0p3
geli restore -fv /tmp/da0back /dev/da0p3
geli attach da0p3
zpool attach zroot da1p3.eli da0p3.eli
zpool status (make sure mirror looks healthy)

Repeat the same commands for the da1 disk, starting with zpool offline (sketch below). The geli backup/restore step is there because zpool labelclear also zeroes the label areas at the end of the partition, which overlap the last sector where geli keeps its metadata; without the backup, clearing the stale ZFS labels from the raw partition would wipe the geli setup.
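
For completeness, the da1 pass is the same with the device names swapped; the detach GUID is whatever the pool reports for the offlined da1p3.eli (shown as a placeholder here):

zpool offline zroot da1p3.eli
zpool detach zroot <GUID of the offlined da1p3.eli>
zpool labelclear -f /dev/da1p3.eli
geli backup /dev/da1p3 /tmp/da1back
geli detach da1p3
zpool labelclear -f /dev/da1p3
geli restore -fv /tmp/da1back /dev/da1p3
geli attach da1p3
zpool attach zroot da0p3.eli da1p3.eli
zpool status (make sure mirror looks healthy)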

zpool export zroot
reboot

Before the reboot you can run zdb -l /dev/da0 and see what the labels look like; the stale ones should be gone.
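
A concrete check, using the same device names as above (the live labels now sit inside the encrypted .eli providers, so the raw partitions should fail to unpack any labels):

zdb -l /dev/da0p3
zdb -l /dev/da1p3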

The server booted without issue after this and zpool looks good.

This thread can be closed.
 