ZFS Failure to boot after power interruption

I have a problem that I seem to be unable to figure out myself. I would greatly appreciate any help that the people here could offer.

Situation

A UPS fault caused a power interruption in my FreeBSD file server. Since this event the system has been unable to boot. It displays the following error:

Code:
>> FreeBSD EFI boot block
   Loader path: /boot/loader.efi

   Initializing modules: ZFS UFS
   Probing 10 block devices ......*...+.. done
    ZFS found the following pools: bootpool
    UFS found no partitions
ZFS: i/o error - all block copies unavailable
ZFS: can't read object set for dataset 21
ZFS: can't open root filesystem
ZFS: i/o error - all block copies unavailable
ZFS: can't read object set for dataset 21
ZFS: can't open root filesystem
Failed to load '/boot/loader.efi'
panic: No bootable partitions found!

System

The system is configured with bootpool as a mirrored pair of SSDs - ada0 and ada1. The zroot is geli encrypted. The FreeBSD version is some years old, but I do not recall exactly what version I am on. It's a version or two back, but it's not crazy outdated.

Findings so far

I have tried to do some troubleshooting from the command environment on the USB installer memstick image. In that environment I can import the pool and confirm that the data is there and that zpool status reports it as "ONLINE" - rather than DEGRADED or anything else concerning. I can also mount the bootpool and access data with the following commands in the USB environment:

mkdir /tmp/bootpool
zpool import -f bootpool
zfs set mountpoint=/tmp/bootpool bootpool
zfs mount -a
vi /tmp/bootpool/boot/loader.conf


When I do that I have no problem reading data - like the loader.conf file. So, as far as I can tell, the bootpool isn't broken.

I also tried to zpool export bootpool after importing it, but that did not have any effect on the boot problem.

I can import the (geli-encrypted) zroot with the following commands (based on this serverfault answer):

mkdir /tmp/bootpool
zpool import -f bootpool
zfs set mountpoint=/tmp/bootpool bootpool
zfs mount -a
cp /tmp/bootpool/boot/encryption.key /tmp/
zfs umount -a
zpool export bootpool
geli attach -k /tmp/encryption.key /dev/ada0p4
geli attach -k /tmp/encryption.key /dev/ada1p4
zpool import -R /mnt zroot


The zroot also shows an "ONLINE" zpool status after doing this.

At this point I'm guessing that the error is not related to the state of the ZFS pools, and given that everything seems to work from the USB environment, I don't suspect a hardware issue either. The "UFS found no partitions" message seems like it might be important, but I've hit a dead end: internet searches for the error messages return almost nothing relevant to my situation, and very few diagnostic suggestions. The output of gpart show ada0 and gpart show ada1 looks normal to me, but I haven't looked at it in years, so I wouldn't notice if anything had changed.

Code:
# gpart show ada0
=>      40  1000215136  ada0  GPT  (477G)
        40      409600     1  efi  (200M)
    409640        2008        - free -  (1.0M)
    411648     4194304     2  freebsd-zfs  (2.0G)
   4605952     4194304     3  freebsd-swap  (2.0G)
   8800256   991414272     4  freebsd-zfs  (473G)
1000214528         648        - free -  (324K)

Does anyone have any idea what might be causing this problem? Or any steps that I can take to diagnose or resolve the issue?
 
"bootpool" , "zroot", this sounds like /boot is on "bootpool", and the rest of the system on "zroot". Please clarify.

Given the size of the efi partition, it looks like the partitioning of a 13.0 first install. You can get the exact system version after importing the pool and mounting the ZFS file system, similar to the freebsd-version(1) example
Code:
           env ROOT=/mnt /mnt/bin/freebsd-version -ku

A zpool-scrub(8) after importing from the memstick wouldn't hurt either.
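For example (a sketch; -N imports the pool without mounting anything, which is enough for a scrub):
Code:
zpool import -N -f bootpool
zpool scrub bootpool
zpool status bootpool    # shows scrub progress and any repaired/unrepaired errors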

Probing 10 block devices ......*...+.. done
This seems to be a boot1.efi(8) message.
Code:
 % grep -r Probing /usr/src-main | grep 'block devices'
/usr/src-main/stand/efi/boot1/proto.c:    printf("   Probing %zu block devices...", nhandles);
boot1.efi(8)
Code:
On UEFI systems, boot1.efi loads /boot/loader.efi from the default root
     file system and transfers execution there.
/boot/loader.efi on ZFS can't be found due to i/o error.
ZFS: i/o error - all block copies unavailable
ZFS: can't read object set for dataset 21
ZFS: can't open root filesystem
Failed to load '/boot/loader.efi'

Check if this is an ESP loader issue. Try the memstick image; it will load loader.efi. Ideally the memstick should be a supported version.

At the boot menu "Escape to loader prompt",
Code:
load /boot/kernel/kernel
load /boot/kernel/zfs.ko
set currdev=zfs:bootpool:
boot
Mind the colon at the end of the currdev line. Make sure the currdev pool (bootpool) is the one where "bootfs" is set. currdev may fail if the problem is not loader related.

You mention only
The zroot is geli encrypted.
does that mean bootpool, where bootpool/boot/encryption.key is stored, is un-encrypted?

I would also run sysutils/smartmontools, to check the status of all hard drives.
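For example (a sketch, assuming the mirror members are ada0/ada1 and that smartctl is available, e.g. installed with pkg install smartmontools on a running system):
Code:
smartctl -H /dev/ada0    # overall health self-assessment (PASSED/FAILED)
smartctl -A /dev/ada0    # attribute table: reallocated sectors, wear indicators, etc.
smartctl -H /dev/ada1
smartctl -A /dev/ada1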
 
Thanks for the replies and great diagnostic suggestions. There's a lot to go through here, so I'll break it up into quote snippets and reply to the pieces individually.

"bootpool" , "zroot", this sounds like /boot is on "bootpool", and the rest of the system on "zroot". Please clarify.
Correct. The "bootpool" is just /boot. The "zroot" is the rest of the system (e.g. zroot/usr, zroot/var).

Given the size of the efi partition, it looks like the partitioning of a 13.0 first install. You can get the exact system version after importing the pool and mounting the ZFS file system, similar to the freebsd-version(1) example
I haven't been able to get to a state where I can run that command yet. Unfortunately I haven't been able to mount the zroot: even in the memstick live environment /mnt is read-only, so it just errors out. The zpool import -R /mnt zroot command only made that pool show up in zpool status, so it mainly let me confirm that the pool is alive. I tried to get it to mount in /tmp with the "set mountpoint" trick, but it didn't quite work - I could get the child filesystems ("datasets"?) readable in /tmp, but the root filesystem's contents didn't seem to show up there.

What is the correct way to mount this in the memstick live environment so that I can access the data? The zroot pool looks pretty much like this.

A zpool-scrub(8) after importing from the memstick wouldn't hurt either.
Just ran a scrub. All clean. Scrub repaired 0 bytes in 2 seconds. No errors.

Check if this is an ESP loader issue. Try the memstick image; it will load loader.efi.
It got further than the regular boot - or at least there was a lot more text printed out rapidly. I couldn't keep up with everything flying by, but it got to the point where it detected all of the drives (i.e. listed all of the ada and da drives), then failed with this:

mountroot: unable to remount devfs under /dev (error 2)
mountroot: unable to unlink /dev/dev (error 2)
[...]
init: not found in path /sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init
panic: no init
[...]
Automatic reboot in 15 seconds - press a key on the console to abort


Make sure the currdev pool (bootpool) is the one where "bootfs" is set. currdev may fail if the problem is not loader related.
How can I check this? I don't recall ever setting "bootfs" myself, so it may or may not be correct. The zfs get all output doesn't show any variables with that name.

Ideally the memstick should be a supported version.
Do you mean the FreeBSD version? The memstick image is the 14.2-RELEASE image, so it is certainly much more recent than the OS on the server.

does that mean bootpool, where bootpool/boot/encryption.key is stored, is un-encrypted?
Yes.

I would also run sysutils/smartmontools, to check the status of all hard drives.
Is that installed on the memstick live environment? The smartctl command doesn't seem to be available.

Are you sure both devices are up to date/consistent?
Can you/have you stopped in the BIOS/UEFI and explicitly picked a boot device?
I did access the motherboard's boot menu and try booting off of each mirrored drive individually. Both had the same result. As for consistency, how would I check that? The gpart show command is the only thing that I've run on both drives and it showed the exact same output for both ada0 and ada1.

And related: have you done a zpool upgrade lately?

If so, have you rebooted since that upgrade? Additionally, have you also updated EFI kernel loader (EFI/BOOT/BOOTX64.efi) on both devices’ EFI partitions?
That might indeed be relevant. This machine very, very rarely receives a reboot (maybe every 2 or so years, when I do some maintenance). I have indeed upgraded the zpools at some point a while back (maybe a year or so ago?), and I don't recall if there have been any reboots since then.

I have not updated the kernel loader. The system has gone through exactly one FreeBSD version upgrade since I first set it up. But I only did the golden path upgrade. If I was supposed to manually copy things to /boot, then I have not done that. How would I go about checking if they're the correct version?
 
I have not updated the kernel loader. The system has gone through exactly one FreeBSD version upgrade since I first set it up. But I only did the golden path upgrade. If I was supposed to manually copy things to /boot, then I have not done that. How would I go about checking if they're the correct version?
You could try
Code:
diff /boot/loader.efi /boot/efi/efi/freebsd/loader.efi
diff /boot/loader.efi /boot/efi/efi/boot/bootx64.efi
Potentially helpful thread:
 
Both boot1.efi in the EFI System Partition and /boot/loader.efi in the "bootpool" zpool might be too old. You should try updating them. Using the binaries from FreeBSD 14.2 should work well.

"bootfs" is a property of ZFS pools, not of ZFS datasets, so the command to obtain the property's value is zpool get.

The command zpool history will help you remember which of your pools were upgraded and when.

When using ZFS boot environments, the root filesystem is mounted from a dataset with a name like zroot/ROOT/default which has the "canmount" property set to "noauto". This can explain why you see the contents of the other ZFS filesystems but don't see the contents of the root directory.
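If that is what is happening, the root dataset can be mounted explicitly after importing; a sketch, assuming the dataset really is named zroot/ROOT/default:
Code:
zpool import -f -R /mnt zroot          # after the geli attach steps you already used
zfs get canmount zroot/ROOT/default    # expect "noauto"
zfs mount zroot/ROOT/default           # mounts the root dataset under /mnt despite noauto
zfs mount -a                           # then mount the remaining datasets on top of it
ls /mnt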
 
"bootfs" is a property of ZFS pools, not of ZFS datasets, so the command to obtain the property's value is zpool get.
Got it.
Code:
> zpool get bootfs
NAME      PROPERTY  VALUE               SOURCE
bootpool  bootfs    -                   default
zroot     bootfs    zroot/ROOT/default  local
So it looks like the bootfs property is actually set on the zroot pool - not on the bootpool. And it points to a dataset of the zroot pool.

It sounds like T-Daemon was expecting it to be set on bootpool and pointing to bootpool? Is this misconfigured? To be clear, I have not manually set this (iirc), and it has been working for many years. Or perhaps the loader prompt commands suggested should have been run with a currdev of zroot/ROOT/default? Since it is encrypted, I assume that would require some additional steps.

The command zpool history will help you remember which of your pools were upgraded and when.
> zpool history | grep upgrade
2023-04-19.20:40:22 zpool upgrade -a


So 2 years since I did the zpool upgrade.

That history command is new to me. What a great feature. I am impressed by ZFS every time I learn something new about it.

You could try
Code:
diff /boot/loader.efi /boot/efi/efi/freebsd/loader.efi
diff /boot/loader.efi /boot/efi/efi/boot/bootx64.efi
There was no need to diff. The file /tmp/bootpool/boot/loader.efi was dated 2023, and the file in the EFI System Partition (ESP) at EFI/BOOT/BOOTX64.EFI was dated 2016. The file sizes were also massively different. So the loader in the /boot folder was probably updated when I did a FreeBSD upgrade, and the loader in the ESP had not been updated since I first set up the system.

I did check the loader versions as described in the loader.efi(8) man page.

The loader in the system ESP:
> strings /tmp/ada0efi/EFI/BOOT/BOOTX64.EFI | grep FreeBSD | grep EFI
>> FreeBSD EFI boot block


Now that's a familiar name. That's the guy who told me that we're not going to boot up anymore. It is old enough to not even have a version number.

The loader on the bootpool:
> strings /tmp/bootpool/boot/loader.efi | grep FreeBSD | grep EFI
FreeBSD/amd64 EFI loader. Revision 1.1


And the loader in the memstick /boot:
> strings /boot/loader.efi | grep FreeBSD | grep EFI
DFreeBSD/amd64 EFI loader. Revision 3.0



boot1.efi in the EFI System Partition
There was no boot1.efi in my ESP. They only contained two files:

> ls /tmp/ada0efi/EFI/BOOT
BOOTX64.EFI STARTUP.NSH


I guess that is something that isn't always present. Or at least my system does not seem to use it.

Also, under the EFI/ directory I only had a BOOT/ folder. And I do not have an efi/freebsd/ folder that I see people mentioning in various articles and discussions. Not sure what's going on with that.

You should try updating them. Using the binaries from FreeBSD 14.2 should work well.
This was the solution.

Resolution: Updating the ESP loaders

This took a lot of research time to plan out. I was unfamiliar with the process, and it is further complicated by having to do the work from a USB memstick live environment - I could only find accounts of people who did this from a working system. The procedure is mostly based on the loader.efi(8) man page, the thread Jose linked, this thread about mounting an ESP, and this article, adapted to work in memstick mode.

The memstick image must be at least as new as the operating system installed on the machine, so that the new loader supports all of the features (including the ZFS pool features) used by the installed system.

Bash:
# mount bootpool somewhere I can write to in usb live environment
mkdir /tmp/bootpool
zpool import -R /tmp/bootpool bootpool    # may need -f on first run

# mount esp from both mirrors in a writeable folder
mkdir /tmp/ada0efi
mkdir /tmp/ada1efi
mount -t msdosfs /dev/ada0p1 /tmp/ada0efi
mount -t msdosfs /dev/ada1p1 /tmp/ada1efi

# backup originals
# IMPORTANT: the ESP filesystem is tiny (see the aside below) and can't hold two loader files - so back up the ESP files to bootpool
mv /tmp/bootpool/boot/loader.efi /tmp/bootpool/boot/loader.efi.BAK202505
mv /tmp/ada0efi/EFI/BOOT/BOOTX64.EFI /tmp/bootpool/boot/BOOTX64.EFI.ada0BAK202505
mv /tmp/ada1efi/EFI/BOOT/BOOTX64.EFI /tmp/bootpool/boot/BOOTX64.EFI.ada1BAK202505

# copy loader from memstick /boot to the bootpool and esp on both mirrors
cp /boot/loader.efi /tmp/bootpool/boot/loader.efi
cp /boot/loader.efi /tmp/ada0efi/EFI/BOOT/BOOTX64.EFI
cp /boot/loader.efi /tmp/ada1efi/EFI/BOOT/BOOTX64.EFI

# unmount bootpool and esp - probably not needed, but just to be safe
zfs umount bootpool
umount /tmp/ada0efi
umount /tmp/ada1efi

Aside: A fun time-waster that I ran into was cp complaining about an "input/output error" when I copied files to the ESP. I had made backups in-place by renaming the existing files, then copied the new loader over. I didn't understand the error (it didn't explain what went wrong), and the new file showed up with ls, so I gave it a go. It of course did not boot. After more diagnosis I noticed that the file size was not what I expected. diff said the binary files differ. cmp did not see differences, but it reached EOF. I used od to dump the files and diff them - and discovered that the copied file had been truncated; it was missing a big chunk of data at the end. More investigation, and finally it occurred to me to run df -h. Yep, the EFI system partition on my server with many terabytes of storage is... 766K. It was literally full since I had left a backup of the old loader.
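In hindsight, a quick check along these lines right after the copy would have caught the truncation immediately:
Code:
df -h /tmp/ada0efi       # check free space on the ESP before and after copying
sha256 /boot/loader.efi /tmp/ada0efi/EFI/BOOT/BOOTX64.EFI    # the two hashes must match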

Lessons learned

  • zpool upgrade can be dangerous - especially on boot/root filesystems. Don't do it just to silence the nag message in zpool status.
  • You need to manually upgrade the loader in the ESP when upgrading ZFS.
  • It's a good idea to do a test reboot after any such upgrade. Have a USB drive handy and be prepared for some pain if you missed a step.
I know that it is my own fault, but I feel like this should be called out better. It took two mistakes on my part for this to happen:
  1. Not realize that I needed to manually upgrade the ESP loader when doing a FreeBSD upgrade, and
  2. Not realize that zpool upgrade depends on the ESP loader version.
I did consult documentation before I did either of these tasks. If these requirements were called out somewhere then I missed them.

Update: This information was added to the handbook at some point.

Thanks everyone!


My server is now back online - mostly. There are a couple of other issues (likely caused by my testing and troubleshooting efforts), but it is past the won't-boot problem.

Oh, and I can finally get my freebsd-version. It is 13.2-RELEASE.

Also, kudos to Eric A. Borisch - who somehow managed to divine the exact chain of events leading to this situation. Without me even mentioning rare reboots or a past zpool upgrade. I had not considered that the power interruption was unrelated to the problem.
 
Also, kudos to Eric A. Borisch - who somehow managed to divine the exact chain of events leading to this situation. Without me even mentioning rare reboots or a past zpool upgrade. I had not considered that the power interruption was unrelated to the problem.
Glad I could help. When I read “UPS fault” my first reaction was “I wonder how long the system has been up?” which led to the (frankly somewhat confusing and easy to miss) boot loader vs. zpool version/upgrade guess. That and the fact that the pools were healthy when imported with a USB boot.

Glad it’s working!
 
There was no boot1.efi in my ESP. They only contained two files:

> ls /tmp/ada0efi/EFI/BOOT
BOOTX64.EFI STARTUP.NSH

There was a boot1.efi file in your ESP under the name /EFI/BOOT/BOOTX64.EFI. The firmware looks for a file with the name /EFI/BOOT/BOOTX64.EFI, so either /boot/loader.efi or /boot/boot1.efi is copied to the ESP under that name. The string "FreeBSD EFI boot block" can be found in /boot/boot1.efi (or in the source code at /usr/src/stand/efi/boot1/boot1.c).

In your original setup you had the firmware load boot1.efi from the ESP, boot1.efi loaded loader.efi from the bootpool, then loader.efi loaded the kernel from the zroot pool. In your modified setup, which is more in line with modern practice, you have the firmware loading loader.efi from the ESP, skipping boot1.efi as an intermediate step.

Also, under the EFI/ directory I only had a BOOT/ folder. And I do not have an efi/freebsd/ folder that I see people mentioning in various articles and discussions. Not sure what's going on with that.
You can create an /EFI/FREEBSD folder in your ESP provided that there is enough space. You can then put LOADER.EFI there and use efibootmgr(8) to make the firmware recognize it as a boot method.
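A sketch of what that could look like, with the ESP mount point, directory and boot-entry label only as examples:
Code:
mount -t msdosfs /dev/ada0p1 /mnt
mkdir -p /mnt/EFI/FREEBSD
cp /boot/loader.efi /mnt/EFI/FREEBSD/LOADER.EFI
efibootmgr --create --activate --label "FreeBSD ada0" --loader /mnt/EFI/FREEBSD/LOADER.EFI
efibootmgr -v            # verify that the new boot entry is listed and active
umount /mnt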

Bash:
# mount bootpool somewhere I can write to in usb live environment
mkdir /tmp/bootpool
zpool import bootpool    # may need -f on first run
zfs set mountpoint=/tmp/bootpool bootpool
The easier way is to use zpool import -R /tmp/bootpool bootpool. There is no need to set the "mountpoint" property or reset it at the end.

Bash:
# copy loader from memstick /boot to the bootpool and esp on both mirrors
cp /boot/loader.efi /tmp/bootpool/boot/loader.efi
If you don't use boot1.efi, you don't have to have the loader inside bootpool or to have a bootpool at all, unless you store a keyfile there.

Yep, the EFI system partition on my server with many terabytes of storage is... 766K. It was literally full since I had left a backup of the old loader.
The output of gpart show ada0 that you provided showed an EFI System Partition of 200MB which is a lot more than 766K. It also didn't show any terabytes of storage.
 
The system is configured with bootpool as a mirrored pair of SSDs - ada0 and ada1. The zroot is geli encrypted
"bootpool" , "zroot", this sounds like /boot is on "bootpool", and the rest of the system on "zroot". Please clarify.
Correct. The "bootpool" is just /boot. The "zroot" is the rest of the system (e.g. zroot/usr, zroot/var).

does that mean bootpool, where bootpool/boot/encryption.key is stored, is un-encrypted?
Unless there is extra protection not mentioned here, there is a serious design flaw in this setup: an un-encrypted /boot "bootpool" pool storing the keyfile that decrypts the "zroot" pool of an encrypted Root-on-ZFS.

If there is no second geli user key (i.e. a passphrase or keyfile) besides bootpool/boot/encryption.key on the unencrypted pool, what good is the encryption on the "zroot" pool?

Without a second user key (passphrase, keyfile), anybody powering the machine on can access the encrypted partition of the setup.

If an encrypted Root-on-ZFS provider decrypted by a keyfile is a "must", then even if a second user key is used, the keyfile should at least be located on a removable drive, not on the same disk (to keep the keyfile and the encrypted Root-on-ZFS provider separated, instead of on the same machine).


Second, in this particular setup, why bother with an unencrypted /boot pool at all? Booting from a full disk encrypted ROOT-on-ZFS (by passphrase, not keyfile) was introduced 2016-05-08, and is supported during menu guided bsdinstall [2], at least since the 11 branch ([3] 11.0-RELEASE, release date October 10, 2016).

Though, only passphrase is supported, not keyfile(s).


Or is this some sort of remote ssh geli decryption setup? But then, how does the keyfile on the unencrypted boot partition fit in?


[1]
Create the GELIBOOT GEOM_ELI flag 2016-04-08

[2]
bsdinstall/zfsboot GPT+BIOS+GELI installs now make use of GELIBOOT 2016-05-22

that's the geli(8) -g option: /usr/libexec/bsdinstall/zfsboot
Rich (BB code):
201 GELI_PASSWORD_GELIBOOT_INIT='geli init -bg -e %s -J - -l 256 -s 4096 "%s"'

[3]
Unsupported FreeBSD Releases
 
There was a boot1.efi file in your ESP under the name /EFI/BOOT/BOOTX64.EFI.
Oh, now I understand. The "boot1.efi" you mentioned was a program name, not a file name. That explains why the original BOOTX64.EFI file was so much smaller (128KB) than the loader I replaced it with (637KB). The original file was the boot1.efi(8) program, and the new one is the loader.efi(8) program. The boot1.efi program was a smaller, simpler (now deprecated) loader meant only for the first step.

Thanks for clarifying.

You can create an /EFI/FREEBSD folder in your ESP provided that there is enough space. You can then put LOADER.EFI there and use efibootmgr(8) to make the firmware recognize it as a boot method.
Would there be any benefit to doing this? I don't understand why both the efi/boot/ and the efi/freebsd/ directories exist in newer installations. It looks like they both hold the same loader.efi program. Maybe to support multi-boot scenarios?

The easier way is to use zpool import -R /tmp/bootpool bootpool. There is no need to set the "mountpoint" property or reset it at the end.
That's an excellent improvement. Thanks for that. I've edited the script in my post with the change.

If you don't use boot1.efi, you don't have to have the loader inside bootpool or to have a bootpool at all, unless you store a keyfile there.
Good to know. Though I do store the keyfile there, so it is useful for my particular setup.

The output of gpart show ada0 that you provided showed an EFI System Partition of 200MB which is a lot more than 766K.
I was very confused by that as well. I probably wasted a solid 2 hours figuring out why the file was being truncated. I saw the 200M partition size, so didn't consider that to be a factor until late in the troubleshooting process.

gpart output:
Code:
> gpart show ada{0,1}
=>        40  1000215136  ada0  GPT  (477G)
          40      409600     1  efi  (200M)
      409640        2008        - free -  (1.0M)
      411648     4194304     2  freebsd-zfs  (2.0G)
     4605952     4194304     3  freebsd-swap  (2.0G)
     8800256   991414272     4  freebsd-zfs  (473G)
  1000214528         648        - free -  (324K)

=>        40  1000215136  ada1  GPT  (477G)
          40      409600     1  efi  (200M)
      409640        2008        - free -  (1.0M)
      411648     4194304     2  freebsd-zfs  (2.0G)
     4605952     4194304     3  freebsd-swap  (2.0G)
     8800256   991414272     4  freebsd-zfs  (473G)
  1000214528         648        - free -  (324K)

df output:
Code:
> df -hT /dev/ada{0,1}p1
Filesystem   Type       Size    Used   Avail Capacity  Mounted on
/dev/ada0p1  msdosfs    767K    648K    119K    84%    /mnt/ada0p1
/dev/ada1p1  msdosfs    767K    648K    119K    84%    /mnt/ada1p1

Here's a thread by someone else who was in the same situation. Thread efi-partition-too-small.95893

It looks like the installer used to dd an 800KB FAT image to the ESP - specifically boot1.efifat. So while the partition is a reasonable size, the filesystem is not. This seems to have been reworked in December 2018. https://reviews.freebsd.org/D17947

I could resize it (or rather, replace it) with a filesystem that uses the whole 200MB partition if I need to. But that seems like a high-risk operation without any particular benefit, so my plan is just to leave it as-is. Though I suspect that one day a future version of the loader.efi(8) program could grow beyond the 766KB of writable space available on the old boot1.efifat filesystem and force me to replace the ESP filesystem.

It also didn't show any terabytes of storage.
That's actually on a different pool. I was perhaps being a bit facetious, but it does exist.

> zpool get size storage
NAME     PROPERTY  VALUE  SOURCE
storage  size      30.9T  -


I omitted the storage pool details when drafting this thread because it was unrelated to the issue. The text was getting a bit long so I tried to limit superfluous information.

If there is no second geli user key (i.e. a passphrase or keyfile) besides bootpool/boot/encryption.key on the unencrypted pool, what good is the encryption on the "zroot" pool?

Without a second user key (passphrase, keyfile), anybody powering the machine on can access the encrypted partition of the setup.
The encryption key is protected by a passphrase, which I have to physically type into the machine at boot.

Second, in this particular setup, why bother with an unencrypted /boot pool at all? Booting from a full disk encrypted ROOT-on-ZFS (by passphrase, not keyfile) was introduced 2016-05-08, and is supported during menu guided bsdinstall [2], at least since the 11 branch ([3] 11.0-RELEASE, release date October 10, 2016).
I don't recall the FreeBSD version I installed when I first set up the system. It was also long enough ago that I don't recall if it had the encryption options that you're referencing. I also don't recall why I made the specific choices that I did when laying out the encryption setup. Though, in my defense, I do remember spending over a month studying the handbook as well as many guides and discussions about how to set up encryption. So I'd like to believe that I followed the best practices at the time.

I do recall that actual ZFS encryption (as in the ZFS feature, non-geli) did not exist at the time - or it was linux-only, or unstable, or something. From what I remember, the options for encrypted boot at the time had some trade-offs that didn't make it desirable - at least for me. But I can't recall what the factors were. It protected against threats and actors that didn't seem relevant and had some kind of downside for my use case.

My primary threat mitigation concern is someone physically stealing the machine and accessing the data, or extracting data from a decommissioned hard drive - both of which seem to be handled by a passphrase-protected key on an unencrypted boot unlocking the encrypted system and data storage pools. If the CIA is going to send an operative to sneak in and inject a custom bootloader on the /boot drive to steal the passphrase when I type it in, then they could inject a custom firmware image on the motherboard to do the same. If a guy with a wrench is going to beat the passphrase out of me, then my plan is to cough it up before the first whack. Then there's IPMI, which I suspect is either breakable or fully backdoored for a state-level actor - so gaining access to /boot probably isn't even all that important if someone is concerned about such an adversary.

That said, if it is possible to encrypt /boot, and it doesn't make the system any more difficult to use or increase the risk of failure/data-loss then I would of course prefer to do so.

Is a passphrase-protected key on an unencrypted boot considered insecure now? Are there upgrade paths to add boot encryption to an existing system? There's a lot to re-learn if the encryption choices that I made were incorrect or outdated. The man page describing the geli -g option tells me what it does, but not how to use it in an upgrade scenario. I'll do some reading up on it. Would you be able to point me to any guides that might be helpful?

is this some sort of remote ssh geli decryption setup? But then, how does the keyfile on the unencrypted boot partition fit in?
No. Not remote. It's a traditional (antiquated?) geli setup.
 
Would there be any benefit to doing this? I don't understand why both the efi/boot/ and the efi/freebsd/ directories exist in newer installations. It looks like they both hold the same loader.efi program. Maybe to support multi-boot scenarios?

The benefit would be in learning more about how the boot process works. The file /EFI/BOOT/BOOTX64.EFI exists for compatibility with the firmware on some machines, and its contents are typically the same as /EFI/FREEBSD/LOADER.EFI. Different operating systems storing their boot files in different subdirectories of /EFI is indeed for multi-boot scenarios; the directories prevent naming clashes.

It looks like the installer used to dd an 800KB FAT image to the ESP - specifically boot1.efifat. So while the partition is a reasonable size, the filesystem is not. This seems to have been reworked in December 2018. https://reviews.freebsd.org/D17947

I could resize it (or rather, replace it) with a filesystem that uses the whole 200MB partition if I need to. But that seems like a high-risk operation without any particular benefit, so my plan is just to leave it as-is. Though I suspect that one day a future version of the loader.efi(8) program could grow beyond the 766KB of writable space available on the old boot1.efifat filesystem and force me to replace the ESP filesystem.

Replacing the tiny filesystems in the ESP partitions so that the whole partitions are occupied is a low-risk operation unless you mistype the partition numbers when entering commands. You can mitigate that risk by using partition labels. First you label the partitions (with gpart(8)), then you verify the labels, then you execute the destructive commands.

To replace the filesystems, all that needs to be done is to run the newfs_msdos(8) command, mount the new filesystem, copy the necessary files and unmount it. Run efibootmgr(8) as needed. Remember to unmount the ESP filesystem before you start if it is already mounted (on /boot/efi, for example). You can make full backups of the ESP partitions with dd(1) and because you have two of them, you can upgrade them one by one, keeping the other one as a backup.
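A sketch of that procedure for the first mirror member (the partition index and label are examples - verify against gpart show before running anything destructive):
Code:
gpart modify -i 1 -l efiboot0 ada0           # label the ESP on ada0
gpart show -l ada0                           # verify the label points at the right partition
newfs_msdos -F 32 -c 1 /dev/gpt/efiboot0     # new FAT32 filesystem spanning the whole 200M partition
mount -t msdosfs /dev/gpt/efiboot0 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.EFI
umount /mnt
# test-boot from ada0, then repeat for ada1 (e.g. label efiboot1), keeping it untouched as the fallback until then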
 
Though, in my defense, I do remember spending over a month studying the handbook as well as many guides and discussions about how to set up encryption. So I'd like to believe that I followed the best practices at the time.
Hard to comment without a time frame for when the installation happened, to determine the version (try zfs list -o name,creation). If it's from the 13 branch then there wouldn't have been any need for a separate /boot partition (assuming no keyfile is involved); a full disk encrypted Root-on-ZFS was already possible then.


My primary threat mitigation concern is someone physically stealing the machine and accessing the data, or extracting data from a decommissioned hard drive - both which seems to be handled by a passphrase-protected key on an unencrypted boot unlocking the encrypted system and data storage pools.
Is a passphrase-protected key on an unencrypted boot considered insecure now?
It isn't recommended to begin with.

If you are concerned about data access from stolen or decommissioned encrypted hardware, then I wouldn't make the attack easier by delivering one of the user keys along with it.

It's like having a metal safe in your home that requires both a physical key and a passphrase entered on a keypad to unlock - and then leaving the physical key in plain sight next to the safe.

Every reputable safe manufacturer would recommend against that (to put it mildly), and any insurance company would deny compensation if the safe were compromised.


If the CIA
If the CIA is interested in your data, then I wouldn't worry about any data security. They will get it (and you), one way or the other.


That said, if it is possible to encrypt /boot, and it doesn't make the system any more difficult to use or increase the risk of failure/data-loss then I would of course prefer to do so.
There wouldn't be any difficulty in using an encrypted /boot partition (not counting the maintenance). Data loss can also be ruled out - how could a normal /boot on a different partition destroy data on another file system? (Apropos, I hope there are verified backups of the important data and of encryption.key somewhere.)

You can recreate "bootpool" on a geli encrypted partition, i.e.: back up /boot/encryption.key and /boot/loader.conf, overwrite the partition with /dev/random, run geli init -g -l 256 -s 4096 <partition> and enter a passphrase, create a file system on the .eli device (ZFS or UFS), mount (UFS) / import (ZFS) the file system, cp -a the /boot directory from installer media (make sure the kernel matches the "zroot" userland), and copy encryption.key and loader.conf back. Eventually you may need to set 'currdev' or 'rootdev' (loader_simp(8)).
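A rough sketch of the ZFS variant of those steps, with the partition names taken from your gpart output (ada0p2/ada1p2 are the 2 GB "bootpool" partitions) - destructive, so only with verified backups in place:
Code:
# back up the current /boot contents (including encryption.key and loader.conf) somewhere safe first
cp -a /tmp/bootpool/boot /your/backup/location/

zpool destroy bootpool                       # removes the old unencrypted pool
geli init -g -l 256 -s 4096 /dev/ada0p2      # -g sets the GELIBOOT flag; prompts for a passphrase
geli init -g -l 256 -s 4096 /dev/ada1p2
geli attach /dev/ada0p2
geli attach /dev/ada1p2
zpool create bootpool mirror ada0p2.eli ada1p2.eli
# adjust the new pool's mountpoint to match the original layout,
# repopulate /boot from install media matching the zroot userland,
# then restore encryption.key and loader.conf from the backup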

Are there upgrade paths to add boot encryption to an existing system?
If you mean the "zroot" pool then it's possible to add a geli boot configuration to that system. You can turn the "zroot" Root-on-ZFS into a fully bootable system, skipping the "bootpool" partition entirely, however, without a keyfile.

Make sure the encrypted "zroot" Root-on-ZFS has a complete /boot directory and that the -g flag is set on the geli providers, remove the keyfile user key from the master key copies (geli setkey -n 0 <provider> with a passphrase, original or new, and likewise for -n 1), and edit loader.conf accordingly.
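A rough sketch, with ada0p4/ada1p4 taken from your gpart output and assuming key slot 0 is the one the installer populated (check with geli list / geli dump first):
Code:
geli configure -g /dev/ada0p4      # set the GELIBOOT flag so the loader prompts for the passphrase
geli configure -g /dev/ada1p4
geli setkey -n 0 /dev/ada0p4       # replace slot 0 with a passphrase-only user key
geli setkey -n 0 /dev/ada1p4       # (attached providers don't need the old key components;
                                   #  detached ones do, e.g. -k /path/to/encryption.key)
# repeat with -n 1 if that second slot is populated
geli list | grep -e name -e Flags  # confirm GELIBOOT is now listed
# then remove the keyfile-related geli_* lines from loader.conf (keep geom_eli_load="YES")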


The man page describing the geli -g option tells me what it does, but not how to use it in an upgrade scenario.
See the "configure" argument:
Code:
     configure  Change configuration of the given providers.

                Additional options include:

               -g
                   Enable booting from this encrypted root filesystem.  The
                   boot loader prompts for the passphrase and loads loader(8)
                   from the encrypted partition.
The configuration "Flags" of the provider can be shown with the geli list command. The -g configuration is listed as "GELIBOOT" flag, i.e.:
Code:
% geli list | grep -e name -e Flags
Geom name: nda0p3.eli
Flags: BOOT, GELIBOOT, AUTORESIZE
Geom name: gpt/swap0.eli
Flags: ONETIME, W-DETACH, W-OPEN, AUTORESIZE
BOOT, GELIBOOT and ONETIME are user/script configured; AUTORESIZE, W-DETACH and W-OPEN are set automatically.

If you want to make geli-related changes to your existing system, I strongly recommend practicing in a virtual machine first (bhyve, VirtualBox, qemu), simulating the real machine's setup with the same FreeBSD version as the original, before operating on the actual machine. It minimizes the risk of failure and shows you whether the result meets your expectations.

EDIT: Alternatively, just set up a new encrypted Root-on-ZFS installation and receive the backed-up data on the new system.
 