[fixed/workaround] Debug help? zpools disappearing (no dmesg) while using bsdinstall to create a boot disk (stable/13)

I'm trying to create a boot disk for a Minisforum HX90 machine. It seems that this machine's BIOS may need a boot disk that is secureboot-capable. There's a working FreeBSD 13.1 installation on the machine's NVMe disk; uname:

Code:
FreeBSD xmin.cloud.thinkum.space 13.1-STABLE FreeBSD 13.1-STABLE #0 build/stable/13-n252824-84b4709f38f: Fri Oct 28 16:45:22 PDT 2022     gimbal@xmin.cloud.thinkum.space:/usr/obj/xmin_FreeBSD-13.1-STABLE_amd64/usr/src/amd64.amd64/sys/XMIN amd64

I've created what I believe are the files bsdinstall(8) would need if downloading from a distribution site, but built from my local source tree, calling make as follows:

Code:
# cd /usr/src; make -C release packagesystem KERNCONF=mykernconf

I'm trying to use these locally built distribution files to create a boot disk. I've tried a number of variations as to where that boot disk would be located (a list below). In all instances, all non-root zpools disappear when bsdinstall reaches its ZFS install stage. The zpool with the root filesystem remains available, but all mounted datasets on that zpool disappear, along with the other two zpools on the machine (two separate SSDs, in addition to the NVMe stick).
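For reference, bsdinstall(8) documents BSDINSTALL_DISTDIR and DISTRIBUTIONS for pointing the installer at a local set of distribution files; roughly how I'm invoking it (the object-directory path is just where my build leaves the .txz files, adjust to taste):

Code:
env DISTRIBUTIONS="base.txz kernel.txz lib32.txz" \
    BSDINSTALL_DISTDIR=/usr/obj/usr/src/amd64.amd64/release \
    bsdinstall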

I'm not seeing any output from dmesg when the zpools disappear. Is there any help available for debugging what's happening here?

I'd created this machine's FreeBSD installation (root on ZFS) using an installer image on a USB thumb drive, before some subsequent updates on the root fs.

Trying to figure out what might be going wrong between bsdinstall and the machine itself, I've since re-labeled all of the GPT partitions on the system's NVMe disk, each to something that would hopefully not collide with the GPT labels used by bsdinstall. I'm still seeing the zpools on the machine disappear, along with the non-root datasets, when bsdinstall reaches the ZFS installation stage.

Could there be anything happening with regards to /etc/zfs/zpool.cache on the installation host when bsdinstall runs? Maybe it was designed to install from a root-on-UFS host?
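In case it's relevant, the pool cache can at least be inspected read-only, e.g. (the cachefile path here is the stock default on this host):

Code:
zpool get cachefile                                     # per-pool cachefile setting; "-" means the default
zdb -C -U /etc/zfs/zpool.cache | grep -E 'name|path'    # pools and vdev paths recorded in the cache file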

I've seen this happen with a destination geom in all of the following configurations:
  • USB external drive: an M.2 SSD in a Sabrent enclosure, accessed over USB 3.0
  • ZVOL on each of the two regular SSD disks on the machine
  • ZVOL on the NVMe disk on the machine, in the same zpool as the machine's root filesystem
  • md image created from a file on a ZFS filesystem (the ZVOL and md setups are sketched after this list)
In all instances, the non-root zpools and non-root filesystems all disappear, once bsdinstall reaches the ZFS installation stage.
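For concreteness, the ZVOL and md destinations were set up roughly like this (sizes, pool, and file names here are just examples):

Code:
# ZVOL destination (pool/dataset names are examples)
zfs create -V 16G tank1/bsdinstall-target
# file-backed md destination
truncate -s 16G /tank1/images/bootdisk.img
mdconfig -a -t vnode -f /tank1/images/bootdisk.img    # prints the md unit, e.g. md0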

The error messages I'm seeing from bsdinstall mainly vary between two. Before I'd relabeled the GPT partitions on the NVMe root disk, I was seeing an error when it tried to access efiboot0. That was probably due to a GPT label collision; I don't see any option in the bsdinstall manual page for changing the label it uses there. So, I re-labeled the efiboot0 partition on my main root disk, then re-labeled every other partition there and adjusted the swap mount point in /etc/fstab.
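The relabeling was along these lines (the device name, partition indexes, and new label names here are only illustrative; check gpart show first):

Code:
gpart modify -l xminefi0  -i 1 nvd0    # was efiboot0
gpart modify -l xminswap0 -i 2 nvd0    # was swap0; /etc/fstab updated to /dev/gpt/xminswap0
gpart modify -l xminzfs0  -i 3 nvd0    # was zfs0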

Since then, the message I'm seeing in bsdinstall at the point when it fails is:
Code:
eval: cannot create /tmp/bsdinstall_etc/fstab: No such file or directory

This is presumably because the zpools and non-root datasets have disappeared at that point.
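A quick way to check that assumption, while the system is in that state, is to see whether /tmp itself resolves to a dataset that went away:

Code:
df -h /tmp                                       # does /tmp still map to a live filesystem?
zfs list -o name,mountpoint,mounted | grep tmp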

When I check dmesg after that, it shows no error messages and no indication of what has happened to the system such that the active non-root zpools and non-root datasets have disappeared. At that point I have to reboot the machine; it can't even access /var/log any more.

The most that I'm seeing in /var/log/messages after that point:

Code:
ZFS WARNING: Unable to attach to ada1.

There is literally nothing else in the log output indicating the failure.

Curiously, it doesn't say the same for ada0 where there is a second zpool that disappears. Maybe ada1 is a bad disk somehow? It's a Crucial SSD, not very old. I don't believe I've seen any normal I/O errors with the disk so far.
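To rule out the drive itself, a few basic checks (smartctl is from the sysutils/smartmontools port, not base):

Code:
camcontrol devlist              # is ada1 still attached?
camcontrol identify ada1        # identify data straight from the drive
smartctl -a /dev/ada1           # SMART health, if smartmontools is installed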

After reboot, 'zpool status' does not show any filesystem corruption. This has happened with every destination-image combination I could imagine trying on this machine, so it's a good thing there's no filesystem damage from it.

This machine is using an AMD processor, Ryzen 9 5900HX. I don't know how well-supported the machine's full hardware stack might be. It's a Minisforum HX90 box.

I'm going to try a similar approach on an old laptop; it has a USB 3 port at least.

I'd like to be able to debug the installation on the machine that I'm using. Is there any advice about how I could debug what's making the zpools disappear here?
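One thing I can try is watching pool state from a second vty or ssh session while bsdinstall runs, e.g.:

Code:
while sleep 5; do date; zpool list -o name,health; zfs list -d 0; done
zpool events -fv                # in another terminal: follow ZFS events as they arrive
tail -F /var/log/messages       # and the log, while it's still reachable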

Before moving the release/distfile data to the old laptop, I think I'll try creating a root-on-UFS VM. Maybe it's related to the fact that the installation host is using ZFS too?

Thx. Health, all
 
bsdinstall makes "assumptions" about your config.
If those assumptions are wrong, you'd better use the command line and enter the commands manually.
Commands like "zfs create" & "tar xpf base.txz -C /mnt" (likewise for kernel.txz and lib32.txz).
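Roughly, assuming a GPT partition for the pool already exists on the target disk (device, pool, and path names below are examples):

Code:
zpool create -R /mnt -O mountpoint=none -O compress=lz4 -O atime=off newboot /dev/da0p3
zfs create -o mountpoint=none newboot/ROOT
zfs create -o mountpoint=/ newboot/ROOT/default
zpool set bootfs=newboot/ROOT/default newboot
# unpack the locally built sets (wherever "make packagesystem" left the .txz files)
tar xpf /path/to/release/base.txz   -C /mnt
tar xpf /path/to/release/kernel.txz -C /mnt
tar xpf /path/to/release/lib32.txz  -C /mnt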
 
I was able to create a root-on-UFS configuration on a sparse disk image on the internal disk, accessed via mdconfig as md0, then transferred this image to a USB disk with ddpt using oflag=trim. After `gpart recover` on the resulting disk, labeling its GPT partitions, and editing the disk's /etc/fstab to match, I was able to boot from that disk.
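In outline, something like the following (sizes, device names, and labels are examples; ddpt is from the sysutils/ddpt port):

Code:
truncate -s 16G /data/ufsboot.img            # sparse backing file
mdconfig -a -t vnode -f /data/ufsboot.img    # attaches as md0
# ... root-on-UFS install onto md0 goes here ...
ddpt if=/data/ufsboot.img of=/dev/da0 oflag=trim   # copy the image out to the USB disk
gpart recover da0                            # repair the backup GPT after the size change
gpart modify -l usbroot0 -i 2 da0            # relabel, then edit the disk's /etc/fstab to match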

Whatever exactly bsdinstall does to add the secureboot support, at least that's complete now. From there I can try creating a ZFS pool on the root gpart partition, then update it to use gptzfsboot etc. The bsdinstall(8) manual page lists the default ZFS dataset layout, so there's that too.
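A rough sketch of that next step, if it ends up using the legacy gptzfsboot path (partition indexes are examples; for UEFI it's the loader on the efi partition that matters instead):

Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 da0   # write gptzfsboot into the freebsd-boot partition
# for UEFI: put the loader on the ESP instead
mount_msdosfs /dev/da0p1 /mnt
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
umount /mnt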
 