Solved Cannot boot if new disks are present

I have freshly installed a system with the following parameters:
  • FreeBSD 13 RC5
  • 2x 256 GB SATA SSD (ZFS mirror, zroot)
  • 8x 512 GB SATA SSD (ZFS raidz-3, storage)
The two 256 GB drives are used for the base system; the eight 512 GB drives are used for storage.

The problem I am encountering: The system boots fine as long as the 8 drives are not present. Once the system is booted, I can insert the disks and import the storage pool using zpool import and everything works well.
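Importing it manually afterwards is simply:
Code:
zpool import storage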
However, when the system boots with the eight drives installed, it fails to complete the boot procedure:
[Screenshot 20210411_181008_resized.jpg: the boot stops during "Starting file system checks:" with an error on /dev/ada0p1 (/boot/efi)]
Based on the information present I'd say that this problem is related to the naming of the disks. When I boot without the 8 drives installed, the two system SSDs show up as ada0 and ada1. When I boot with the eight storage drives installed too, the system SSDs show up as ada8 and ada9.

What is the actual reason for this, and what is the proper way of fixing/handling it?
Do I need to use GPT labels to make this work? If so, how can I do this without re-installing the machine (or re-creating the storage pool)?
 
As long as you need those 512 GB disks only for storage and not for booting, you can remove all EFI partitions from them. You only need an EFI partition on the 2x 256 GB boot disks that form the mirror.

You can use Scroll Lock and then Page Up/Page Down to scroll back through the console output and identify which disk is ada0 by its serial number, and after a restart under a live CD remove that corrupted EFI partition from that disk.
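Something along these lines should do it from the live CD (a sketch; the partition index 1 is an assumption based on the default install layout, so verify with gpart show first):
Code:
camcontrol identify ada0 | grep 'serial number'   # confirm which disk this is by its serial number
gpart show ada0                                   # verify which index holds the efi partition
gpart delete -i 1 ada0                            # delete the EFI partition (index 1 assumed here)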

GPT labels will not help here, as your error occurs when the file system check cannot verify the partition on /dev/ada0p1 (the FAT32 EFI partition).
 
What leads you to believe that there are EFI partitions present on those?
I ran gpart destroy on each of the eight storage disks. This should have removed any traces of EFI partitions, right?
 
What leads you to believe that there are EFI partitions present on those?
You almost found it yourself: with the 8 SSDs attached, your actual system disks become ada8 and ada9. However, the error message complains about an EFI partition on ada0, so that's one of your 8 SSDs, not one of the system SSDs.

gpart destroy probably erases the partition table, but it might not wipe the partitions' contents, so there's a chance that an EFI partition is left over on one of those drives, which your system then tries to mount.
 
If the SSDs are empty, go with a partition manager or dd them to zeros.
Then use something like
Code:
zpool create tank2 da2          # pool and device names are just examples
zfs set compression=lz4 tank2
No gpart at all.
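A sketch of the dd wipe mentioned above (ada2 is a placeholder device name; double-check it before running, as this destroys data):
Code:
dd if=/dev/zero of=/dev/ada2 bs=1m count=4   # zero the start of the disk, clearing the primary GPT
gpart destroy -F ada2                        # or let gpart remove both primary and backup tables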
 
What leads you to believe that there are EFI partitions present on those?
According to your screenshot, the message at the bottom after "Starting file system checks:" reads:
msdosfs: /dev/ada0p1 (/boot/efi)

This should have removed any traces of EFI partitions, right?
Boot from a live CD, then paste the output of gpart show; that way you can check whether any EFI partition is present on those 512 GB SSD disks.

Note: If you don't have an EFI partition on them, then you have an issue with one of the EFI partitions on your boot disks that needs to be fixed. I already told you in my first reply how to identify the serial number of the disk behind ada0p1, so you can find out which exact disk has the problematic partition.
 
After inserting the 8 disks all the device names have rotated, and /boot/efi can't be mounted anymore:
Code:
Can't open '/dev/ada0p1'
...
       msdosfs: /dev/ada0p1 (/boot/efi)
It is set in /etc/fstab under that device name. From single-user mode, run gpart show -pl. It will show the partitions and their labels.

The former ada0p1 should be labeled efiboot0. Mount / read/write, edit /etc/fstab and set the device to /dev/gpt/efiboot0, then run fsck_msdosfs(8) on it afterwards. That should fix it.
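A sketch of those steps (efiboot0 is the installer's default label; verify it in the gpart output first):
Code:
mount -u -o rw /                      # remount root read/write in single-user mode
gpart show -pl                        # confirm the ESP's GPT label (efiboot0)
# in /etc/fstab, change the ESP entry from the device name to the label:
#   /dev/ada0p1        /boot/efi  msdosfs  rw  2  2
#   /dev/gpt/efiboot0  /boot/efi  msdosfs  rw  2  2
fsck_msdosfs /dev/gpt/efiboot0        # check the FAT file system afterwards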
 
It indeed boiled down to the fact that my /etc/fstab listed an entry that mounts the EFI system partition (ESP), referred to as /dev/ada0p1, which of course was a problem once the other 8 drives were inserted and ada0 became a different disk that didn't contain an ESP.

Removing the ESP mount entry from /etc/fstab resolved this problem. I am actually not sure why it was in there in the first place.

As for the rest, I've decided to use GPT labels to create the ZFS pool: https://silizium.io/post/freebsd_zfs_gpt_labels/
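For the record, the approach from that article looks roughly like this (a sketch; the label names storage0 through storage7 are my own choice, and the raidz3 layout matches the pool above):
Code:
gpart create -s gpt ada0                     # repeat for each of the eight disks
gpart add -t freebsd-zfs -l storage0 ada0    # labels storage1..storage7 on the others
zpool create storage raidz3 \
    /dev/gpt/storage0 /dev/gpt/storage1 /dev/gpt/storage2 /dev/gpt/storage3 \
    /dev/gpt/storage4 /dev/gpt/storage5 /dev/gpt/storage6 /dev/gpt/storage7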
 
I did a fresh install of 13.0-RELEASE on an SSD in a multi-disk PC, and while dealing with rotated device names and a mounted /boot/efi in /etc/fstab I remembered this thread.

Removing the ESP mount entry from /etc/fstab resolved this problem. I am actually not sure why it was in there in the first place.
Here is the explanation, and if that entry is still removed you might want to consider undoing the deletion:

From the link in Thread 82466:
Code:
Mount the EFI system partition (ESP) on newly-installed systems.

Per hier(7), the ESP will be mounted at /boot/efi. On UFS systems,
any existing ESP will be reused and mounted there; otherwise, a new one
will be made. On ZFS systems, space for an ESP is allocated on all disks
in the root pool, but only the partition actually used to boot is set up
and mounted.

This makes future upgrades of the EFI loader easier (upgrade scripts can
just change /boot/efi) and also greatly simplifies the parts of the
installer involved in initialization of the ESP. It also makes the
installer's behavior correspond to the documentation in hier(7).
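For reference, the ESP entry that ends up in /etc/fstab looks like this (the device-based form is what caused the problem in this thread; the label-based form from earlier is robust against device renumbering):
Code:
# as written by the installer:
/dev/ada0p1         /boot/efi  msdosfs  rw  2  2
# label-based form that survives device renumbering:
/dev/gpt/efiboot0   /boot/efi  msdosfs  rw  2  2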
 