Solved broken disk aftermath: /boot/efi not mountable, only boots with original disk

Hi all,
this was/is my zroot's zpool configuration :
Bash:
root@smadevnu:/usr/home/achill # zpool status zroot
  pool: zroot
 state: ONLINE
  scan: scrub in progress since Mon Jul 31 12:44:17 2023
        54.1G scanned at 34B/s, 26.9G issued at 17B/s, 54.3G total
        0B repaired, 49.46% done, no estimated completion time
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p4  ONLINE       0     0     0
            ada1p4  ONLINE       0     0     0

errors: No known data errors
root@smadevnu:/usr/home/achill #

Here is the history. At some point, simple tasks, e.g. opening a file in vi, were getting terribly slow. I ran dmesg and saw a bunch of hardware disk errors, and before I could issue any zpool command the system became unresponsive. I booted it again but could not get into the BIOS; something was driving the motherboard crazy. Since the machine was under warranty we sent it for repair, and after 14 days they returned it with just a ... BIOS upgrade. We (one admin and I) started it, but one disk was causing the same problem (BIOS stuck). We removed this disk and I got a booting system, which would get stuck when mounting /boot/efi from fstab:
Code:
mount_msdosfs: /dev/ada0p1: Invalid argument

I managed to get a shell (IIRC by booting single user) and commented out the /boot/efi entry, so I got a running system. Afterwards we ordered a new disk, and I issued:
Code:
gpart backup ada0 | gpart restore -F ada1
and
Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
ada1 being the new disk. We attached the new disk, marked the damaged disk offline and then detached the damaged disk. Resilvering went well, and a scrub is currently running.

We tried to test the system by booting from the new disk (with the old working disk disconnected), but the boot process does not seem to advance: I get a blinking prompt on an empty screen and nothing more. I also ran
Code:
gpart set -a active /dev/ada1
and (after many failed attempts)
Code:
dd if=/dev/ada0p1 of=/dev/ada1p1

(this didn't help either)

However, I am confused: /dev/ada0p1 and /dev/ada1p1 should be the EFI partitions, not the freebsd-boot ones.

My gpart status :
Code:
root@smadevnu:/usr/home/achill # gpart show -l
=>        40  3907029088  ada0  GPT  (1.8T)
          40      532480     1  efiboot1  (260M)
      532520        1024     2  gptboot1  (512K)
      533544         984        - free -  (492K)
      534528     4194304     3  swap1  (2.0G)
     4728832  3902300160     4  zfs1  (1.8T)
  3907028992         136        - free -  (68K)

=>        40  3907029088  ada1  GPT  (1.8T)
          40      532480     1  (null)  (260M)
      532520        1024     2  (null)  (512K)
      533544         984        - free -  (492K)
      534528     4194304     3  (null)  (2.0G)
     4728832  3902300160     4  (null)  (1.8T)
  3907028992         136        - free -  (68K)

Maybe the problem is the lack of labels on the new disk?

So my questions are:
  1. Why does /boot/efi refuse to mount (msdosfs)?
  2. Why is /boot/efi even needed?
  3. How can I make the system boot from the second disk?
  4. What did I get wrong in the process?
Thank you!
 
With this, you installed a legacy BIOS bootcode on the EFI partition of ada1 (ada1p1):
Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
So, you broke it. You should have typed '-i 2' instead of '-i 1'.

The installation and update of the bootcodes aren't well explained in the handbook and in the man pages.

The (null) entries you see are just missing labels. This doesn't affect the functioning of ada1.
You have to place the different bootcodes in the right places, and I recommend that you recreate the EFI partitions on ada0 and ada1 and format them. They should be mountable afterwards, so that you can copy the EFI loader onto them.

I think you boot in legacy BIOS mode, so use gpart bootcode to place the BIOS bootcodes in ada0p2 (to update it) and ada1p2.

See here: https://forums.freebsd.org/threads/update-of-the-bootcodes-for-a-gpt-scheme.80163/
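On the missing labels: they are cosmetic, but if matching labels are wanted on the new disk, something like the following could work. This is only a sketch; the label names (efiboot2, gptboot2, swap2, zfs2) are assumptions that mirror ada0's naming, and the run helper is a hypothetical dry-run wrapper that prints the commands unless DRY_RUN=0 is set.

```shell
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 (as root) to really apply the changes.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Label partitions 1-4 of a disk. The label names are hypothetical;
# they only mirror ada0's efiboot1/gptboot1/swap1/zfs1 scheme.
label_disk() {
    disk=$1
    suffix=$2
    run gpart modify -i 1 -l "efiboot${suffix}" "$disk"
    run gpart modify -i 2 -l "gptboot${suffix}" "$disk"
    run gpart modify -i 3 -l "swap${suffix}" "$disk"
    run gpart modify -i 4 -l "zfs${suffix}" "$disk"
}

label_disk ada1 2
```

Afterwards, gpart show -l ada1 should list the new labels in place of (null).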
 
I will try to get legacy working. By "I recommend you to recreate the efi partition on ada0 / ada1 and format them", you mean something like
1) boot single user
2) newfs -t msdosfs
3) mount /boot/efi
4) cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi

right?

Regarding legacy:
Code:
  gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada1
so do I care about ZFS or not?
 
Regarding your steps: you don't necessarily have to boot to single user mode. Just get the system booting so you can fix this partition.

As for ZFS: yes, you do care. Use gptzfsboot(8), not gptboot(8). Just make sure you're writing this boot code to the freebsd-boot partition, which is -i 2 in your case.
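One way to avoid mixing up the -i index is to look it up by partition type instead of typing it from memory. A minimal sketch, assuming the column layout that gpart show prints (start, size, index, type); part_index is a hypothetical helper name, and it only reads text, it does not touch the disk.

```shell
# Print the index of the first partition of the given type,
# reading `gpart show <disk>` output on stdin.
part_index() {
    awk -v want="$1" '$4 == want { print $3; exit }'
}

# Possible usage (assumes the disk is ada1):
#   idx=$(gpart show ada1 | part_index freebsd-boot)
#   gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$idx" ada1
```

Note that this needs gpart show without -l, since -l prints labels in place of the partition types.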
 
Good day, I just did as Emrion and SirDice suggested:
Code:
newfs_msdos /dev/ada0p1
newfs_msdos /dev/ada1p1
then mount each partition in turn and, for each one, do:
Code:
mkdir -p /boot/efi/efi/boot
cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi
and then deal with the LEGACY boot partition (for the new disk, for starters) :
Code:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 2 ada1
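The per-partition EFI steps above can be sketched as one loop. This assumes the partitions are already formatted with newfs_msdos, that /boot/efi exists and is not currently mounted, and that the ESPs are ada0p1 and ada1p1 as in this thread. The run helper and install_efi_loader are hypothetical names; by default the script only prints what it would do.

```shell
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 (as root) to really run the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount one ESP, copy the loader to the default UEFI path, unmount.
install_efi_loader() {
    part=$1                 # e.g. ada1p1
    mnt=${2:-/boot/efi}     # mount point, assumed to exist
    run mount_msdosfs "/dev/${part}" "$mnt" || return 1
    run mkdir -p "${mnt}/efi/boot"
    run cp /boot/loader.efi "${mnt}/efi/boot/bootx64.efi"
    run umount "$mnt"
}

for part in ada0p1 ada1p1; do
    install_efi_loader "$part"
done
```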

So I managed to boot from the 1st drive in legacy mode, then from the 2nd drive in legacy mode, then from the 1st drive in UEFI mode. However, in the BIOS I don't seem to get the option to boot from the second drive in UEFI mode. The two EFI partitions seem identical:
Code:
root@smadevnu:~ # mount_msdosfs /dev/ada0p1 /boot/efi
root@smadevnu:~ # ls -l /boot/efi/efi/boot/
total 1760
-rwxr-xr-x  1 root  wheel  892928 Aug  1 09:12 bootx64.efi
root@smadevnu:~ # umount /boot/efi
root@smadevnu:~ # mount_msdosfs /dev/ada1p1 /boot/efi
root@smadevnu:~ # ls -l /boot/efi/efi/boot/
total 1760
-rwxr-xr-x  1 root  wheel  892928 Aug  1 09:13 bootx64.efi
root@smadevnu:~ # umount /boot/efi
root@smadevnu:~ #
 
What is the output of gpart show ada1?
Without the '-l' option, we will see the type of the partitions, including the one of ada1p1.
 
Code:
root@smadevnu:~ # gpart show ada1
=>        40  3907029088  ada1  GPT  (1.8T)
          40      532480     1  efi  (260M)
      532520        1024     2  freebsd-boot  (512K)
      533544         984        - free -  (492K)
      534528     4194304     3  freebsd-swap  (2.0G)
     4728832  3902300160     4  freebsd-zfs  (1.8T)
  3907028992         136        - free -  (68K)

What I get in the BIOS boot menu is shown in the first photo; with only the 2nd disk connected, I get what is shown in the second photo:

Attachments: 20230801_100048.jpg, 20230801_100317.jpg
 
Reading this, I see no reason why you can't boot with EFI on that disk.
If you want to be sure, unplug ada0 and retry. That's the way RAID1 should save your system if ada0 fails.

Perhaps your EFI firmware needs an entry that lists ada1. You can use efibootmgr (as root) to display this table and add an entry to it. Avoid modifying the boot order of the existing items.
 
Yes, the 2nd photo is with ada0 unplugged, but I didn't try moving the new disk to slot 0 (so that it is reported in the BIOS as P0). However, as I wrote, I can definitely boot with the second disk. I have tried this (legacy boot only, not EFI) both ways: booting from the 2nd (new) disk, and booting from the 2nd disk with the old one unplugged. It worked every time. ZFS self-corrected, pretty resilient!

efibootmgr gives
Code:
root@smadevnu:/usr/home/achill # efibootmgr -v
Boot to FW : false
BootCurrent: 000a
Timeout    : 1 seconds
BootOrder  : 000A, 0009, 000B, 0004, 0006
+Boot000A* UEFI OS HD(1,GPT,20350f8c-ff12-11ec-9764-18c04dce4c19,0x28,0x82000)/File(\EFI\BOOT\BOOTX64.EFI)
                      ada0p1:/EFI/BOOT/BOOTX64.EFI (null)
 Boot0009* WDC WD20EZAZ-00L9GB0 BBS(HD,,0x0)
                               PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x1)/Sata(0x1,0xffff,0x0)
                               VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,2000200020002000570020002d0044005800570032003500430041004d00320048003000410037000000)
 Boot000B* WDC WD20EZAZ-00L9GB0 BBS(HD,,0x0)
                               PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x1)/Sata(0x0,0xffff,0x0)
                               VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,2000200020002000570020002d0044005800570032003400380044003000310037004600350037000000)
 Boot0004  Samsung SSD 980 1TB BBS(HD,,0x0)
                              PciRoot(0x0)/Pci(0x1,0x1)/Pci(0x0,0x0)/NVMe(0x1,71-61-b3-11-db-38-25-00)
                              VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,53003600340039004e0058003000520042003300300036003700350058000000)
 Boot0006  HGST HUS726T6TALE6L4 BBS(HD,,0x0)
                               PciRoot(0x0)/Pci(0x1,0x3)/Pci(0x0,0x1)/Sata(0x5,0xffff,0x0)
                               VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,3700560041004700580044004a0041002000200020002000200020002000200020002000200020000000)


Unreferenced Variables:
root@smadevnu:/usr/home/achill #

So as you say, I need to manipulate that. However, just moving the new disk to SATA slot 0 would make it appear as P0, which would solve the EFI boot problem as is. Bottom line: you guys helped a great deal! Thank you!
 
Glad you sorted out the main problems.

You can finish the job with the instructions you find in efibootmgr(8). This time, it's well explained. ;)

FreeBSD has great documentation in general.
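For the missing UEFI entry on the second disk, adding one might look roughly like the sketch below. It assumes ada1p1 is the ESP of the new disk, that /mnt is free to use as a temporary mount point, and it uses the made-up label FreeBSD-ada1. The run helper and add_efi_entry are hypothetical names, and by default nothing is executed, the commands are only printed. Creating an entry may also change BootOrder, so it is worth checking the final efibootmgr -v output.

```shell
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 (as root) to really run the commands.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create and activate a boot entry pointing at the loader on one ESP.
add_efi_entry() {
    part=$1             # ESP of the disk, e.g. ada1p1
    label=$2            # entry label (hypothetical name)
    mnt=${3:-/mnt}      # temporary mount point, assumed free
    run mount_msdosfs "/dev/${part}" "$mnt" || return 1
    run efibootmgr -c -a -L "$label" -l "${mnt}/efi/boot/bootx64.efi"
    run umount "$mnt"
    run efibootmgr -v   # review the entries and BootOrder afterwards
}

add_efi_entry ada1p1 FreeBSD-ada1
```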
 