ZFS How To Add Bootcode To All Disks

I have a ZFS root system on a 4-drive raidz1. I want to ensure I have bootcode on all drives. I suspect it is currently on only one of them, because I have a failing drive, and when I try to remove it, my system no longer boots. Here is my "gpart show" output:

Code:
 # gpart show
=>       34  976773101  diskid/DISK-WD-WCAYU7656378  GPT  (466G)
         34        128                            1  freebsd-boot  (64K)
        162  976772973                            2  freebsd-zfs  (466G)

=>       34  976773101  diskid/DISK-9QG6AAHZ  GPT  (466G)
         34        128                     1  freebsd-boot  (64K)
        162  976772973                     2  freebsd-zfs  (466G)

=>       40  976773088  diskid/DISK-6QG2X7H9  GPT  (466G)
         40        128                     1  freebsd-boot  (64K)
        168  976772960                     2  freebsd-zfs  (466G)

=>       34  976773101  diskid/DISK-9QMBTYTM  GPT  (466G)
         34        128                     1  freebsd-boot  (64K)
        162  976772973                     2  freebsd-zfs  (466G)

I've searched the web, and it appears this is the command I need to run on each disk, but I get an error:

Code:
 # gpart bootcode -b  /boot/pmbr -p /boot/gptzfsboot -i 1 diskid/DISK-WD-WCAYU7656378
gpart: /dev/diskid/DISK-WD-WCAYU7656378p1: not enough space

How can I check to see which drives have bootcode and fix the ones that do not? I'm not sure what the next steps should be.

Thanks,

Drew
 
You have to either:
back up the pool somewhere, repartition the disks, and restore the data
or
remove the bad drive
partition the replacement with a 512K freebsd-boot partition
resilver
remove the second drive
partition it correctly
resilver
...and so on for the remaining drives.
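The per-drive step of the second option could look roughly like the sketch below. It assumes the new disk shows up as ada4, the pool is zroot and the member being replaced is diskid/DISK-9QMBTYTMp2; all of these are placeholders to check against `zpool status` first. The `run` and `prepare_and_replace` helpers are hypothetical; the gpart and zpool subcommands are the standard ones. By default it only prints the commands (DRY_RUN=1).

```sh
#!/bin/sh
# Sketch: partition a replacement disk with a 512K freebsd-boot partition,
# write the bootcode, and replace the old pool member with the new partition.
# Pool/device names are placeholders. Commands are only printed unless DRY_RUN
# is set to an empty string (DRY_RUN= ...).
DRY_RUN=${DRY_RUN-1}

# run CMD...: print the command; execute it only when DRY_RUN is empty.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# prepare_and_replace POOL OLD_MEMBER NEW_DISK
prepare_and_replace() {
    pool=$1 old=$2 new=$3
    run gpart create -s gpt "$new" || return 1
    run gpart add -t freebsd-boot -s 512k "$new" || return 1
    run gpart add -t freebsd-zfs "$new" || return 1
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$new" || return 1
    # The ZFS data lives on partition 2 of the new disk.
    run zpool replace "$pool" "$old" "${new}p2"
}
```

Example: `prepare_and_replace zroot diskid/DISK-9QMBTYTMp2 ada4` prints the five commands; review them, then rerun with `DRY_RUN= prepare_and_replace ...` and wait for the resilver to finish before touching the next drive.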
 
If the current situation doesn't permit recreating the system with a bigger freebsd-boot partition, as a temporary solution you can fix it as follows:

Use strings(1) to check which of the freebsd-boot partitions has the bootcode. Then copy the bootcode with dd(1) to all empty ones.
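A minimal sketch of the strings(1) check, assuming a POSIX sh run as root against the partition device nodes (for example /dev/diskid/DISK-9QG6AAHZp1 from the gpart show output). The function names `has_bootcode` and `check_partitions` are made up for this example. A partition counts as holding bootcode when `strings -a -n 20` prints at least one line, mirroring the `strings -n20` checks used in this thread; it is a heuristic, since leftover junk in a never-zeroed partition could also produce strings.

```sh
#!/bin/sh
# Heuristic check for bootcode in freebsd-boot partitions (sketch).
# Assumption: a partition holding gptzfsboot yields readable strings of 20+
# characters, while an empty (zero-filled) one yields none.

# has_bootcode PART: succeed if strings(1) finds at least one 20+ char string.
has_bootcode() {
    # -a: scan the whole file, -n 20: minimum string length (as in "strings -n20")
    [ -n "$(strings -a -n 20 "$1" | head -n 1)" ]
}

# check_partitions PART...: print one status line per partition.
check_partitions() {
    for part in "$@"; do
        if has_bootcode "$part"; then
            echo "$part: bootcode present"
        else
            echo "$part: empty"
        fi
    done
}
```

Usage (as root): `check_partitions /dev/diskid/DISK-WD-WCAYU7656378p1 /dev/diskid/DISK-9QG6AAHZp1 /dev/diskid/DISK-6QG2X7H9p1 /dev/diskid/DISK-9QMBTYTMp1`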

To be on the safe side, first dd(1) the bootcode into a backup file on external media (e.g. a USB stick). Also check that file with strings(1) to make sure.
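A sketch of that backup step, assuming the USB stick is already mounted (the /mnt/usb path in the usage line is hypothetical). `backup_bootcode` is a made-up helper: it copies one 64K block with dd(1), compares the copy byte for byte with the source using cmp(1), and confirms with strings(1) that the copy is not empty.

```sh
#!/bin/sh
# Sketch: save a freebsd-boot partition that holds bootcode to a file and verify it.

# backup_bootcode SRC DEST: copy the first 64K of SRC to DEST and verify the copy.
backup_bootcode() {
    src=$1 dest=$2
    # bs=64k count=1: exactly one 64K block, i.e. the whole 64K partition
    dd if="$src" of="$dest" bs=64k count=1 2>/dev/null || return 1
    # Re-read the source the same way (sector-aligned) and compare byte for byte.
    dd if="$src" bs=64k count=1 2>/dev/null | cmp -s - "$dest" || return 1
    # Like the source, the backup should contain readable bootcode strings.
    [ -n "$(strings -a -n 20 "$dest" | head -n 1)" ]
}
```

Usage (as root), for example: `backup_bootcode /dev/diskid/DISK-9QMBTYTMp1 /mnt/usb/freebsd-boot.bak && echo backup OK`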

If the system has data of value, there should be a working backup, in case something goes wrong.
 
When I repartition one disk with a larger boot partition (I was thinking 512K for future room), that will make my ZFS partition just a bit smaller than the other drives. Will there be issues joining that drive back to the pool because all ZFS partitions are not the same size?
 
Typically ZFS will fall back to the size of the smallest partition/disk. But if you replace them with bigger drives, you can actually expand the pool.
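As a back-of-the-envelope check: a raidz1 vdev gives roughly (number of members - 1) x (size of the smallest member) of usable space, before ZFS metadata and slop overhead. The hypothetical `raidz1_usable` function below only does that arithmetic; sizes are plain integers in whatever unit you pass (GB here).

```sh
#!/bin/sh
# Rough raidz1 sizing: every member is only used up to the size of the
# smallest member, and one member's worth of space goes to parity.
# Ignores ZFS overhead, so real numbers come out somewhat lower.

# raidz1_usable SIZE...: print approximate usable space (same unit as input).
raidz1_usable() {
    n=0 smallest=
    for s in "$@"; do
        n=$((n + 1))
        if [ -z "$smallest" ] || [ "$s" -lt "$smallest" ]; then
            smallest=$s
        fi
    done
    echo $(( (n - 1) * smallest ))
}
```

`raidz1_usable 500 500 500 500` and `raidz1_usable 1000 500 500 500` both print 1500; only `raidz1_usable 1000 1000 1000 1000` prints 3000, which is why the extra space appears only after the last drive is replaced.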
 
The temporary strings(1)/dd(1) fix seems to be the route I'd prefer, since I plan to upgrade all drives in this system. I'm on my last set of 500 GB drives, and one can barely even buy those any longer. I do have four 1 TB drives and plan to rebuild the system in a month or two when I have more free time. This temporary solution seems quick and easy and will buy me the time I need to upgrade properly.

I've read the man page for strings and searched for a tutorial on how to use strings to check for bootcode. I didn't find anything specific but it seems I need to send strings a file name. So does that mean I need to figure out how to mount the boot partition and check a specific file name? I remain confused as to how to use strings to check for boot code.
 
Typically ZFS will fall back to smallest partition/disk. But If you get a bigger drive to replace, then you can actually expand the pool.
I currently have four 500 GB drives. So could I just add 1 TB drives now, go through the resilver exercise after each, and then have a pool that's double the size when complete?
 

Apparently I cannot mount a boot partition because it doesn't have a file system as per SirDice's post in this thread:

The freebsd-boot partition doesn't have a filesystem, so there's nothing to mount there. It contains the contents of gptzfsboot(8). See gpart(8) on how to write it. The efi partition is a FAT32 filesystem, you'll need to use mount_msdosfs(8). Its contents are written by simply dd(1)'ing the /boot/boot1.efifat image, see efi(8) for more information.
 
So could I just add 1 TB drives now, go through the resilver exercise after each, and then have a pool that's double the size when complete?
To the best of my knowledge, yes. So take the 1TB drive, gpart at least a 512K freebsd-boot partition, then use the rest (or set a size) for freebsd-zfs. Then attach/replace the new partition into the vdev, resilver, then replace all the other drives with the same procedure.

I know that process works for mirrors; I think it should work for raidz types too.
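For the raidz case, the extra space only shows up once every member has been replaced and the pool is allowed to grow. A dry-run sketch, assuming pool zroot and new members such as ada1p2 to ada4p2 (placeholders); `grow_pool` and `run` are made-up helpers, while `zpool set autoexpand=on` and `zpool online -e` are the standard ZFS ways to let the pool use larger devices.

```sh
#!/bin/sh
# Sketch: after all raidz members are bigger, let the pool expand.
# Only prints the commands unless DRY_RUN is set to an empty string.
DRY_RUN=${DRY_RUN-1}

# run CMD...: print the command; execute it only when DRY_RUN is empty.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

# grow_pool POOL DEV...: enable autoexpand, then expand each member in place.
grow_pool() {
    pool=$1; shift
    run zpool set autoexpand=on "$pool" || return 1
    for dev in "$@"; do
        # -e: expand the device to use all of its available space
        run zpool online -e "$pool" "$dev" || return 1
    done
    run zpool list "$pool"
}
```

Setting autoexpand=on before the first replacement should make the last `zpool online -e` step unnecessary, but running it does no harm.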

Yes, SirDice is correct regarding the freebsd-boot partition.
 
The newest boot code I found that fits in 64K is the one from 10.4.
I did not check 11.0 to 11.2.
Code:
[user@host ~]$ strings -n20  /boot/gptzfsboot |head && ls -l /boot/gptzfsboot
Invalid VM86 Request
%s: No ZFS pools located, can't boot
failed to clear pad2 area of primary vdev
failed to parse pad2 area of primary vdev
failed to read pad2 area of primary vdev
%s: failed to mount default pool %s
Default: %s/<0x%llx>:%s
Can't find ZFS pool %s
Can't find dataset %s in ZFS pool %s
Can't mount ZFS dataset


-r--r--r--  1 root  wheel  43251 Jan  3  2019 /boot/gptzfsboot
You can try the one from 10.4. Also, do not upgrade the pool features until you have enlarged the freebsd-boot partition.
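Before writing an older gptzfsboot with `gpart bootcode`, its size can be compared with the partition: the freebsd-boot partitions here are 128 sectors, i.e. 128 x 512 = 65536 bytes, assuming 512-byte sectors. A small sketch; `bootcode_fits` is a hypothetical helper and the file names in the usage line are only examples.

```sh
#!/bin/sh
# Sketch: check that a bootcode image is no larger than the freebsd-boot partition.

# bootcode_fits FILE [PART_BYTES]: succeed if FILE fits (default 65536 bytes).
bootcode_fits() {
    size=$(wc -c < "$1" | tr -d ' ')
    limit=${2:-65536}
    echo "$1: $size bytes, partition: $limit bytes"
    [ "$size" -le "$limit" ]
}
```

On Drew's system `bootcode_fits /boot/gptzfsboot` should fail (the ls output later in the thread shows 105054 bytes), while a 43251-byte gptzfsboot taken from 10.4 and saved as, say, /tmp/gptzfsboot-10.4 passes.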
 
covacat, is that going to cause issues if the zpool is from 13.0? If the zpool was created with 10.4 or earlier it should not, but it's a cautionary point.
It doesn't matter that much in the end;
he can boot from a USB stick until he solves it properly.

I assume the pool is even pre-10.4.
11.1's boot code already does not fit.
10.4 and 11.0 seem to have approximately the same release date.
Later edit:
11.0's is 88K, so no go.
 
My pool is pretty old. I'm not sure how old, but I tend not to touch things unless they stop working.

Code:
 # zpool get version zroot
NAME   PROPERTY  VALUE    SOURCE
zroot  version   28       local

I don't understand what this tells me:

Code:
 # strings -n20 /boot/gptzfsboot | head && ls -l /boot/gptzfsboot
Invalid VM86 Request
failed to clear pad2 area of primary vdev
ZFS: unsupported compression algorithm %s
ZFS: can't find dataset %ju
failed to detect primary vdev
zfree(%p,%ju): wild pointer
failed to parse pad2 area of primary vdev
ZFS: out of temporary buffer space
ZFS: unsupported ZFS version %u (should be %u)
ZFS: can't find root filesystem
-r--r--r--  1 root  wheel  105054 Oct 19  2020 /boot/gptzfsboot

I'm thinking maybe using dd to copy the boot code from ada4p1 (the disk that seems to have it) to another drive and trying to boot would be a good path forward? I've been searching for a definitive example of how to do that but haven't been able to turn up anything.
 
Pool version 28 is probably 9.2 or 9.1.
Yes, you can dd it from another partition and see if it boots.
Thank you. Can you point me to an example of how to use dd to copy my 64K freebsd-boot partition from ada4 to ada3? I'm unable to turn up anything.
 
Code:
 # dd if=/dev/ada4p1 of=/dev/ada3p1 bs=64k

if= is the source, of= is the destination, and bs= is the block size (the default is 512 bytes, which is slow).
If /dev/ada* are missing, you may need to use device names like /dev/diskid/DISK-WD-WCAYU7656378p1 instead; just replace adaNp1 with the corresponding diskid name plus p1.
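If the adaN partition nodes are missing, the matching diskid name can be read from `geom label status` (its Name/Status/Components layout appears later in this thread). A sketch with a hypothetical `diskid_for` helper that uses awk(1) to pick the diskid/ label whose component is the given disk:

```sh
#!/bin/sh
# Sketch: map an adaN disk to its diskid/... label.
# Input: "geom label status" output on stdin (columns: Name Status Components).

# diskid_for DISK: print the diskid/... name whose component is DISK (e.g. ada4).
diskid_for() {
    awk -v disk="$1" '$3 == disk && $1 ~ /^diskid\// { print $1 }'
}
```

Usage: `src="/dev/$(geom label status | diskid_for ada4)p1"`, which on Drew's system would give /dev/diskid/DISK-9QMBTYTMp1.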
 
Thank you. Seems to have worked. I'll try rebooting once my current backup completes. I'm restarting it for the third time now that I've learned how to use more than one core for tar compression by adding --use-compress-program=pbzip2. Trying to back up 1 TB using only one core for compression is painfully slow (not finished after days)... :)

Code:
 # dd if=/dev/ada4p1 of=/dev/ada3p1 bs=64k
dd: /dev/ada4p1: No such file or directory
   root@vm pts/1 14:45:58 Thu Jan 06 /backup/zroot/
 # dd if=/dev/diskid/DISK-WD-WCAYU7656378p1 of=/dev/diskid/DISK-WD-WCAYU7656378p1 bs=64k
1+0 records in
1+0 records out
65536 bytes transferred in 0.034203 secs (1916067 bytes/sec)
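For the backup itself, the --use-compress-program trick mentioned above can be wrapped like this. A sketch only: `backup_dir` is a made-up name, the paths are examples, and pbzip2 has to be installed from packages/ports; any compressor that reads stdin and writes stdout can be passed instead.

```sh
#!/bin/sh
# Sketch: tar a directory, handing compression to an external (e.g. parallel)
# compressor. Both FreeBSD's bsdtar and GNU tar accept --use-compress-program.

# backup_dir SRC_DIR ARCHIVE [COMPRESSOR]: archive SRC_DIR's contents into ARCHIVE.
backup_dir() {
    src=$1 archive=$2 prog=${3:-pbzip2}
    # -C "$src" . stores paths relative to SRC_DIR; keep ARCHIVE outside SRC_DIR.
    tar -c --use-compress-program="$prog" -f "$archive" -C "$src" .
}
```

Example: `backup_dir /usr/home /backup/home.tar.bz2` compresses with pbzip2 on all cores.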
 
That's with identical source and target.
Look in dmesg for the serial numbers of the disks:
Code:
 # grep 'ada.*Serial' /var/run/dmesg.boot
ada0: Serial Number WD-WCC3F6YDZYFS
ada1: Serial Number WD-WCC3F1VC5HC3

Then use those strings in dd, and don't forget to add p1 at the end.
Also add count=1 to dd, so that if you forget p1 it clobbers just 64K, not an entire disk.
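The three precautions above (distinct source and target, the p1 suffix, count=1) can be folded into one guarded command. A sketch with a hypothetical `copy_bootpart` helper. The same-name check is purely textual, so /dev/ada4p1 and its diskid alias would still count as different, and the p1 check assumes freebsd-boot is partition 1, as in this thread's gpart show output.

```sh
#!/bin/sh
# Sketch: copy one freebsd-boot partition to another with basic safety checks.

# copy_bootpart SRC DST: refuse identical names or names not ending in p1,
# then copy a single 64K block.
copy_bootpart() {
    src=$1 dst=$2
    if [ "$src" = "$dst" ]; then
        echo "refusing: source and target are the same" >&2
        return 1
    fi
    case $src in *p1) ;; *) echo "refusing: $src does not end in p1" >&2; return 1 ;; esac
    case $dst in *p1) ;; *) echo "refusing: $dst does not end in p1" >&2; return 1 ;; esac
    # count=1 limits the write to one 64K block even if a name is wrong.
    dd if="$src" of="$dst" bs=64k count=1
}
```

Usage (as root): `copy_bootpart /dev/diskid/DISK-9QMBTYTMp1 /dev/diskid/DISK-WD-WCAYU7656378p1`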
 
Whoops. Thanks for catching that. Let me try again.

Seems to have worked:
Code:
 # dd if=/dev/diskid/DISK-9QMBTYTMp1 of=/dev/diskid/DISK-WD-WCAYU7656378p1 bs=64k count=1
1+0 records in
1+0 records out
65536 bytes transferred in 0.501115 secs (130780 bytes/sec)
 
I think the light just went on. Does the fact that strings returns output like the gptzfsboot listing covacat posted above, when run on the individual partitions, suggest the drive has boot code on it? Because the data on it returns those strings? Am I finally understanding?

So for example this output suggests ada4p1 contains boot code?
Code:
 # geom label status
                                      Name  Status  Components
               diskid/DISK-WD-WMC130D0992S     N/A  ada0
               diskid/DISK-WD-WCAYU7656378     N/A  ada1
                      diskid/DISK-9QG6AAHZ     N/A  ada2
                      diskid/DISK-6QG2X7H9     N/A  ada3
                      diskid/DISK-9QMBTYTM     N/A  ada4
gptid/415e2ffb-388a-11e1-b28e-001fc6271fcd     N/A  diskid/DISK-WD-WCAYU7656378p1
gptid/67c184d7-6d9a-11ec-9dee-001fc6271fcd     N/A  diskid/DISK-6QG2X7H9p1

 # strings -n20 /dev/diskid/DISK-9QMBTYTMp1 | head
Invalid VM86 Request
ZFS: zfs_alloc()/zfs_free() mismatch
ZFS: out of temporary buffer space
ZFS: unsupported compression algorithm %u
ZFS: unsupported compression algorithm %s
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS object directory
ZFS: can't find root filesystem
ZFS: can't read object set for dataset %ju
ZFS: can't open root filesystem
Thanks for all of the help. I really appreciate it.
 
Does the fact that strings returns that output when run on the individual partitions suggest the drive has boot code on it? Because the data on it returns those strings?
Yes, that's correct. An empty freebsd-boot partition returns nothing.

I've read the man page for strings and searched for a tutorial on how to use strings to check for bootcode. I didn't find anything specific but it seems I need to send strings a file name. So does that mean I need to figure out how to mount the boot partition and check a specific file name?
No. On Unix, devices are accessed through files (the device nodes under /dev), so in the command strings /dev/ada4p1 the device node ada4p1 is the file given to strings(1).
 
Well I seem to have broken my system. After following advice above and getting boot code on my other drives, I removed the failing 500 GB drive which was ada4 or diskid/DISK-9QMBTYTM. I replaced it with a 1 TB drive, partitioned with a 512k freebsd-boot partition and the rest of the drive allocated to freebsd-zfs. Next, I issued the replace command and the resilver began.

I know it got to at least 16% based upon output from 'zpool status'. After maybe 30 minutes went by, I used 'zpool status' to see how things were progressing. At that point, my terminal session just froze. I opened another session and tried again with the same result. I also noticed my drive light on the box was no longer on. Next I opened another terminal session and attempted to 'kill' and 'kill -9' the two zpool processes I had running but no effect. Next I tried rebooting the box with the 'shutdown' command. It nearly completed but just after the last syncing messages, it never quite finished. At that point I power cycled the box and rebooted. Now I'm stuck at a 'mountroot' prompt and if I issue 'zfs:zroot', it fails with an 'error 5'.

I did some searching and found this thread that suggested I could repair it by booting from another source. I created a USB stick from an image and tried that. However, I can't import the pool to scrub it. I get some sort of 'I/O error' that has since scrolled off my screen.

I thought reinstalling my failing 500 GB ada4 might let it boot, but I get the same result: stuck at the 'mountroot' prompt.

So I don't know if this is a result of using a 1 TB drive where the other three drives are 500 GB, or just coincidental bad luck. It seems to me that I should still have three good drives in my 4-drive raidz1 and should be able to recover. I'd appreciate any suggestions, as I'd rather recover than collect everything from backups and restore.

Thanks,

Drew
 