BIOS Booting a ZFS Root on an MBR Partition

WARNING:
YOU WILL LOSE ALL YOUR DATA.
Following this guide will destroy all data on your disks. Make sure you have backups. Also note that this will not work on boot drives of 2 TB or larger; that's the MBR partitioning limit.

Rationale​

I have two machines that dual boot FreeBSD; one with Linux, the other with Windows 10. These were 12.4 machines I set up with UFS root filesystems back before 13.x was released, when I did not have much experience with ZFS. I wanted to update them to 13.2, and this seemed like a good time to rebuild them with ZFS root filesystems. I want that for snapshots, boot environments, better poudriere(8) performance, etc. I want BIOS booting from MBR partitions because I know I can make the simple boot0 boot loader work in such a setup. Also, GRUB needs a small "special" 1 MB partition at the beginning of the disk to BIOS boot from a GPT partition, and I'm not a fan of small "special" partitions.

Setup​

Each system has a 1 TB NVMe drive as the boot drive, and some large spinning rust formatted with NTFS, mainly to store my Steam library. The latter is irrelevant for this guide and won't be mentioned again. We'll split each boot drive in half for FreeBSD, with the alternative operating system installed on the other half. A one-terabyte drive has about 932 gibibytes of usable space. This works out to a 466 gibibyte FreeBSD slice, inside of which we'll create a 434 gibibyte ZFS partition. The remaining 32 gibibytes will be allocated to a swap partition.
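The arithmetic above can be sanity-checked with a quick sketch (the numbers are the ones from this section; nothing here touches a disk):

```shell
# Partition arithmetic from the text: a ~932 GiB drive split in half,
# with 32 GiB of the FreeBSD slice reserved for swap.
total_gib=932
fbsd_slice=$(( total_gib / 2 ))   # 466 GiB FreeBSD slice
zfs_part=$(( fbsd_slice - 32 ))   # 434 GiB ZFS partition
echo "slice=${fbsd_slice}G zfs=${zfs_part}G swap=32G"
```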

Problem​

The FreeBSD installer only allows one to use the whole disk for ZFS on root, or to drop to a shell and do manual partitioning. There's no guided fractional partitioning option like there is for UFS. I found a guide on the wiki for doing ZFS root on BIOS/MBR systems, and a similar one for GPT partitioning. Neither exactly suited what I wanted to achieve, so I pieced together an approach using those wiki pages and the bsdinstall(8) zfsboot script. It was complicated and took some time. I'm sharing it here in case anyone else finds it useful, and so I can find it when I need to do this again.

Steps​

  1. Boot into the FreeBSD installer
  2. Choose "Shell" at the "Partitioning" dialog
  3. Find your installation disk: gpart show
  4. Destroy any existing partitioning on it: gpart destroy -F foo0
  5. Create an MBR partitioning scheme: gpart create -s mbr foo0
  6. Create a freebsd slice on the new partitioning scheme
    Code:
    # gpart add -s 466G -t freebsd foo0
    foo0s1 added
  7. Destroy any old ZFS labels that may linger on the disk1
    Code:
    # zpool labelclear -f foo0
    # zpool labelclear -f foo0s1
  8. Destroy any old BSD labels that may linger1: gpart destroy -F foo0s1
  9. Create a BSD label inside the slice
    Code:
    # gpart create -s BSD foo0s1
    foo0s1 created
  10. Set the first slice active: gpart set -a active -i 1 foo0
  11. Add the ZFS partition first2
    Code:
    # gpart add -s 434G -t freebsd-zfs foo0s1
    foo0s1a added
  12. Then add the swap partition
    Code:
    # gpart add -t freebsd-swap foo0s1
    foo0s1b added
  13. Install the boot manager
    Code:
    # gpart bootcode -b /boot/boot0 foo0
    bootcode written to foo0
  14. Mount a tmpfs on /mnt: mount -t tmpfs tmpfs /mnt
  15. Create the ZFS root pool
    Code:
    # zpool create -o altroot=/mnt -O compress=lz4 -O atime=off zroot foo0s1a
  16. Export zroot before installing the boot code: zpool export zroot
  17. Install the boot1 stage
    Code:
    # dd if=/boot/zfsboot of=/tmp/zfsboot1 count=1
    # gpart bootcode -b /tmp/zfsboot1 /dev/foo0s1
      bootcode written to foo0s1
  18. Install the boot2 ZFS stage into the convenient hole in the ZFS on-disk format, located just after the ZFS metadata (hence the seek=1024)2
    Code:
    # dd if=/boot/zfsboot of=/dev/foo0s1a skip=1 seek=1024
  19. Import zroot to continue the install: zpool import -o altroot=/mnt zroot
  20. Create the ZFS datasets
    Code:
    zfs create -o mountpoint=none zroot/ROOT
    zfs create -o mountpoint=/ zroot/ROOT/default
    zfs create -o mountpoint=/tmp -o exec=on -o setuid=off zroot/tmp
    zfs create -o mountpoint=/usr -o canmount=off zroot/usr
    zfs create zroot/usr/home
    zfs create -o setuid=off zroot/usr/ports
    zfs create -o mountpoint=/var -o canmount=off zroot/var
    zfs create -o exec=off -o setuid=off zroot/var/audit
    zfs create -o exec=off -o setuid=off zroot/var/crash
    zfs create -o exec=off -o setuid=off zroot/var/log
    zfs create -o atime=on zroot/var/mail
    zfs create -o setuid=off zroot/var/tmp
  21. Create /home link and adjust permissions
    Code:
    # ln -s /usr/home /mnt/home
    # chmod 1777 /mnt/var/tmp
    # chmod 1777 /mnt/tmp
  22. Configure boot environment
    Code:
    # zpool set bootfs=zroot/ROOT/default zroot
    # zfs set canmount=noauto zroot/ROOT/default
  23. Edit fstab (vi /tmp/bsdinstall_etc/fstab) and add
    Code:
    # Device                       Mountpoint              FStype  Options         Dump    Pass
    /dev/foo0s1b                   none                    swap    sw              0       0
  24. Exit shell partitioning mode, bsdinstall will continue and complete the installation
  25. In the post installation step, when asked if you would like to drop into the installed system, choose yes and run
    Code:
    # sysrc zfs_enable="YES"
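For anyone puzzling over the dd invocations in steps 17 and 18: dd's default block size is 512 bytes, so the numbers decode as below (a sketch of the arithmetic only, not something you need to run during the install):

```shell
# dd defaults to bs=512, so in steps 17-18:
bs=512
boot1_bytes=$(( 1 * bs ))       # count=1: copy only the first 512 bytes (boot1)
boot2_skip=$(( 1 * bs ))        # skip=1: skip that same boot1 sector in /boot/zfsboot
boot2_offset=$(( 1024 * bs ))   # seek=1024: write boot2 512 KiB into the partition
echo "boot1=${boot1_bytes}B skip=${boot2_skip}B offset=${boot2_offset}B"
```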

Notes​

1 I'm not sure these steps are necessary. The bsdinstall zfsboot script performs these cleanups, but calls them "pedantic." I had weird problems that may have been caused by stale ZFS labels.

2 The MBR root on ZFS wiki page insists that the ZFS partition go first. The bsdinstall zfsboot script always creates a separate boot pool when BIOS booting from an MBR partition. I found the latter unnecessary, and am unsure that the former is strictly needed.

* After reading this I decided to TRIM these root pools and enable autotrim, because they are on SSDs
Code:
# zpool trim zroot
# zpool set autotrim=on zroot
 
I can make the simple boot0 boot loader work in such a setup. I read somewhere that GRUB has difficulty booting FreeBSD from GPT partitions as well.
I've yet to see GRUB issues with GPT partitions. If you can share the issue though I'd be interested.
MBR bootcode is not simpler than the GPT one. Actually GPT simplifies a few things and makes the administration steps easier.

Now FreeBSD supports both and both are valid ways. I'm not here to convince you what to use, to each their own. :)
 
(As this doesn't seem to be a howto (wrong forum section?)) Only UEFI/GPT/whole-disk-ZFS-on-root for me nowadays; don't see any real reason to suggest using MBR today (not going to argue here though :) ).
 
Do you need to run boot0cfg(8) with this setup, or are the boot0 defaults right?
I've never had to run boot0cfg(8). I'm in awe of whoever wrote boot0. It correctly detects all bootable drives on my system automatically, and boots Windows and FreeBSD with absolutely no config. All in 512 bytes!

You forgot zfs set canmount=noauto zroot/ROOT/default. You'll have trouble booting BEs.
Indeed I did. Thank you! Added. (And done manually on my workstations.)

(As this doesn't seem to be a howto (wrong forum section?))
It seemed too specific to be a generally-useful HOWTO. Most people go the UEFI/GPT route nowadays, including you.
 
It seemed too specific to be a generally-useful HOWTO.
It is a well written HOWTO actually. I wouldn't have commented myself if this was in the HOWTO section, as we try to avoid generic chat there.
I'd say it's worth moving to that section. Site admin to decide I guess.
 
It seemed too specific to be a generally-useful HOWTO.
There are certainly use cases for this (my reply was meant to say "if you are able to use uefi/gpt, I don't see why you would choose to use MBR") and can be useful as howto, it just needs to clearly say WHEN it should be used.
 
I've never had to run boot0cfg(8). I'm in awe of whomever wrote boot0.

Robert Nordier, refined by bms@, later luigi@ added deeper documentation.

src/stand/i386/boot0/boot0.S on 12.4 anyway.

Very clever asm for sure. IBM's systems programming course taught that "a line of asm without a comment is an error" (ono) and this code conforms well to that.

It correctly detects all bootable drives on my system automatically, and boots windows and Freebsd with absolutely no config. All in 512 bytes!

I've only ever installed it via boot0cfg, more lately having bought a refurb T430s with Win10 using 2(!) MBR slices, surprisingly easy to shrink to make room for FreeBSD.

This is one reason people might find your tute useful (not that mine had room for ZFS, but I admire the work).

It seemed too specific to be a generally-useful HOWTO. Most people go the UEFI/GPT route nowadays, including you.

Good to have warriors out the front on latest gear, but important not to dump docs for some older and slower people happily using older kit that's still working well.

There will be BIOS/MBR systems around for years, and there's no need to deprecate these when promoting newer systems.

What do you think about adding this method as a proper alternative choice in bsdinstall?
 
What do you think about adding this method as a proper alternative choice in bsdinstall?
Thank you for your kind words. Adding this to zfsboot would be a lot of work. That script is a particularly hairy piece of shell. It even uses one of my favourite shell misfeatures, dynamic scope:
 
Adding this to zfsboot would be a lot of work. That script is a particularly hairy piece of shell.

So is most of bsdconfig, also primarily by Devin Teske. The use of sh(1) is masterful but challenging to follow at first.

Agreed, coding your method to that style would be heavy going, and maybe wouldn't get past GPT-only advocates anyway.

OTOH, a pointer to this (as a HOWTO) in the Handbook section should alert people to it as an alternative option?

It even uses one of my favourite shell misfeatures, dynamic scope:

Try again with that URL? Did some googl'n but not sure what it means for this code?


Maybe one day I'll feel more comfortable about relying on Microsoft to keep our code repo safe longterm, but I guess if core@ is happy ...
 
I read somewhere that GRUB has difficulty booting Freebsd from GPT partitions as well.
If true, that must have been an earlier version of GRUB. I've been using GRUB for over a decade to dual boot FreeBSD and Linux from GPT-schemed partitions (BIOS and UEFI) and never had a problem.

The Freebsd installer only allows one to use the whole disk for ZFS on root, or to drop to a shell and do manual partitioning. There's no guided fractional partitioning option like there is for UFS.
There is a zfsboot script proposal on reviews.freebsd.org I found years ago, which allows a partial installation.

It was improved by Vull (download here and replace the one on the installation media).

I've noticed you didn't specify the sector size when creating the pool (the ashift property). The FreeBSD automated, guided installation sets ashift=12.
 
It's also maybe worth pointing out one misconception regarding GPT I've seen before (not saying it was assumed here): that older computers can't use GPT and need to stick to MBR. That's not true. Any computer that can boot MBR will be able to boot from GPT, as the initial step (LBA 0 loaded by the firmware/BIOS) is identical. It's then up to that bootcode to decide what to do, i.e. it's OS-dependent. If the OS can boot GPT, it's OK.

Frankly, I'm very thankful that with GPT I don't need to care about bsdlabel(8) and the headaches it caused when expanding filesystems.
 
Try again with that URL? Did some googl'n but not sure what it means for this code?
Argh, I wonder if this is a bug in the forums. I saw another link like this elsewhere. Here's the real link:

The comments
# NOTE: $swapsize and $bootsize should be defined by the calling function.
# NOTE: Sets $bootpart and $targetpart for the calling function.
suggest the use of dynamic scope.
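For anyone unfamiliar with the term, here's a minimal illustration of what dynamic scope looks like in sh(1). The variable names echo the zfsboot comments, but the snippet itself is mine, not taken from the script:

```shell
# In sh, a function reads and writes the caller's variables directly;
# there is no lexical scoping unless you opt in with `local`.
pick_parts() {
    # "reads" $swapsize and "returns" $bootpart by plain assignment
    bootpart="ada0s1b for ${swapsize} of swap"
}

swapsize=32G
pick_parts
echo "$bootpart"   # prints: ada0s1b for 32G of swap
```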

Maybe one day I'll feel more comfortable about relying on Microsoft to keep our code repo safe longterm, but I guess if core@ is happy ...
It's just a read-only mirror. The authoritative repo is at https://cgit.freebsd.org/. I always use Github links, 'cause I figure Micro$oft can pay for my traffic.

I've noticed you didn't specify the sector size when creating the pool (the ashift property). The FreeBSD automated, guided installation sets ashift=12.
Yeah, I was unsure about that. It's an option that's on by default:

But I seem to recall reading somewhere that 4K sectors are the default now. I don't mess with things I don't fully understand since I'm still a newbie when it comes to ZFS.
 
Argh, I wonder if this is a bug in the forums. I saw another link like this elsewhere. Here's the real link:

Thanks.

The comments

Suggest the use of dynamic scope.

Strangely, the comments in the unlabeled box in your post were not included in the quote here. How did you do that?

But I get the point. It gets even weirder, if you look at the techniques used in e.g:
/usr/src/usr.sbin/bsdconfig/share/common.subr
functions like setvar() and hairier, f_eval_catch() etc ...

Great fun ploughing through debug log files, many over 200 kB (some over 1 MB) for each session of 'bsdconfig packages'. I had to write filter scripts to omit >90% of irrelevant data to find two real bugs.

It's just a read-only mirror. The authoritative repo is at https://cgit.freebsd.org/. I always use Github links, 'cause I figure Micro$oft can pay for my traffic.

Thank you for that. M$ owe the world trillions IMNSHO.
 
Frankly I'm very thankful to GPT I don't need to care about bsdlabel(8) and headaches to expand FS it required.

I recently had to expand both a slice and a contained FS partition, and managed both using gpart(8); I recall Warren Block saying nobody needed to brave bsdlabel ever again, ono.

It helps using 'gpart show -p' to disambiguate slices from partitions grown using -i N
 
I've noticed you didn't specify the sector size when creating the pool (the ashift property). The FreeBSD automated, guided installation sets ashift=12.
I read more on this. The default value is 0, meaning that ZFS should auto-detect the sector size. I have four ZFS pools, all created with auto-detected ashifts. Three of them are on single NVMe SSDs, and the fourth is a RAIDZ2 set over six spinny disks. I was curious about what ashift values were auto-detected. This is what I found.

Two of the SSDs are Corsair P5 1TB sticks. They both report:
Code:
# nvmecontrol identify nvme0ns1
...
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
ZFS chose an ashift of 9 for these two:
Code:
# zdb -l nvd0s1 | grep ashift
        ashift: 9

The third SSD pool is built on whatever WD Black SSD came with my Frame.work laptop:
Code:
# nvmecontrol identify nvme0ns1
...
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Good
LBA Format #01: Data Size:  4096  Metadata Size:     0  Performance: Better
ZFS appears to have done the Right Thing™:
Code:
# zdb -l nvd0 | grep ashift
        ashift: 12

Reading what ralphbsz wrote here makes me think setting ashift is probably not that crucial for SSDs anyway. The spinning rust reports:
Code:
# smartctl -a /dev/da0 | grep -i sector
Sector Sizes:     512 bytes logical, 4096 bytes physical
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
And once again ZFS has done the Right Thing™:
Code:
 # zdb -l /dev/da0 | grep ashift
        ashift: 12
        ashift: 12
        ashift: 12
        ashift: 12

I guess in the generic case setting ashift=12 will do no harm at worst, and is likely to help.
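For reference, ashift is just the base-2 logarithm of the sector size ZFS assumes for the vdev, so the two values seen above decode as:

```shell
# ashift -> assumed sector size in bytes
echo $(( 1 << 9 ))    # ashift=9:  512-byte sectors
echo $(( 1 << 12 ))   # ashift=12: 4096-byte sectors
```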
 
I recently had to expand both a slice and a contained FS partition, and managed both using gpart(8); I recall Warren Block saying nobody needed to brave bsdlabel ever again, ono.
I've been using the MBR layout since before gpart existed. It was not an option back then.

It helps using 'gpart show -p' to disambiguate slices from partitions grown using -i N
I personally don't have a problem distinguishing what is what; but the bsdlabel magic, that was always a bit of "fun" I could have lived without. I probably never extended an MBR slice/partition using gpart because there was no need.

I moved to GPT probably at the same time as gpart was introduced. I do work with MBR occasionally but usually it's on systems where no expansion is expected at all and root is on a single slice covering all system needs (typical VM with UFS on /; although this is very rare in my setup nowadays too).

But again, to each their own.
 
I've yet to see GRUB issues with GPT partitions. If you can share the issue though I'd be interested.
MBR bootcode is not simpler than GPT one. Actually GPT simplifies few things and makes the administration steps easier.
I found the reason why I stuck with MBR partitions:

Sure, 1MB is nothing, but I don't like having all these little partitions all over the place. I find it's easy to forget why you made them in the first place and that can lead to unfortunate decision-making.

I also didn't like this bit
...GRUB will require a special partition to boot properly. This partition should be at the beginning of your disk...
I made the FreeBSD slice first, because my systems are primarily FreeBSD systems. The secondary OS is semi-disposable.

I wound up booting Void with Syslinux in the end, so maybe I could've used GPT partitions after all. Oh well. I'm happy with the setup now.
 
Sure, 1MB is nothing, but I don't like having all these little partitions all over the place
Indeed, one may lose more space to the proper alignment of partitions where it matters.
Partitions are not all over the place: they are defined properly at two locations (one being a backup) and are not in anybody's way.

I find it's easy to forget why you made them in the first place and that can lead to unfortunate decision-making.

With legacy boot you need a next bootloader stage. FreeBSD uses its partition and it's pretty straightforward: you know where to expect the bootloader code. Linux/GRUB has more options but in principle it's the same.
On the contrary, due to unfortunate decision-making, I had to restore several Linux MBR-boot boxes because: a) a 256 MB (or even smaller!) /boot was considered enough back in the day; b) /boot started at offset 40, not leaving enough free space for a bootloader that got bigger over time.

With GPT the bootloader code location is properly defined, on its own partition. There's no confusion. There's also the benefit that you can set up a disk capable of booting either legacy or UEFI. Not at the same time, of course. :)

In the end it's up to you what you choose and what you require.
 
My current setup is:
[I don't use a freebsd-boot partition]
- I use Linux GRUB to chain-load the FreeBSD boot loader on a separate UFS partition.
- In loader.conf I specify the root directory & kernel on a separate ZFS partition.
 
On a contrary due to unfortunate decision making I had to restore several Linux MBR boot boxes due to: a) 256MB (or even less!) /boot being enough back in the day b) boot started offset 40 not leaving enough free space for bootloader that got bigger over time.
You're making my point for me. I don't have any boot or any other kind of small partitions or slices anywhere anymore. There's one ZFS partition and one swap partition for FreeBSD; they don't have to be at the beginning of the disk or anywhere else in particular. There's also a partition for the secondary OS, but since that install is semi-disposable, I know it's safe to nuke it.

I expect it's just a matter of time until the UEFI bloatware starts to be too big for the partitions folks have made for it. Same problem, different day.
 
You're making my point for me. I don't have any boot or any other kind of small partitions or slices anywhere anymore
I was commenting on the Linux side of things, as you mentioned the Void Linux link above. A /boot partition is typical for Linux setups; it was done as a separate partition for certain reasons (e.g. / on LVM, encryption...).

I expect it's just a matter of time until the UEFI bloatware starts to be too big for the partitions folks have made for it
When a bootloader gets bigger, it doesn't matter whether you have it loose in a gap between partitions, at the start of a partition, or on a separate partition: some sort of disk reorganization needs to happen if the space is too small. The argument here is that the bootloader resides on its own partition, providing a clear setup of where it is (and this stands out more on Linux, due to the reasons above).

This was true for legacy boot code on FreeBSD: I had a few boxes I installed around 2009 and kept rolling upgrades on. At some point the legacy boot code got bigger and didn't fit.

With the GPT scheme, managing disks is simpler. Aligning partitions is straightforward. Overall clean and simple. GPT doesn't introduce any problems; it solves the mess of the 4-partition limitation.

However, you are the admin of your servers; use what you like. FreeBSD supports both.
 
Note: gpart "should" align by default, as disk space is cheap. Does it align by default, or do you need to give a parameter? Align to which block size? 4K?
 