Solved: Boot from ZFS root on raw disks?

I stumbled upon this post and am trying to figure out how to place a boot loader (and which one) on a (mirrored) zpool consisting of two raw disks.

Does anybody have a clue?
 
I don't think there is any discernible advantage to giving ZFS the entire disk on FreeBSD. If you are willing to use GPT partitioning, the FreeBSD 10.1 installer can do this for you. Alternatively (still using GPT partitioning), you can follow this guide (just extract the bootcode bits you need if you already have FreeBSD installed into the ZFS datasets):
https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror
 
I know about the advantages and disadvantages of partitioning vs. raw ZFS. And I surely know how to set up ZFS. But that was not the question.
 
The bootloader needs to be placed on both disks in order to be able to boot. As far as I know, you need to use either a GPT or an MBR (not recommended) method.
 
Well, I expect this to be /boot/zfsloader, written with dd(1) to (presumably) the first sector of the raw disks. However, I've never tried playing with raw disks and ZFS, so I hope that someone who has can shed some light on this.

And no, you need neither GPT nor MBR nor any other partitioning scheme, as I am asking about raw disks.
 
I stumbled upon this post and am trying to figure out how to place a boot loader (and which one) on a (mirrored) zpool consisting of two raw disks.

Does anybody have a clue?

  1. Break the mirror to remove one of the drives from the pool.
  2. Re-format that drive with 3 GPT partitions (freebsd-boot, freebsd-zfs, freebsd-swap).
  3. Install gptzfsboot to the first partition (freebsd-boot).
  4. Add the swap partition to /etc/fstab and enable it.
  5. Add the freebsd-zfs partition to the mirror.
  6. Wait for it to resilver.
  7. Repeat with the other drive.
Voila! A much nicer setup that's a lot easier to manage.
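
In command form, those steps might look roughly like this; a sketch only, assuming the pool is named zroot, the first drive converted is ada1 (mirrored with ada0), and the partition sizes are placeholders:
Code:
zpool detach zroot ada1
gpart create -s gpt ada1
gpart add -t freebsd-boot -s 512k ada1
gpart add -t freebsd-zfs -s 900G ada1
gpart add -t freebsd-swap ada1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
# add the new swap partition to /etc/fstab, then enable it:
swapon /dev/ada1p3
# re-add the ZFS partition to the mirror and wait for the resilver:
zpool attach zroot ada0 ada1p2
Once the resilver completes, repeat for ada0.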

Depending on how you have it configured to boot from ZFS, you may need to modify the bootfs property of the pool, or /boot/loader.conf.
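For example, pointing the loader at the root dataset might look like this (dataset name assumed to be zroot):
Code:
zpool set bootfs=zroot zroot
or, in /boot/loader.conf:
Code:
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"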
 
Thank you, phoenix, but I am already aware of how to set up the "optimal" ZFS system, as I am managing two servers with exactly the same configuration you are recommending. The point is that I would like to learn something new.
 
Well, the mailing list you're referring to says:
Code:
No, ZFS support booting from dedicated disks. There's a zfsboot file
that should be written upon disk first sector and ZFS reservation space
to make it bootable.
So why don't you try writing /boot/zfsboot "upon disk first sector" using dd(1) and report back how it works? zfsboot(8) says this:
Code:
zfsboot is typically installed using dd(1).  To install zfsboot on the
ada0 drive:

      dd if=/boot/zfsboot of=/dev/ada0 count=1
      dd if=/boot/zfsboot of=/dev/ada0 iseek=1 oseek=1024
I've gotten curious, and since I use swap inside the zpool anyway, maybe I'll try it myself ;)
 
This is also mentioned on the wiki; see https://wiki.freebsd.org/RootOnZFS/ZFSBootPartition, section 1.10 - Install ZFS boot:

Export zroot before installing the boot code:
Fixit# zpool export zroot
Install the boot1 stage:
Fixit# dd if=/mnt2/boot/zfsboot of=/tmp/zfsboot1 count=1
Fixit# gpart bootcode -b /tmp/zfsboot1 /dev/ad0s3
This may fail with an "operation not permitted" error message, since the kernel likes to protect critical parts of the disk. If this happens for you, run:
Fixit# sysctl kern.geom.debugflags=0x10
Install the boot2 zfs stage into the convenient hole in the ZFS on-disk format, located just after the ZFS metadata (this is the seek=1024):
Fixit# dd if=/mnt2/boot/zfsboot of=/dev/ad0s3a skip=1 seek=1024
Import zroot to continue the install:
Fixit# zpool import zroot
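
For the raw disks this thread is about, the same export/import dance should apply, with the gpart(8) step replaced by the dd(1) invocations from zfsboot(8); a sketch, assuming the pool is zroot living on ada0:
Code:
zpool export zroot
sysctl kern.geom.debugflags=0x10
dd if=/boot/zfsboot of=/dev/ada0 count=1
dd if=/boot/zfsboot of=/dev/ada0 iseek=1 oseek=1024
zpool import zroot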
 
Hey, it works like a charm! It occupies less space than the partitioned setup, and ZFS even seems quicker this way.

I'm running it now on my laptop: a partitioned setup on a 1 TB 4K-aligned disk, tested against the old 149 GB HDD, which is not 4K-aligned but uses the raw-disk zpool setup. The latter seems to work at least as smoothly.

I just wonder: will a zpool created this way understand the 4K alignment of the underlying hardware? I will test it now, but I guess it should.
EDIT: one has to follow the standard procedure with gnop(8), sketched below.
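
For reference, the standard gnop(8) procedure goes roughly like this (device names assumed); the .nop device advertises 4K sectors, so the pool is created with ashift=12:
Code:
gnop create -S 4096 /dev/ada0
zpool create zroot mirror /dev/ada0.nop /dev/ada1
zpool export zroot
gnop destroy /dev/ada0.nop
zpool import zroot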
 
The main argument for using whole disks with ZFS on Solaris was to allow ZFS to use the disk cache. This does not apply on FreeBSD, as the FreeBSD implementation is able to use the disk cache even with partitions.
See this mailing list thread:
https://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248701.html

There is still one valid argument for using whole disks: replacing a disk is simpler and quicker, since you don't have to worry about setting up the partitions, etc.

However, the main argument for using partitions with ZFS concerns replacing a failed disk in a mirrored or raidz setup. The new disk needs to be the same size or bigger, but not all drives stated as the same size (e.g. 4 TB) have exactly the same number of sectors. If you buy a replacement 4 TB drive and it has one sector less, it cannot be used to replace the failed disk if your ZFS setup is using the whole disk. That is why I create a big ZFS partition and a small swap partition at the end of the disk (you can shrink the swap partition if your replacement disk is a bit smaller).
https://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248706.html
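
A sketch of that layout on a nominal 4 TB drive (sizes are illustrative only):
Code:
gpart create -s gpt ada0
gpart add -a 4k -t freebsd-boot -s 512k ada0
gpart add -a 4k -t freebsd-zfs -s 3720G ada0
gpart add -a 4k -t freebsd-swap -s 4G ada0
The small swap partition at the end provides the wiggle room: if a replacement disk turns out a few sectors smaller, you simply create its swap partition slightly smaller.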
 
OK, thanks. I remember now that I read something to this effect when I was setting up the zfs-mirrored FreeNAS-based file server for my office, and so I followed the advice.
But this time I'm experimenting with my laptop, and this stuff isn't that much of an issue. More or less just for the fun of it, you know.
 
I can also confirm now - it ®justworks :)

Here is why:
1. Simplicity. You get rid of a whole complexity layer you don't have to manage - GPT!
2. Speed. There is no consensus as to whether a raw device is always faster. However, one thing is sure: it can NOT be slower.

If you have a server hosted at some provider and a disk fails, you cannot send in a wish list for one or another HDD model - you just get a replacement disk or two, and that's it. So the disk-replacement argument is not always important. If you have recent snapshots offsite, just get new disks and restore with very little downtime (a sketch follows below).
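Such a restore might look like this (host, pool, and snapshot names are assumptions):
Code:
# pull the latest replicated snapshot from the offsite box onto the fresh pool:
ssh backuphost zfs send -R backup/zroot@latest | zfs receive -F zroot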

free-and-bsd: with a recent version of FreeBSD you don't need gnop(8) anymore. Just set:
Code:
sysctl vfs.zfs.min_auto_ashift=12
before creating your pool and it will be created with a 4K sector size.
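
After that, a raw-disk mirror comes out 4K-aligned without any .nop devices (device names assumed):
Code:
zpool create zroot mirror /dev/ada0 /dev/ada1
zdb -C zroot | grep ashift    # should report ashift: 12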
 
However, the main argument for using partitions with ZFS concerns replacing a failed disk in a mirrored or raidz setup. The new disk needs to be the same size or bigger, but not all drives stated as the same size (e.g. 4 TB) have exactly the same number of sectors. If you buy a replacement 4 TB drive and it has one sector less, it cannot be used to replace the failed disk if your ZFS setup is using the whole disk. That is why I create a big ZFS partition and a small swap partition at the end of the disk (you can shrink the swap partition if your replacement disk is a bit smaller).
https://lists.freebsd.org/pipermail/freebsd-questions/2013-January/248706.html

One of the pre-v28 versions of ZFS included the "feature" where it ignored (up to) the last MB of the disk (or something along those lines), to mitigate this issue. So long as the two drives are close in number of physical sectors, they will work without any issues.

My main reason for using partitions is GPT labelling. GPT labels work much more reliably than disk labels via glabel(8), and it's much nicer seeing labels in zpool(8) output, especially in systems with dozens of drives! It's also useful in systems with fewer disks. And, in pure-ZFS setups (like my home server) where the OS is installed into the pool, using partitions gives you swap space that's not managed by (and messing with) ZFS.
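For example, labelled partitions are created with gpart's -l flag and then referenced through /dev/gpt in the pool (label and pool names assumed):
Code:
gpart add -t freebsd-zfs -l disk0 ada0
gpart add -t freebsd-zfs -l disk1 ada1
zpool create tank mirror gpt/disk0 gpt/disk1
zpool status tank    # members show up as gpt/disk0 and gpt/disk1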
 
I can also confirm now - it ®justworks :)

Here is why:
1. Simplicity. You get rid of a whole complexity layer you don't have to manage - GPT!
2. Speed. There is no consensus as to whether a raw device is always faster. However, one thing is sure: it can NOT be slower.
...
vanessa, I actually love GPT and personally see no reason why so many prefer the MBR model. The former beats the latter on all counts... unless one needs to run some obsolete OS.

But then, I love ZFS as well and am willing to use all it has to offer. Using whole disks is one such thing, and the Solaris documentation recommends whole disks for root pools. And your second point seems obvious as well: this setup cannot possibly be slower, and we know what that means.

There's only one thing missing from this picture: I haven't yet found out how I could boot Linux ;) in this setup. GRUB2 seems to need a dedicated partition, so while I keep my Linux installation in the same rpool as FreeBSD, in that case I have to use a GPT-partitioned setup with a bios-boot partition to host the GRUB2 image. The problem is not only that the image needs to be embedded into the bios-boot partition; I have yet to check whether GRUB2 can recognize a whole-disk ZFS pool at all.

This, however, serves only testing purposes, for though I have Linux installed, I never use it.
 
...
There's only one thing missing from this picture: I haven't yet found out how I could boot Linux ;) in this setup. GRUB2 seems to need a dedicated partition, so while I keep my Linux installation in the same rpool as FreeBSD, in that case I have to use a GPT-partitioned setup with a bios-boot partition to host the GRUB2 image. The problem is not only that the image needs to be embedded into the bios-boot partition; I have yet to check whether GRUB2 can recognize a whole-disk ZFS pool at all.
...
Surprise, surprise!
This does work like a charm. But I'll report on it in a separate post.
 