Root On ZFS @ FreeBSD 9

In this guide I will demonstrate how to install a fully functional FreeBSD 9 system with root on ZFS, using a GPT scheme with a non-legacy root ZFS mountpoint, optimized for 4K drives. We will also use ZFS for swap.

You can use this as a reference guide for a single or mirror installation.

(1) Boot from a FreeBSD 9 installation DVD or memstick and choose "Live CD".

(2) Create the necessary partitions on the disk(s) and add ZFS aware boot code.

a) For a single disk installation.
Code:
gpart create -s gpt ada0
gpart add -b 34 -s 94 -t freebsd-boot ada0
gpart add -t freebsd-zfs -l disk0 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
b) Repeat the procedure for the second drive if you want a mirror installation.
Code:
gpart create -s gpt ada1
gpart add -b 34 -s 94 -t freebsd-boot ada1
gpart add -t freebsd-zfs -l disk1 ada1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1

(3) Align the disks for 4K sectors and create the pool.

a) For a single disk installation.
Code:
gnop create -S 4096 /dev/gpt/disk0
zpool create -o altroot=/mnt -o cachefile=/var/tmp/zpool.cache zroot /dev/gpt/disk0.nop
zpool export zroot
gnop destroy /dev/gpt/disk0.nop
zpool import -o altroot=/mnt -o cachefile=/var/tmp/zpool.cache zroot
b) For a mirror installation.
Code:
gnop create -S 4096 /dev/gpt/disk0
gnop create -S 4096 /dev/gpt/disk1
zpool create -o altroot=/mnt -o cachefile=/var/tmp/zpool.cache zroot mirror /dev/gpt/disk0.nop /dev/gpt/disk1.nop
zpool export zroot
gnop destroy /dev/gpt/disk0.nop
gnop destroy /dev/gpt/disk1.nop
zpool import -o altroot=/mnt -o cachefile=/var/tmp/zpool.cache zroot
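
If you want to double check that the pool really picked up the 4K alignment, one way (just a sketch, using zdb(8) pointed at the temporary cache file from above) is:
Code:
zdb -U /var/tmp/zpool.cache -C zroot | grep ashift
The vdev entries should show ashift: 12 if the gnop trick worked.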

(4) Set the bootfs property and checksums.

Code:
zpool set bootfs=zroot zroot
zfs set checksum=fletcher4 zroot

(5) Create appropriate filesystems (feel free to improvise!).

Code:
zfs create zroot/usr
zfs create zroot/usr/home
zfs create zroot/var
zfs create -o compression=on -o exec=on -o setuid=off zroot/tmp
zfs create -o compression=lzjb -o setuid=off zroot/usr/ports
zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/distfiles
zfs create -o compression=off -o exec=off -o setuid=off zroot/usr/ports/packages
zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/usr/src
zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/var/crash
zfs create -o exec=off -o setuid=off zroot/var/db
zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/var/db/pkg
zfs create -o exec=off -o setuid=off zroot/var/empty
zfs create -o compression=lzjb -o exec=off -o setuid=off zroot/var/log
zfs create -o compression=gzip -o exec=off -o setuid=off zroot/var/mail
zfs create -o exec=off -o setuid=off zroot/var/run
zfs create -o compression=lzjb -o exec=on -o setuid=off zroot/var/tmp

(6) Add swap space and disable checksums. In this case I add 4GB of swap.

Code:
zfs create -V 4G zroot/swap
zfs set org.freebsd:swap=on zroot/swap
zfs set checksum=off zroot/swap

(7) Create a /home symlink pointing at /usr/home and fix some permissions.

Code:
chmod 1777 /mnt/tmp
cd /mnt ; ln -s usr/home home
chmod 1777 /mnt/var/tmp

(8) Install FreeBSD.

Code:
sh
cd /usr/freebsd-dist
export DESTDIR=/mnt
for file in base.txz lib32.txz kernel.txz doc.txz ports.txz src.txz;
do (cat $file | tar --unlink -xpJf - -C ${DESTDIR:-/}); done

(9) Copy zpool.cache (very important!!!)

Code:
cp /var/tmp/zpool.cache /mnt/boot/zfs/zpool.cache

(10) Create the rc.conf, loader.conf and an empty fstab (otherwise the system will complain).

Code:
echo 'zfs_enable="YES"' >> /mnt/etc/rc.conf
echo 'zfs_load="YES"' >> /mnt/boot/loader.conf
echo 'vfs.root.mountfrom="zfs:zroot"' >> /mnt/boot/loader.conf
touch /mnt/etc/fstab

Reboot, adjust time zone info, add a password for root, add a user and enjoy!!!
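
As a rough sketch of those finishing touches (these are just the standard base system tools, run as root on the freshly booted system):
Code:
tzsetup
passwd
adduser
tzsetup(8) walks you through the time zone, passwd(1) sets root's password and adduser(8) creates your user.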
 
Seems like a *very* well-written tutorial. (Each step could be heavily commented and documented to alleviate the "why/what for" concerns of those very unfamiliar with even unix command lines; but that would be a lot of writing...). However, what might be most helpful initially is simply to add the size(s) of the disk(s) and RAM you are working with, so others with different setups could investigate changes to the commands to accommodate the differences. (Maybe even also the processor type, if tuning is necessary, etc.) OTOH if FreeBSD usage were thousands of times greater, all you'd need is a "post your complete setup and results and command lines" request at the end, and other similar guides might follow in the same thread...
 
@jb_fvwm2,

thanks for your kind words. Disk size and memory are not relevant, which is why I haven't mentioned them. This guide installs freebsd-zfs to whatever space remains on the disk once you subtract the 4GB for swap.

George
 
Hi, great post. A question: what benefits do we get from using zfs on root partition as opposed to good old UFS? I am setting up a small home file server with 3x500gb disks in raid5 and I'm not sure if I should just go for all zfs.
Thanks.
 
bbzz said:
Hi, great post. A question: what benefits do we get from using zfs on root partition as opposed to good old UFS? I am setting up a small home file server with 3x500gb disks in raid5 and I'm not sure if I should just go for all zfs.
Thanks.
ZFS has a lot of nice features like self-healing, snapshots, compression and much more. It is also ideal for large RAID setups on file servers. However, in the file server case, it is always better to separate the data from the OS.

I would suggest that you get a 4th drive of much smaller capacity, set up the OS there, be it ZFS or UFS2 + journaling, and use the other 3 for a separate raidz1 pool.

If you can't do that, then you have the option of creating a ZFS-on-root raidz1 pool, which has better performance and integrity than any other software or (some) hardware RAID. Just make sure you have at least 4GB of RAM.
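
For the data side, a minimal sketch of what that separate raidz1 pool could look like (the pool and device names here are only placeholders, adjust them to your disks):
Code:
zpool create storage raidz1 ada1 ada2 ada3
zfs create -o mountpoint=/export/data storage/data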
 
@gkontos Thanks.
That makes very good sense and I was thinking of that. Since this is also my desktop machine as well as server, and would prefer to run my system on a disk rather than USB stick (and buying a new disk isn't an option), maybe creating two separate pools, one for data, other for base system would be best solution (since physical separation is not possible) ?
It is much slower at boot than UFS was, not sure if that matters or is just coincidence (not that it will be rebooted that often). I have 6gb, didn't do any zfs tuning.
 
bbzz said:
@gkontos Thanks.
That makes very good sense and I was thinking of that. Since this is also my desktop machine as well as server, and would prefer to run my system on a disk rather than USB stick (and buying a new disk isn't an option), maybe creating two separate pools, one for data, other for base system would be best solution (since physical separation is not possible) ?
With ZFS its best if you dedicate the whole disks rather than create different partitions. Of course, after you install your system you can separate your data from OS with something like:

[CMD=""]zfs create -o mountpoint=/export/data zroot/data[/CMD]
bbzz said:
It is much slower at boot than UFS was, not sure if that matters or is just coincidence (not that it will be rebooted that often). I have 6gb, didn't do any zfs tuning.
Booting shouldn't be slower than with UFS, and tuning is irrelevant to boot speed. How exactly did you create the system?
 
But, there has to be at least 64kb partition for zfs bootstrap.
I installed raidz1 on 3 500gb disks, ad10s2, ad12s2, and ad14s2.
ad10s1, ad12s1, and ad14s1 are only 64kb big for zfs to boot from.
So unless I use another disk for booting, I can't really use whole disks for zfs (whole as in, ad10, ad12, ad14).
Is that ok?
I know it would be better to get another separate disk for system, but like I said, can't afford a new drive right now.
 
bbzz said:
But, there has to be at least 64kb partition for zfs bootstrap.
I installed raidz1 on 3 500gb disks, ad10s2, ad12s2, and ad14s2.
ad10s1, ad12s1, and ad14s1 are only 64kb big for zfs to boot from.
So unless I use another disk for booting, I can't really use whole disks for zfs (whole as in, ad10, ad12, ad14).
Is that ok?
I know it would be better to get another separate disk for system, but like I said, can't afford a new drive right now.
That is correct. The adXs1 partitions are your freebsd-boot partitions. By using the whole disk for ZFS I meant that you shouldn't create several different freebsd-zfs partitions. So yes, strictly speaking you aren't using the full disk for ZFS; that can only be done on disks that don't need to be bootable.
 
I followed your instructions (except that I'm creating a 3 disk RAIDz1 pool) and I get the error message
Code:
cannot import 'zroot': one or more devices is currently unavailable
and # zpool import says that the pool is corrupted. Before exporting the pool everything is fine. Can anybody tell me what's happening here?

Thanks.

Thomas

PS: I'm trying to install FreeBSD 9.0-BETA2.
 
@volatilevoid,

The guide has been updated but unfortunately not in this thread. I will do it here also once I find the time.
Have a look at the instructions and please let me know if you still have issues and at which stage.

George
 
Hello George,

thanks for the updated guide. Still, I'm experiencing exactly the same problem as before.

# zdb -l /dev/gpt/disk0 looks promising though.
I don't know what the problem could be... maybe someone else already tried to install FreeBSD 9.0-BETA2 on ZFS? I had absolutely no problem with 8.x (with the live FS).

Thomas
 
It's working now! :)

And here comes how I did it:

Create R/W /tmp:
# umount /dev/md1; mdmfs -s 512M md1 /tmp

Copy everything from /boot to /tmp.

Create R/W /boot:
# mdmfs -s 512M md2 /boot

Copy everything from /tmp back to /boot. Now we made /boot writeable (like in your original guide).
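
Put together, the whole dance looks something like this (a sketch only; the 512M sizes are what I used above, and a plain cp -R is one way to do the copies):
Code:
umount /dev/md1; mdmfs -s 512M md1 /tmp
cp -R /boot/* /tmp
mdmfs -s 512M md2 /boot
cp -R /tmp/* /boot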

Set up GPT on disks.

When creating the zpool, do the following:
# zpool create -m /tmp/zroot zroot /dev/gpt/disk0

The newly created zpool will be mounted under our shiny new, writeable /tmp and we don't need to export/import the zpool.

Proceed as normal; copy zpool.cache from /boot/zfs/zpool.cache instead of from /tmp/zpool.cache.

Hope that helps anyone.

Thomas
 
volatilevoid said:
It's working now! :)


When creating the zpool, do the following:
# zpool create -m /tmp/zroot zroot /dev/gpt/disk0

The newly created zpool will be mounted under our shiny new, writeable /tmp and we don't need to export/import the zpool.

Thomas

Indeed using the -m switch saves the trouble of exporting / importing the pool.
So, did you manage to perform a raidz1 installation this way ?
 
gkontos said:
Indeed using the -m switch saves the trouble of exporting / importing the pool.
So, did you manage to perform a raidz1 installation this way ?

Yes, it worked like a charm. :)

ian-nai said:
Umm...don't you need to tell zfs to use 4k bytes instead of 512 (for advanced format drives)? http://forums.freebsd.org/showthread.php?t=21644

Or is this somehow magically taken care of in 9?

I didn't know about the possibility to change the sector size. My zpool has ashift=9.
 
Sorry for the double post.

I reinstalled the pool with ashift=12 (using the gnop trick) and it seems that the system can't boot after one single successful boot. I'm stuck in boot0 and see the spinning bar forever.

I'm not totally sure whether there is a connection between ashift=12 and the boot problem, but maybe someone has an idea. From the live CD I'm able to import the zpool, but strangely only if I create disk0.nop, disk1.nop and disk2.nop (although I only created disk0.nop when installing); otherwise # zpool import only shows numbers instead of devices.

My problem seems to be related to this one. Any thoughts?

Bests
Thomas

Edit: I had a look at the Intel X25-M manual and the only reference to sector size was "1 sector = 512 bytes"; still, I guess the physical sector size is 4K. Can anybody confirm this?

Edit 2: The system runs without problems if ashift is 9. So could this be a bug in the bootloader when booting from a RAIDz pool?
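
(For what it's worth, one way to see what the drive itself reports under FreeBSD is diskinfo(8); just a sketch, substitute your own device name:)
Code:
diskinfo -v ada0
The sectorsize and stripesize fields are the interesting ones, although many 4K drives simply report 512 there.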
 
Looks like I led you down the rabbit hole of confusion with me...sorry! (Really...really...sorry.)

What I've learned, so far, is there doesn't seem to be a good way to set ashift = 12 under FreeBSD atm. Judging by when the code freeze was, I think it's possible to create a pool with ashift=12 (or other values) under the 9-beta.

http://www.freebsd.org/cgi/query-pr.cgi?pr=153695

But, I could be completely misinterpreting even that. (I probably am.) I was having other problems and gave up on the ashift=12 business yesterday.

My conclusion so far has been:
1.) It's probably best to do your booting off of UFS drives (you can tell sysinstall to create UFS partitions with the right sector size!)
2.) ashift = 9 might be acceptable on 'advanced format drives' that have built-in compatibility. (After some tinkering, this seems less and less acceptable...)
3.) Always be mindful of where your partitions start and end.

I'd suggest #1 or checking out the beta...
 
ian-nai said:
What I've learned, so far, is there doesn't seem to be a good way to set ashift = 12 under FreeBSD atm. Judging by when the code freeze was, I think it's possible to create a pool with ashift=12 (or other values) under the 9-beta.

http://www.freebsd.org/cgi/query-pr.cgi?pr=153695

The way I'm now aware of is to create a transparent provider with gnop. Creating the pool is definitely possible, also with 9.0-BETA2. I think the problem may be the bootloader rather than the pool itself (which can be mounted from the live CD/DVD just fine.)

ian-nai said:
My conclusion so far has been:
1.) It's probably best to do your booting off of UFS drives (you can tell sysinstall to create UFS partitions with the right sector size!)
2.) ashift = 9 might be acceptable on 'advanced format drives' that have built-in compatibility. (After some tinkering, this seems less and less acceptable...)
3.) Always be mindful of where your partitions start and end.

I'd suggest #1 or checking out the beta...

1.) Well, I like having root on ZFS, so I think I'll stick to it.
2.) I've had root on ZFS for some time now (with ashift=9) and the performance was fine for me, although I have little experience with ashift=12 (I was only able to extract the ports tree before the system stopped booting, and didn't see big differences between the two).
3.) On my SSDs, I'm aligning to full megabytes. ;)
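
(For reference, a sketch of how that kind of alignment looks with gpart, starting the partition at sector 2048, i.e. 1 MiB with 512-byte sectors; the label and device are just examples:)
Code:
gpart add -b 2048 -t freebsd-zfs -l disk0 ada0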

Thomas
 
volatilevoid said:
The way I'm now aware of is to create a transparent provider with gnop. Creating the pool is definitely possible, also with 9.0-BETA2. I think the problem may be the bootloader rather than the pool itself (which can be mounted from the live CD/DVD just fine.)

gnop works. Another option is to use geli(8), if you need/want encryption. In that case you don't need gnop, since you already define the sector size of the encrypted provider when you initialize it. For example:

Code:
geli init -b -K /boot/keys/key.key -s 4096 -l 256 /dev/label/zroot

A sector size of 4096 (2^12) works for drives that need ashift=12, and also for ashift=9 (512-byte sector) drives.
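
To continue that example (only a sketch, reusing the key path and label from the command above), the provider gets attached and the pool is built on top of the .eli device:
Code:
geli attach -k /boot/keys/key.key /dev/label/zroot
zpool create zroot /dev/label/zroot.eli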
 
Originally Posted by gkontos.
Code:
...
gpart bootcode -b boot/pmbr -p boot/gptzfsboot -i 1 ada0
I think it must be so:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

Originally Posted by gkontos.
Done! Now, all that is left is to create the rc.conf, loader.conf and fstab.
Code:
...
echo 'zfs_load="YES"' > /mnt/boot/loader.conf
echo 'vfs.root.mountfrom="zfs:zroot"' > /mnt/boot/loader.conf
...

The first echo creates /mnt/boot/loader.conf containing zfs_load="YES", but the second echo then overwrites the file, leaving only vfs.root.mountfrom="zfs:zroot" in it.

I think it should be:
Code:
...
echo 'zfs_load="YES"' > /mnt/boot/loader.conf
echo 'vfs.root.mountfrom="zfs:zroot"' >> /mnt/boot/loader.conf
...

Fix this, please. Thank you for your article.
 
Thank you for the post; I used this on my FreeBSD 9 testing machine (hosting my Django development server).

Some little tweaks:
Code:
cp -R /boot/* /tmp
and
Code:
cp -R /tmp/* /boot

After installing, everything looked OK. Then I tried to add a nullfs mount in fstab:
Code:
/usr/www/something /home/user/something nullfs rw 0 0
Note that the directory something in /home/user exists. This line caused a boot failure; needless to say, /usr did not appear to be mounted, and I was dropped into /bin/sh as the shell. Has anyone encountered this error, or am I forgetting something?
 
The guide has been updated to reflect the changes that have appeared at my blog.

This is something that should have been done a long time ago. I know, and I apologize for that!

Notes:

When creating the pool, we could use this method to avoid the warnings about the read-only filesystem.

[CMD=""]# zpool create -m /tmp/zroot zroot /dev/gpt/disk0[/CMD]

This, however, doesn't work every time. I have done over 100 installations since May, and I remember it failing me at some point.

This guide assumes that you are using a 64-bit platform, as you should. Also, a minimum of 4GB of RAM is my recommendation.
 
Maybe you could add an additional part explaining how to use geli to encrypt the whole disk and boot from a small partition on the same disk, or from another thumb drive. That would make it complete :) Anyway, just a thought.
 