Going all ZFS - the first start

dvl@

Developer
Last night I started work on an all-ZFS system. I made progress, but couldn't get the system to boot. I've not had time to document anything. Short story: Mounting from zfs:root failed with error 2.

I booted from a thumb drive and combined two guides: the procedure from https://www.dan.me.uk/blog/2012/01/22/booting-from-zfs-raid0156-in-freebsd-9-0-release/, with the layout described in http://blogs.freebsdish.org/pjd/2010/08/06/from-sysinstall-to-zfs-only-configuration/.

See the screenshots on Google+.

I hope to get this going tonight.
 
I didn't spot the ZFS pool in the list of GEOM-managed devices. Did you copy the zpool.cache file to /poolmountpoint/boot/zfs at the end of the installation?

If you were using a recent 9-STABLE you wouldn't have to muck with zpool.cache at all, just for your information :)
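One way to answer that question before rebooting is to look for the file under the new root. This is only a minimal sketch: it assumes the pool is still mounted under /mnt as in the guides, and check_zpool_cache is a made-up helper name, not a FreeBSD tool.

```shell
# Hypothetical helper: report whether the pool cache file was copied into
# the new root. The root defaults to /mnt, the altroot assumed here.
check_zpool_cache() {
    root="${1:-/mnt}"
    cache="${root}/boot/zfs/zpool.cache"
    # -s is true only if the file exists and is not empty
    if [ -s "${cache}" ]; then
        echo "ok: ${cache}"
    else
        echo "missing: ${cache}"
        return 1
    fi
}
```

Run it as `check_zpool_cache /mnt` just before the final reboot. A "missing" result on a pre-9-STABLE install would be consistent with the mount failure described above, though it does not prove that is the cause.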
 
A good point. No, I did not. There was no such file. The original export failed. I went back and tried it again, and it failed again. I'm going to start smaller soon....
 
This script worked well for me prior to the changes in 9-STABLE. Maybe something in there is helpful. Of course you would have to customize DISKS, vdevs et cetera.

Code:
# Based on http://www.aisecure.net/2012/01/16/rootzfs/ and 
# @vermaden's guide on the forums

DISKS="ada0 ada1"

for I in ${DISKS}; do
	NUM=$( echo ${I} | tr -c -d '0-9' )
	gpart destroy -F ${I}
	gpart create -s gpt ${I}
	gpart add -b 34 -s 94 -t freebsd-boot -l bootcode${NUM} ${I}
	gpart add -t freebsd-zfs -l disk${NUM} ${I}
	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${I}
	gnop create -S 4096 /dev/gpt/disk${NUM}
done

zpool create -f -o altroot=/mnt -o cachefile=/tmp/zpool.cache zroot mirror /dev/gpt/disk*.nop
zpool export zroot

for I in ${DISKS}; do
	NUM=$( echo ${I} | tr -c -d '0-9' )
	gnop destroy /dev/gpt/disk${NUM}.nop
done

zpool import -o altroot=/mnt -o cachefile=/tmp/zpool.cache zroot

zpool set bootfs=zroot zroot
zfs set atime=off zroot
zfs set checksum=fletcher4 zroot

zfs create zroot/usr
zfs create zroot/usr/home
zfs create zroot/var
zfs create zroot/tmp

chmod 1777 /mnt/tmp
cd /mnt ; ln -s usr/home home
chmod 1777 /mnt/var/tmp

cd /usr/freebsd-dist
export DESTDIR=/mnt
for file in base.txz kernel.txz doc.txz;
do (cat $file | tar --unlink -xpJf - -C ${DESTDIR:-/}); done

cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache

cat << EOF >> /mnt/boot/loader.conf
zfs_load=YES
vfs.root.mountfrom="zfs:zroot"
EOF

cat << EOF >> /mnt/etc/rc.conf
defaultrouter="192.168.0.200"
hostname="storage2"
ifconfig_em0="inet 192.168.0.101  netmask 255.255.255.0"
keymap="us.iso"
mountd_flags="-r" # for nfsd
nfs_client_enable="YES"
nfs_server_enable="YES"
rpcbind_enable="YES"
sendmail_enable="NO"
sendmail_msp_queue_enable="NO"
sendmail_outbound_enable="NO"
sendmail_submit_enable="NO"
sshd_enable="YES"
zfs_enable=YES
EOF
 
She's fast.

Code:
# time /etc/periodic/weekly/310.locate
Rebuilding locate database:

real	0m2.181s
user	0m0.355s
sys	0m1.926s

And, for what it's worth, portsnap extract took 27 seconds.
 
Here is what I used:

Code:
# Based on http://www.aisecure.net/2012/01/16/rootzfs/ and 
# @vermaden's guide on the forums

DISKS="ada0 ada1 ada2 ada3 ada4 ada5"

gmirror load
gmirror stop swap

for I in ${DISKS}; do
        NUM=$( echo ${I} | tr -c -d '0-9' )
        gpart destroy -F ${I}
        gpart create -s gpt ${I}
        gpart add -b 34 -s 94 -t freebsd-boot -l bootcode${NUM} ${I}

        gpart add -s 2g -t freebsd-swap -l swap${I} ${I}

        #
        # note: not using all the disk, on purpose, adjust this size for your HDD
        #
        gpart add -t freebsd-zfs -s 2790G -l disk${NUM} ${I}
        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ${I}
        gnop create -S 4096 /dev/gpt/disk${NUM}
done

gmirror label -F -h -b round-robin swap /dev/gpt/swap*

#zpool create -f -o altroot=/mnt    -o cachefile=/tmp/zpool.cache -O atime=off -O setuid=off -O canmount=off system raidz2 /dev/gpt/disk*.nop
zpool create -f -O mountpoint=/mnt -o cachefile=/tmp/zpool.cache -O atime=off -O setuid=off -O canmount=off system raidz2 /dev/gpt/disk*.nop
zpool export system

for I in ${DISKS}; do
        NUM=$( echo ${I} | tr -c -d '0-9' )
        gnop destroy /dev/gpt/disk${NUM}.nop
done

zpool import -o altroot=/mnt -o cachefile=/tmp/zpool.cache system

zfs create -o mountpoint=legacy -o setuid=on system/rootfs

zpool set bootfs=system/rootfs system

# there is no sys

#zfs set atime=off sys
zfs set checksum=fletcher4 system

mount -t zfs system/rootfs /mnt

zfs create system/root
zfs create -o canmount=off  system/usr
zfs create -o canmount=off  system/usr/home
zfs create -o setuid=on     system/usr/local
zfs create -o compress=gzip system/usr/src
zfs create -o compress=lzjb system/usr/obj
zfs create -o compress=gzip system/usr/ports
zfs create -o compress=off  system/usr/ports/distfiles
zfs create -o canmount=off  system/var
zfs create -o compress=gzip system/var/log
zfs create -o compress=lzjb system/var/audit
zfs create -o compress=lzjb system/var/tmp
#
# I was getting failure on these chmod so I did them after the system booted
#
#chmod 1777 /mnt/var/tmp
zfs create -o compress=lzjb system/tmp
#chmod 1777 /mnt/tmp
#chmod 1777 /mnt/var/tmp
zfs create system/usr/home/dan

cd /mnt ; ln -s usr/home home

cd /usr/freebsd-dist
export DESTDIR=/mnt
for file in base.txz kernel.txz doc.txz;
do (cat $file | tar --unlink -xpJf - -C ${DESTDIR:-/}); done

cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache

cat << EOF >> /mnt/etc/fstab
system/rootfs        /    zfs  rw,noatime 0 0
/dev/mirror/swap.eli none swap sw         0 0
EOF

cat << EOF >> /mnt/boot/loader.conf

geom_eli_load="YES"
geom_label_load="YES"
geom_mirror_load="YES"
geom_part_gpt_load="YES"

zfs_load=YES
vfs.root.mountfrom="zfs:system/rootfs"
EOF

cat << EOF >> /mnt/etc/rc.conf
defaultrouter="10.5.0.1"
hostname="slocum.unixathome.org"
ifconfig_em0="inet 10.5.0.207  netmask 255.255.255.0"
keymap="us.iso"
sendmail_enable="NO"
sendmail_msp_queue_enable="NO"
sendmail_outbound_enable="NO"
sendmail_submit_enable="NO"
sshd_enable="YES"
zfs_enable=YES
EOF

cat << EOF >> /mnt/etc/resolv.conf
search unixathome.org
nameserver 10.5.0.1
nameserver 10.5.0.2
EOF

echo WRKDIRPREFIX=/usr/obj >> /mnt/etc/make.conf

zfs umount -a
umount /mnt
zfs set mountpoint=/ system
 
For background:

I booted using a USB stick, then dropped to a shell. Started dhclient, then used scp to get the above file from another system.

I ran the script, then rebooted. Done.
 
Hi @dvl@!

After looking a little closer at those Seagates, I've confirmed that they, at least, are 4K disks. That might be the case for the other big drives as well, but I haven't checked.
Seagate HDD datasheet

And correct me if I'm wrong, but I can't see you specifying correct partition alignment for them. Repartitioning can be done online: just offline, repartition, and resilver one disk at a time. It might not be a big deal for one standalone drive, but it might give better performance out of your zpool as a whole.

/Sebulon
 
Sebulon said:
Hi @dvl@!

And correct me if I'm wrong, but I can't see you specifying correct partition alignment for them.


I believe the purpose of the gnop commands is to achieve 4K alignment.
 
gnop(8) is used to force the use of 4K blocks. In ZFS, this shows as ashift=12. That differs from alignment. Just because the filesystem is using 4K blocks does not mean they are evenly aligned with the 4K blocks native to the drive.

So the two different things that should be done for performance:

1. Start the partition on a block that is an even 4K multiple. If the whole drive contains only the filesystem, this will be zero. Otherwise, I recommend 2048 (1M) or a multiple of 1M or 1G.

2. Set the filesystem to use blocks that are 4K in size.
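To make the ashift number concrete: it is the base-2 logarithm of the block size, so ashift=12 means 2^12 = 4096 bytes. The arithmetic can be sketched in plain sh; ashift_to_bytes is just an illustrative name.

```shell
# ashift is the base-2 logarithm of the smallest block ZFS writes to a vdev.
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

ashift_to_bytes 9    # prints 512  (512-byte blocks)
ashift_to_bytes 12   # prints 4096 (4K blocks)
```

This only covers point 2. It says nothing about where the partition starts on the disk, which is point 1.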
 
Just change your "gpart add" line to the following:
Code:
        gpart add -t freebsd-zfs -s 2790G -b 2048 -l disk${NUM} ${I}

That will start the partition at the 1 MB boundary, thus aligning it to 4K blocks, and providing the best performance.

That also leaves you 1 MB of free space at the beginning of the drive, in case you ever need to make it bootable. :) Very handy! I just converted my home ZFS server from using a separate USB stick to boot and 2x mirror vdevs for storage to an all-ZFS (root-on-zfs) setup, without losing data or using extra disks .... because I had that extra 1 MB of free space at the beginning of the disks. :)
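For anyone wondering where 2048 comes from: with 512-byte sectors, 2048 sectors is 2048 * 512 = 1048576 bytes, or 1 MB. Rounding an arbitrary start sector up to such a boundary can be sketched like this (align_up is a made-up helper for illustration, not part of gpart):

```shell
# Round a start sector up to the next multiple of an alignment.
# Both values are in 512-byte sectors; 2048 sectors = 1 MB.
align_up() {
    start=$1
    align=${2:-2048}
    echo $(( (start + align - 1) / align * align ))
}

align_up 128      # prints 2048: next 1 MB boundary after sector 128
align_up 128 8    # prints 128: already on a 4K (8-sector) boundary
```

Sector 128 is where the first partition after the 94-block boot partition lands in the scripts above, so it is a convenient test value.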
 
wblock@ said:
gnop(8) is used to force the use of 4K blocks. In ZFS, this shows as ashift=12. That differs from alignment. Just because the filesystem is using 4K blocks does not mean they are evenly aligned with the 4K blocks native to the drive.

So the two different things that should be done for performance:

1. Start the partition on a block that is an even 4K multiple. If the whole drive contains only the filesystem, this will be zero. Otherwise, I recommend 2048 (1M) or a multiple of 1M or 1G.

2. Set the filesystem to use blocks that are 4K in size.

We are in partial luck:

Code:
[dan@slocum:~] $ zdb | grep ashift
            ashift: 12
 
phoenix said:
Just change your "gpart add" line to the following:
Code:
        gpart add -t freebsd-zfs -s 2790G -b 2048 -l disk${NUM} ${I}

That will start the partition at the 1 MB boundary, thus aligning it to 4K blocks, and providing the best performance.

That also leaves you 1 MB of free space at the beginning of the drive, in case you ever need to make it bootable. :) Very handy! I just converted my home ZFS server from using a separate USB stick to boot and 2x mirror vdevs for storage to an all-ZFS (root-on-zfs) setup, without losing data or using extra disks .... because I had that extra 1 MB of free space at the beginning of the disks. :)

Here is what I have now:

Code:
$ gpart show ada0
=>        34  5860533101  ada0  GPT  (2.7T)
          34          94     1  freebsd-boot  (47k)
         128     4194304     2  freebsd-swap  (2.0G)
     4194432  5851054080     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)

Let's see if I can do this math.

Partition 3 (the one used for zfs) starts at block 4194432. If you divide by 4 (the number of 512 byte blocks in 4KB), you'll get the number of 4K blocks: 1048608

Am I misunderstanding this?
 
dvl@ said:
Here is what I have now:

Code:
$ gpart show ada0
=>        34  5860533101  ada0  GPT  (2.7T)
          34          94     1  freebsd-boot  (47k)
         128     4194304     2  freebsd-swap  (2.0G)
     4194432  5851054080     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)

(If the boot partition starts at an aligned block, normally block 40 rather than the 34 shown here, the rest of the partitions will line up too, as long as their sizes are even multiples of 1 MB or 1 GB. gpart(8)'s -a option also works as expected after 9.1-RELEASE.)

Let's see if I can do this math.

Partition 3 (the one used for zfs) starts at block 4194432. If you divide by 4 (the number of 512 byte blocks in 4KB), you'll get the number of 4K blocks: 1048608

Am I misunderstanding this?

Well, one part: 4096/512 = 8, not 4. But it is aligned: 4194432/8 = 524304, an integer.
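The same check can be run over every partition at once. This is a rough sketch that assumes 512-byte sectors and the `gpart show` column layout seen in this thread; check_alignment is a hypothetical helper name.

```shell
# Print each partition's start sector and whether it sits on a 4K boundary.
# Expects `gpart show` output on stdin; assumes 512-byte sectors (8 per 4K).
check_alignment() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
        status = ($1 % 8 == 0) ? "aligned" : "NOT aligned"
        printf "%s %s start=%s %s\n", $3, $4, $1, status
    }'
}

check_alignment << 'EOF'
=>        34  5860533101  ada0  GPT  (2.7T)
          34          94     1  freebsd-boot  (47k)
         128     4194304     2  freebsd-swap  (2.0G)
     4194432  5851054080     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)
EOF
```

On the layout above, this reports partition 1 (start 34) as NOT aligned and partitions 2 and 3 as aligned. On a live system the input would come from `gpart show ada0 | check_alignment` instead of the pasted text.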
 