ZFS - Tune ZFS to take into account the speed difference between 2 disks: SATA SSD / M.2 interface storage

srey · Jan 29, 2022

Hi!

I'm totally new to BSD (coming from Ubuntu/Debian land), and I'm starting a homelab project with recent refurbished hardware:
Optiplex 3080, 8 GB RAM, Core i5-10500T (6 cores, Hyper-Threading enabled).

I'm planning something simple with this homelab :
- Encrypted ZFS / YubiKey
- Containers (or a simple single-node K3s)
- Ansible
- Easy ZFS rollback, including boot

After some discussion on Reddit, a French BSD user (eolin) convinced me to give Open/Free/BSD a try with Bastille + Ansible (see this post).
Why not, this is a good way to learn.

Considering FreeBSD + ZFS, my question is about storage topology with ZFS.

My hardware is limited, and I already posted about it on Reddit (see this post). I have only two storage slots:
- 1 PCIe Gen 3 x4 slot with an NVMe Crucial P5 1 TB => theoretically limited to 3400 MB/s
- 1 SATA port with a Crucial 1 TB SSD => theoretically limited to 600 MB/s

I prefer integrity (mirroring the two disks) over speed, so eolin pointed me to a way to do that, including the system / boot on the main pool:
https://forums.freebsd.org/threads/howto-convert-single-disk-zfs-on-root-to-mirror.49702/

But here is something interesting: another Reddit user said it may be possible to tune ZFS to take into account the speed difference between the two mirrored devices: https://blog.mh3000.net/viewer/1i1n1D6xc29

Do you know if something like that is possible with FreeBSD, to limit the impact of the speed difference between the two interfaces (SATA SSD vs NVMe)?

About rollback, I already found a discussion on this forum:

ZFS - Confused about what is backedup by a zfs snapshot (forums.freebsd.org)

eoli3n · Jan 30, 2022

The solution can be even simpler: the FreeBSD installer takes care of the mirror natively.
The post I gave you is more to understand how it works.
My advice is to install on a single disk, and then mirror it afterwards.
The benefit is that you then have the complete procedure to follow if a disk fails, to re-mirror the new one.
The goal is to be able to boot from either the NVMe EFI entry or the SATA one independently.
Here are my notes about this:

# List disks
$ camcontrol devlist
$ geom disk list

# Partition table on the new disk
$ gpart destroy -F ada1
$ gpart create -s GPT ada1
$ gpart add -t efi -l efi2 -s 260M ada1
$ gpart add -t freebsd-boot -l boot2 -s 512K ada1
$ gpart add -t freebsd-swap -l swap2 -s 2G ada1
$ gpart add -t freebsd-zfs -l zfs2 ada1
$ gpart bootcode -p /boot/boot1.efi -i 1 ada1

# Create the mirrored encrypted swap
$ ls /dev/gpt
boot2        efi2        gptboot0    swap0        swap2        zfs2
$ gmirror create swap gpt/swap0 gpt/swap2
$ gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swap0 (ACTIVE)
                       gpt/swap2 (ACTIVE)
$ echo "/dev/mirror/swap.eli none swap sw,ealgo=AES-XTS,keylen=128,sectorsize=4096 0 0" >> /etc/fstab
$ swapon -a
swapon: adding /dev/mirror/swap.eli as swap device
$ swapinfo
Device                1K-blocks     Used    Avail Capacity
/dev/mirror/swap.eli    2097152        0  2097152     0%

# ZFS mirror zfs partitions
$ zpool attach zroot ada0p4 ada1p4
$ zpool status

# Manually sync EFI partition
$ newfs_msdos -F 32 -c 1 /dev/ada1p1
$ mount -t msdosfs /dev/ada1p1 /mnt
$ rsync -aAXv /boot/efi /mnt
$ efibootmgr --create -a -d /dev/ada1p1 -L "FreeBSD 2" -l "/mnt/efi/efi/freebsd/loader.efi" --verbose

Then try unplugging one disk and booting from the other one.

ralphbsz · Jan 30, 2022

(Editorial comment: The link you have to a blog on mh3000.net goes to a discussion on this forum.)

As a general rule, if you mirror two devices (as a RAID-1 pair), the speed of the mirror will be:

- For reading: Between twice the speed of the slower, and the sum of the speeds of the two.
- For writing: Close to the speed of the slower one, or a little slower.

Why? Let's start with the writing: each write needs to be done to both disks, but it can be done in parallel. So the faster disk will typically wait for the slower one. If both disks are about the same speed, and their latency fluctuates, the wait will always be a little slower than the mean.

For reading, most of the time (if both disks are available), only one needs to be read. The simplest way to implement this is to alternate between the two disks, because the mirroring layer doesn't know ahead of time which disk will be faster. That leads to the total speed being about twice the speed of the slower. On the other hand, if the mirroring layer can know ahead of time which disk is faster, it can send all IOs to the faster one, unless the faster one is already busy doing an IO, in which case the slower one gets to do one IO. Ideally, the mirroring layer can keep queues of IOs on both disks, and balance the queues for earliest expected completion time of IOs, which leads to the sum of the speeds.

So in your case, with very unbalanced performance, the write performance of ZFS will probably not be helped at all by the fast device, while the read performance MIGHT be helped significantly.
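To put numbers on the rule of thumb above, here is a minimal sketch that plugs in the two theoretical interface limits from the first post (3400 MB/s and 600 MB/s). These are interface ceilings, not measured drive speeds, and the function name `mirror_bounds` is made up for illustration.

```shell
#!/bin/sh
# Rough mirror throughput bounds following the rule of thumb above.
# Inputs are sequential speeds in MB/s. The values used below are the
# theoretical interface limits from the first post, NOT measurements.

mirror_bounds() {
    fast=$1
    slow=$2
    # Make sure "fast" really holds the larger value.
    if [ "$fast" -lt "$slow" ]; then
        tmp=$fast; fast=$slow; slow=$tmp
    fi
    echo "write: about $slow MB/s (bounded by the slower disk)"
    echo "read:  between $((2 * slow)) and $((fast + slow)) MB/s"
}

mirror_bounds 3400 600
```

With these inputs the read range comes out as 1200 to 4000 MB/s, and writes stay around 600 MB/s. As far as I know, FreeBSD exposes the mirror read-balancing knobs under `sysctl vfs.zfs.vdev.mirror`; whether changing them helps for an NVMe/SATA pair is not something this thread establishes.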

So, is there any way to utilize your fast device better? Yes there is: Put some extra workload onto it. And this is where it gets difficult. In your simple setup, you'll probably have all your ZFS file system be mirrored. But if you have another file system that doesn't need the reliability of mirroring (for example something for temporary files), you could put it on the fast NVMe disk; it would slow that disk down some, but that wouldn't make a big difference to the main performance. Another perhaps crazy option is to use extra space on the fast NVMe disk for ZFS ZIL or L2ARC for your main ZFS file system. However, this opens several cans of worms:

- The ZIL and L2ARC would be non-redundant, meaning they would have lower reliability. What happens when a ZIL or L2ARC fails? Does that make the underlying file system fail too, or does it continue operating, just without the speed advantage of these caches? I don't know (but suspect that failure of ZIL might be bad).
- Given that IOs to the ZIL and L2ARC are correlated with IOs to the main file system, the performance improvement could be weird.

Worth looking into.
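For anyone who wants to experiment with that idea, the following is a dry-run sketch only: it prints the `zpool add` commands instead of running them. The pool name `zroot` comes from this thread; the GPT labels `l2arc0` and `slog0` are hypothetical and would have to be created on spare NVMe space first.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the commands that would add
# spare NVMe partitions as L2ARC (cache) and separate log devices.
# ASSUMPTIONS: pool name "zroot" matches the thread; the GPT labels
# "l2arc0" and "slog0" are hypothetical placeholders.

POOL=${POOL:-zroot}
CACHE_DEV=${CACHE_DEV:-gpt/l2arc0}   # hypothetical label
LOG_DEV=${LOG_DEV:-gpt/slog0}        # hypothetical label

plan_cache_and_log() {
    echo "zpool add $POOL cache $CACHE_DEV"
    echo "zpool add $POOL log $LOG_DEV"
    echo "zpool status $POOL"
}

plan_cache_and_log
```

On the reliability question, my understanding is that a cache device only holds copies of data that is already in the pool, so losing it should cost speed rather than data, while a lost log device mainly puts recent synchronous writes at risk if it fails around a crash. Treat that as something to confirm in the ZFS documentation before relying on it.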

srey · Feb 17, 2022

Thanks for that, I had some time to test a first install today.

I first ran the default install without any encryption (swap or disk), then tried to mirror following your instructions.

Running ls on /dev/gpt lists gptboot0 but not swap0, yet when I run geom disk list the swap partition exists with the correct label swap0.

All other commands report that swap exists:
- swapinfo -h reports swap on nvd0p3
- top shows swap

Any idea?

Edit:

Running glabel, swap0 is not listed. Is it an installer bug that the partition is not labelled correctly?
I found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247367

Running gpart list, swap0 is correctly listed for nvd0p3.

Solution:

I don't know if it is normal behavior, but replacing the device partitions with their labels in fstab does the job:
/dev/nvd0p1 by /dev/gpt/efiboot0
and /dev/nvd0p3 by /dev/gpt/swap0

Now /dev/gpt/swap0 appears in /dev/gpt after reboot.
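The fstab change described above can be sketched as a small filter. A likely explanation for the missing node (an assumption, not confirmed in this thread) is that GEOM hides a label while the same partition is open under its raw device name, so with swap active on /dev/nvd0p3 the /dev/gpt/swap0 entry is not shown. The function name `fstab_use_labels` is made up; the device and label pairs are the ones reported above.

```shell
#!/bin/sh
# Sketch: rewrite raw device names in an fstab-style file to GPT labels.
# The device/label pairs are the ones reported in this thread; adjust
# them for other systems. The result goes to stdout, nothing is edited
# in place.

fstab_use_labels() {
    sed -e 's|^/dev/nvd0p1\([[:space:]]\)|/dev/gpt/efiboot0\1|' \
        -e 's|^/dev/nvd0p3\([[:space:]]\)|/dev/gpt/swap0\1|' \
        "$1"
}

# Example, not executed here:
#   fstab_use_labels /etc/fstab > /tmp/fstab.new
```

Writing to a separate file first makes it easy to diff the result against /etc/fstab before replacing it.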

SirDice · Feb 17, 2022

srey said:
Any idea ?

Depends, you might have some different settings:

Code:

dice@fbsd-test:~ % sysctl kern.geom.label
kern.geom.label.disk_ident.enable: 1
kern.geom.label.gptid.enable: 1
kern.geom.label.gpt.enable: 1
kern.geom.label.ufs.enable: 1
kern.geom.label.ufsid.enable: 1
kern.geom.label.reiserfs.enable: 1
kern.geom.label.ntfs.enable: 1
kern.geom.label.msdosfs.enable: 1
kern.geom.label.iso9660.enable: 1
kern.geom.label.flashmap.enable: 1
kern.geom.label.ext2fs.enable: 1
kern.geom.label.debug: 0

srey · Feb 17, 2022

By default on a clean install, two lines are different for me:

kern.geom.label.disk_ident.enable: 0
kern.geom.label.gptid.enable: 0

I found that these two values were clearly overridden to 0 in /boot/loader.conf... why, I don't know.
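A quick way to see which of these label tunables are set at boot is to filter the loader configuration. This is a minimal sketch; the function name `label_overrides` is made up, and it only looks at the one file it is given (default /boot/loader.conf).

```shell
#!/bin/sh
# Sketch: show which kern.geom.label.* tunables a loader.conf-style
# file sets. Default path is /boot/loader.conf; pass another file to
# inspect a copy.

label_overrides() {
    file=${1:-/boot/loader.conf}
    # Commented-out lines do not match; assignments are printed as-is.
    grep -E '^[[:space:]]*kern\.geom\.label\.[a-z0-9_]+\.enable=' "$file"
}

# Example, not executed here:
#   label_overrides
#   sysctl kern.geom.label
```

Comparing its output with `sysctl kern.geom.label` shows which running values come from the boot configuration.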

Following @eoli3n's tutorial, the only thing not mentioned was the need to run these commands before mirroring swap:
- swapoff /dev/gpt/swap0
- gmirror load

That leads me to two questions about /etc/fstab:
- starting from an unencrypted swap, I end up with an encrypted swap created with gmirror, why?
- I have two swap lines in fstab, do I need to remove the first one (/dev/gpt/swap0)?

I ran into trouble when I tried to disable the first SSD (nvd0) and boot only from ada0, to check that mirroring works well.
/dev/gpt/efiboot0 and /dev/gpt/swap0 were not found, so booting failed.

I understand from that that I need to delete the /dev/gpt/swap0 line from /etc/fstab, but... how does FreeBSD know that /dev/gpt/efiboot0 is not available and fall back to the /dev/gpt/efi2 alternative? Answer: it seems I can comment out the /dev/gpt/efiboot0 line added by the installer, as it is not needed in fstab.
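Commenting out the single-disk entries can be sketched as a filter like the one below. On the first question, my understanding of fstab(5) is that the `.eli` suffix on /dev/mirror/swap.eli is what requests one-time geli encryption at swapon time, so the encryption comes from that fstab line rather than from gmirror itself. The function name is made up and the device names are the ones from this thread.

```shell
#!/bin/sh
# Sketch: comment out fstab lines for devices that only exist on one
# disk, so booting from the other disk does not stop on a missing
# device. Device names are the ones from this thread; output goes to
# stdout, nothing is edited in place.

fstab_disable_single_disk_entries() {
    sed -e 's|^/dev/gpt/swap0[[:space:]]|#&|' \
        -e 's|^/dev/gpt/efiboot0[[:space:]]|#&|' \
        "$1"
}

# Example, not executed here:
#   fstab_disable_single_disk_entries /etc/fstab > /tmp/fstab.new
```

The mirrored swap line (/dev/mirror/swap.eli) is left untouched, since that device exists whichever disk the system boots from.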

I also found that I probably need to load the mirror module now and add it to /boot/loader.conf:

kldload geom_mirror
echo 'geom_mirror_load="YES"' >> /boot/loader.conf
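Since `echo ... >>` appends a new copy every time it is run, a small guard keeps the file clean when the notes are replayed (for example from Ansible later on). This is only a sketch; `ensure_line` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: append a line to a config file only if it is not already there.

ensure_line() {
    file=$1
    line=$2
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root on FreeBSD), not executed here:
#   ensure_line /boot/loader.conf 'geom_mirror_load="YES"'
```

Running it twice leaves a single `geom_mirror_load="YES"` line in the file.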

OK, now that works :beer:

Next step: ZFS native encryption for /home/* and Ansible.
