ZFS optimizations

OK. I have my whizbang new RAID-10 setup on zfs working (stripe over mirrors) and all, but I'm curious what y'all think the most important post-setup optimizations are:

compression? what exactly is it? When I first heard about it, I thought it just meant compressed files on the filesystem and a lot of overhead, is that it, or is it something you can't live without? Why?

dedup? sounds like a good idea - sort of like git does - with shared blobs that are linked. But, is it a necessity in your opinion? Why?

replication? this one seems tricky to me, but is it? Or is it the greatest thing since sliced bread and everyone should do it?

RAM set asides for stuff - do I need to just leave all my RAM alone to manage itself, or carve out for ZFS?

My use case continues to be mostly serving up files and hosting my own fossil repo, as I've said elsewhere, so not lots of connections, but I'd like things to be fast to read and reasonably fast to write.

I guess what I'm asking is how do you, personally and not as your enterprise sysadmin persona, set up zfs initially. For nigh on a decade I've created the pool and turned off compression and that's about it. It's been flawless perfection. But, now with my upgraded (if aged) hardware (i7 quad core, sas rig, scads of ssd drives and spinners too, and 24 GB memory), I'd like to explore some optimizations that others have found useful. What do you typically do when setting up your zfs pools?

Full disclosure... I'm trying out TrueNAS Core at the moment where I'm sharing the stripe over mirrors pool on the lan, along with a jail running fossil. But, it's FreeBSD underneath and I'm much more familiar with FreeBSD, so whatever applies to it should translate pretty well.

Thanks,

Will
 
Default lz4 compression is a good choice. Don't turn compression off, as CPU time is cheap.
You can set atime to off.
Dedup takes a lot of memory & CPU; the advantages are not clear except in special cases.
For a database you can set recordsize=16K instead of the default 128K.
Take snapshots & send them to another drive for backup/recovery.

A mirror is faster than raidz.
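The list above as a command sketch; the pool name mypool, the backup pool backuppool, and the database dataset mypool/db are all made-up placeholders, so adjust them to your own layout:

```shell
# lz4 is the cheap default; leave it on pool-wide.
zfs set compression=lz4 mypool

# Don't write metadata on every read.
zfs set atime=off mypool

# Database dataset: match recordsize to the DB page size.
zfs create -o recordsize=16K mypool/db

# Snapshot and replicate to another pool for backup/recovery.
SNAP="mypool@backup-$(date +%Y%m%d)"
zfs snapshot "$SNAP"
zfs send "$SNAP" | zfs recv backuppool/mypool
```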
 
My most important tuning is to turn on compression on the right datasets, and create datasets fine-grained enough to separate out non-compressible things (media files, distfiles etc.).
 
my basic setup is: atime = off and compression = zstd-9 (which is slower than lz4 but has a higher compression ratio; so for speed I would go lz4, for more aggressive compression I would go zstd-X, the other compression algorithms are basically just for compatibility)
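For reference, the property syntax for both choices (the dataset names are made up):

```shell
# Cheap and fast; decompression is nearly free.
zfs set compression=lz4 mypool/work

# Tighter but slower; OpenZFS accepts explicit zstd levels 1-19.
zfs set compression=zstd-9 mypool/archive
```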
 
My most important tuning is to turn on compression on the right datasets, and create datasets fine-grained enough to separate out non-compressible things (media files, distfiles etc.).
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?

my basic setup is: atime = off and compression = zstd-9 (which is slower than lz4 but has a higher compression ratio; so for speed I would go lz4, for more aggressive compression I would go zstd-X, the other compression algorithms are basically just for compatibility)
Why atime = off?
 
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?

I compress everything that I don't know to be incompressible. Incompressible data is media files, already-compressed files such as distfiles, and encrypted data.
 
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?


Why atime = off?
Though your first question wasn't addressed to me directly: I compress everything, because from various benchmarks (databases, mailservers, normal file storage) I came to the conclusion that it not only saves space but also brings a speedup, especially on slow rust (HDDs). A bottleneck is the transfer from disk to memory, so if you need to transfer less data it is faster, and decompressing data is usually very fast. For already-compressed data like multimedia files I use lz4, which is very fast at detecting whether the data can be compressed at all.

atime = off: this attribute records the timestamp of the last read access on a file. I do not need that; I am fine with just knowing when a file was created or changed. With atime enabled, every read of a file also triggers a write to update that metadata, which of course is costly.
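If dropping access times entirely feels too drastic, OpenZFS 2.x also offers relatime as a middle ground (the pool and dataset names here are placeholders):

```shell
# No access-time updates at all; mtime/ctime are unaffected.
zfs set atime=off mypool

# Or: keep atime, but only update it when the old value is older than
# mtime/ctime or more than 24 hours stale (Linux-style relatime).
zfs set atime=on mypool/home
zfs set relatime=on mypool/home
```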
 
So, setting compression on for the whole /home dataset is not a good approach?

I have not measured what the damage is from having compression on and then having incompressible files in there.

I could also imagine that there are systems where reading and writing compressed data is actually slower. Think of a low-clocked Xeon, but with a bazillion PCIe lanes all attached to very fast SSDs.
 
So, setting compression on for the whole /home dataset is not a good approach?

Set it to compression=on (lz4 by default currently). There are very few situations where it may make sense to set to off. A server actively saving and serving multiple streams of already compressed video, for example, would save a few cycles by setting it to off.

Unless you've got a very demanding workload dealing with saving data that does not compress, leave it on.
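A sketch of that policy, with made-up dataset names: inherit compression=on from the pool root, and override only where the data is known not to compress.

```shell
zfs set compression=on mypool               # currently means lz4
zfs create -o compression=off mypool/video  # pre-compressed streams only
zfs get -r -s local,inherited compression mypool  # see which datasets override
```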
 
compression is "interesting". CPU cycles are spent compressing blocks to be written, but a smaller block uses less bandwidth to write/read. As others said, some data does not compress, so having it on for that data is wasted CPU. You also need to look at the dataset: if you have a separate dataset for mp4 or other incompressible data, set the property to off there, because "on" gets you nothing.
But in general, for a dataset like a user home, compression on eventually helps. Some data can't be compressed, some/most will, so over the long run having compression on averages out as a win (the CPU time to compress/decompress is less than the transfer time saved by the smaller block).
 
Note that ZFS will also not store the compressed version of a block when writing a file unless it reduces usage by at least 1/8. So if you're writing already-compressed data (most media formats), you will have extra overhead during the write (though with lz4 it's usually not human-noticeable, depending on drive performance), but not on later reads, as the blocks will be stored uncompressed by ZFS on-disk.

In addition, be aware that very small (close to the pool’s block size) files have to be very compressible to actually be stored compressed.

The only situation where I have personally found benefit to disabling compression was in a MythTV setup, where it was streaming multiple pre-compressed high bitrate streams onto and off of the drive(s) simultaneously.
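One way to check whether compression is actually paying off on a given dataset before deciding to disable it (the pool name is a placeholder):

```shell
# compressratio reports the achieved ratio per dataset.
zfs get -r compressratio mypool
# A value near 1.00x (e.g. on a media dataset) means the data isn't
# compressing, so compression=off there costs you nothing.
```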
 
Compression costs nanoseconds of CPU time while turning it off results in more blocks written. Writing to disk costs microseconds. I'd rather pay in nanoseconds than microseconds.
OK. So, if I zfs set compression=on mypool or zfs set compression=lz4 mypool, does it compress what's already on the drive or no?
 
It applies to the dataset and all its children, but only to data written after the property is set; blocks already on disk stay as they are until they're rewritten.
Note:
For /var/log you can set compression to gzip-9, as logs are highly compressible text.
For /mnt/my_media_files you can set compression to off, as e.g. mp4 or flac is already compressed.
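Those per-dataset overrides as commands; the dataset names mirror the example paths above and are hypothetical, so adjust to your pool:

```shell
# Logs are highly compressible text: trade CPU for a much better ratio.
zfs set compression=gzip-9 mypool/var/log

# Media is already compressed: skip even the cheap lz4 attempt.
zfs set compression=off mypool/media
```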
 