ZFS optimizations

OK. I have my whizbang new RAID-10 setup on zfs working (stripe over mirrors) and all, but I'm curious what y'all think the most important post-setup optimizations are:

compression? what exactly is it? When I first heard about it, I thought it just meant compressed files on the filesystem and a lot of overhead, is that it, or is it something you can't live without? Why?

dedup? sounds like a good idea - sort of like git does - with shared blobs that are linked. But, is it a necessity in your opinion? Why?

replication? this one seems tricky to me, but is it? Or is it the greatest thing since sliced bread and everyone should do it?

RAM set asides for stuff - do I need to just leave all my RAM alone to manage itself, or carve out for ZFS?

My use case continues to be mostly serving up files and hosting my own fossil repo, as I've said elsewhere, so not lots of connections, but I'd like things to be fast to read and reasonably fast to write.

I guess what I'm asking is how do you, personally and not as your enterprise sysadmin persona, set up zfs initially. For nigh on a decade I've created the pool and turned off compression and that's about it. It's been flawless perfection. But, now with my upgraded (if aged) hardware (i7 quad core, sas rig, scads of ssd drives and spinners too, and 24 GB memory), I'd like to explore some optimizations that others have found useful. What do you typically do when setting up your zfs pools?

Full disclosure... I'm trying out TrueNAS Core at the moment where I'm sharing the stripe over mirrors pool on the lan, along with a jail running fossil. But, it's FreeBSD underneath and I'm much more familiar with FreeBSD, so whatever applies to it should translate pretty well.

Thanks,

Will
 
Default lz4 compression is a good choice. Don't turn compression off, as CPU time is cheap.
You can set atime to off.
Dedup takes a lot of memory & CPU; the advantages are not clear except in special cases.
For a database you can set recordsize=16K instead of the default 128K.
Take snapshots & send them to another drive for backup/recovery.

A mirror is faster than raidz.
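The list above as a command sketch; the pool name mypool, the backup pool backuppool, and the database dataset mypool/db are all made-up placeholders, so adjust them to your own layout:

```shell
# lz4 is the cheap default; leave it on pool-wide.
zfs set compression=lz4 mypool

# Don't write metadata on every read.
zfs set atime=off mypool

# Database dataset: match recordsize to the DB page size.
zfs create -o recordsize=16K mypool/db

# Snapshot and replicate to another pool for backup/recovery.
SNAP="mypool@backup-$(date +%Y%m%d)"
zfs snapshot "$SNAP"
zfs send "$SNAP" | zfs recv backuppool/mypool
```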
 
My most important tuning is to turn on compression on the right datasets, and create datasets fine-grained enough to separate out non-compressible things (media files, distfiles etc.).
 
my basic setup is: atime = off and compression = zstd-9 (which is slower than lz4 but has a higher compression ratio; so for speed I would go lz4, for more aggressive compression I would go zstd-X, the other compression algorithms are basically just for compatibility)
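For reference, the property syntax for both choices (the dataset names are made up):

```shell
# Cheap and fast; decompression is nearly free.
zfs set compression=lz4 mypool/work

# Tighter but slower; OpenZFS accepts explicit zstd levels 1-19.
zfs set compression=zstd-9 mypool/archive
```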
 
My most important tuning is to turn on compression on the right datasets, and create datasets fine-grained enough to separate out non-compressible things (media files, distfiles etc.).
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?

my basic setup is: atime = off and compression = zstd-9 (which is slower than lz4 but has a higher compression ratio; so for speed I would go lz4, for more aggressive compression I would go zstd-X, the other compression algorithms are basically just for compatibility)
Why atime = off?
 
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?

I compress everything that I don't know to be incompressible. Incompressible data is media files, already-compressed files such as distfiles, and encrypted data.
 
Could you elaborate on that? In a workstations/home pc, what datasets would you compress and why?


Why atime = off?
Though your first question wasn't addressed to me directly: I compress everything, because from various benchmarks (databases, mailservers, normal file storage) I came to the conclusion that it not only saves space but also brings a speedup, especially on slow rust (HDDs). A bottleneck is the transfer from disk to memory, so if you need to transfer less data it is faster, and decompressing data is usually very fast. For already-compressed data like multimedia files I use lz4, which is very fast at detecting whether the data can be compressed at all.

atime = off: this attribute records the timestamp of the last read access on a file. I do not need that; I am fine with just knowing when a file was created or changed. With atime enabled, every read of a file also triggers a write to update that metadata, which of course is costly.
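If dropping access times entirely feels too drastic, OpenZFS 2.x also offers relatime as a middle ground (the pool and dataset names here are placeholders):

```shell
# No access-time updates at all; mtime/ctime are unaffected.
zfs set atime=off mypool

# Or: keep atime, but only update it when the old value is older than
# mtime/ctime or more than 24 hours stale (Linux-style relatime).
zfs set atime=on mypool/home
zfs set relatime=on mypool/home
```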
 
So, setting compression on for the whole /home dataset is not a good approach?

I have not measured what the damage is from having compression on and then having incompressible files in there.

I could also imagine that there are systems where reading and writing compressed data is actually slower. Think of a low-clocked Xeon, but with a bazillion PCIe lanes all attached to very fast SSDs.
 
So, setting compression on for the whole /home dataset is not a good approach?

Set it to compression=on (lz4 by default currently). There are very few situations where it may make sense to set to off. A server actively saving and serving multiple streams of already compressed video, for example, would save a few cycles by setting it to off.

Unless you've got a very demanding workload dealing with saving data that does not compress, leave it on.
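A sketch of that policy, with made-up dataset names: inherit compression=on from the pool root, and override only where the data is known not to compress.

```shell
zfs set compression=on mypool               # currently means lz4
zfs create -o compression=off mypool/video  # pre-compressed streams only
zfs get -r -s local,inherited compression mypool  # see which datasets override
```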
 
compression is "interesting". CPU cycles are spent compressing blocks to be written, but a smaller block uses less bandwidth to write/read. As others said, some data does not compress, so having it on for that data is wasted CPU. You also need to look at the dataset: if you have a separate dataset for mp4 or other incompressible data, set the property to off there, because "on" gets you nothing.
But in general, for a dataset like a user home, compression on eventually helps. Some data can't be compressed, some/most will, so over the long run having compression on averages out as a win (the CPU time to compress/decompress is less than the transfer time saved by the smaller block).
 
Note that ZFS will also not store the compressed version of a block when writing a file unless it reduces usage by at least 1/8. So if you're writing already-compressed data (most media formats), you will have extra overhead during the write (though with lz4 it's usually not human-noticeable, depending on drive performance), but not on later reads, as the blocks will be stored uncompressed by ZFS on-disk.

In addition, be aware that very small (close to the pool’s block size) files have to be very compressible to actually be stored compressed.

The only situation where I have personally found benefit to disabling compression was in a MythTV setup, where it was streaming multiple pre-compressed high bitrate streams onto and off of the drive(s) simultaneously.
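One way to check whether compression is actually paying off on a given dataset before deciding to disable it (the pool name is a placeholder):

```shell
# compressratio reports the achieved ratio per dataset.
zfs get -r compressratio mypool
# A value near 1.00x (e.g. on a media dataset) means the data isn't
# compressing, so compression=off there costs you nothing.
```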
 
Compression costs nanoseconds of CPU time while turning it off results in more blocks written. Writing to disk costs microseconds. I'd rather pay in nanoseconds than microseconds.
OK. So, if I zfs set compression=on mypool or zfs set compression=lz4 mypool, does it compress what's already on the drive or no?
 
It applies to the dataset and all its children, but only to data written after the property is set; blocks already on disk stay as they are until they're rewritten.
Note:
For /var/log you can set compression to gzip-9, as logs are highly compressible text.
For /mnt/my_media_files you can set compression to off, as e.g. mp4 or flac is already compressed.
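Those per-dataset overrides as commands; the dataset names mirror the example paths above and are hypothetical, so adjust to your pool:

```shell
# Logs are highly compressible text: trade CPU for a much better ratio.
zfs set compression=gzip-9 mypool/var/log

# Media is already compressed: skip even the cheap lz4 attempt.
zfs set compression=off mypool/media
```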
 