ZFS: Choosing ZFS compression?

Hi All,

I was wondering what types of compression, if any, people are using for different types of data. What would your recommendations be for the following types of data I have?

A 1TB pool that consists of /home - This is a pretty standard home partition: Word docs, pictures, etc.
An 8TB pool used for Jellyfin - This is nearly all MKV video files; there may be a few AVI files as well. This is a good example of write once, read many times.
A root filesystem for the FreeBSD system.

Thanks :)
 
In general, I don't bother too much with this. I'm sure this could have potential for optimisation, but you'd need some quite thorough insight into the data to be stored in the file system.
A 1TB pool that consists of /home - This is a pretty standard home partition: Word docs, pictures, etc.
A root filesystem for the FreeBSD system.
For these I'd just leave it at compression=on and move on to more productive activities.
An 8TB pool used for Jellyfin - This is nearly all MKV video files; there may be a few AVI files as well. This is a good example of write once, read many times.
This one is a bit trickier. Typically this kind of data is already heavily compressed. You could either turn compression off (it won't lead to any space savings anyway, so why try at all), or use a compression method that can be tuned to higher compression ratios (gzip or zstd, currently), on the premise that if you could get any space savings at least sometimes, it would be with these methods. Mind, however, that the latter most likely invalidates the trade-off made by the standard compression (usually negligible cost in CPU time vs. good-enough compression in the general case).
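The two options above might look like this in practice. Pool and dataset names here are hypothetical, and these commands need an existing ZFS pool, so treat this as a sketch rather than something to paste in:

```shell
# Option 1: don't even try to compress the media dataset.
zfs set compression=off tank/jellyfin

# Option 2: try hard, paying CPU time on write. zstd with no suffix
# defaults to level 3; zstd-19 is the slow/high-ratio end of the scale.
zfs set compression=zstd-19 tank/jellyfin

# Either way, check what it actually achieved on data written so far:
zfs get compression,compressratio tank/jellyfin
```

Note that changing the property does not rewrite existing data; only blocks written afterwards use the new setting.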
 
I guessed leaving it as is would be a pretty sane default for the home partition.

As for the 8TB of videos, like you mentioned, they are pretty much already compressed as much as they are going to be. Seeing as, for now at least, the drive is only about 60% full, there is not a lot to be gained even with a tiny compression ratio.

I thought getting others' thoughts on how they approach different types of data would provide a range of ideas, as no doubt none of us do exactly the same thing (apart from maybe all being in agreement about how cool ZFS is!)
 
Even if you have uncompressed video, raw photos or WAV audio files, the standard compression algorithms don't compress them much, if at all.
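You can see this for yourself without ZFS at all: gzip some random bytes (a stand-in for already-compressed media payload) and compare sizes. The file paths are arbitrary.

```shell
# Random data is effectively incompressible, much like the inside of an
# MKV: gzip can only store it, plus a little framing overhead.
head -c 1048576 /dev/urandom > /tmp/fake_media
gzip -9 -c /tmp/fake_media > /tmp/fake_media.gz

orig=$(wc -c < /tmp/fake_media)
comp=$(wc -c < /tmp/fake_media.gz)
echo "original: $orig bytes, gzipped: $comp bytes"
```

Even at gzip's maximum level, the "compressed" copy ends up no smaller than the original.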

I generally just set compression=on, which means LZ4. It is the fastest and lightest. I don't turn it on when I know there's just media floating around a filesystem.

I have been told, but cannot verify at this time, that zstd has a high overhead before a compressed stream comes out. If that is true, it would be important not to use it in situations where probing-and-giving-up is in use. I might convert the filesystem holding my OS drive images to zstd; dedup and compression are highly effective there.
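One thing to keep in mind when converting an existing dataset: the `compression` property only applies to blocks written after the change, so old data keeps its old compression until it is rewritten. A sketch, with hypothetical dataset names:

```shell
# New writes use zstd from now on; existing blocks stay lz4 (or whatever
# they were) until rewritten.
zfs set compression=zstd tank/os-images
zfs get compression,compressratio tank/os-images

# To recompress existing data, rewrite it, e.g. via a local send/receive
# into a fresh dataset:
#   zfs snapshot tank/os-images@recompress
#   zfs send tank/os-images@recompress | zfs receive tank/os-images-zstd
```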
 
I guessed leaving it as is would be a pretty sane default for the home partition.
You might not be aware of some of ZFS's compression properties. As ZFS compression is very important functionality, not least in its relation to I/O speed, it has some nice properties distinguishing it from the usual compression options. One of those is that compression is tried per record: if a record turns out to be effectively incompressible, ZFS stops trying and stores that record uncompressed.

Especially for '[...] good example of write once and read many times' workloads, leaving compression enabled has hardly any noticeable impact overall.
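You can also check, per dataset, how much that always-on compression is actually buying you. Hypothetical pool name; requires an existing pool:

```shell
# compressratio is the cumulative ratio for data already written; a value
# near 1.00x means nothing compressed, so leaving compression enabled
# cost essentially nothing.
zfs get -r compression,compressratio,recordsize tank
```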

You might want to have a further look at ZFS compression and its fairly recent developments, such as new compression algorithms: here, and its thread discussion in general.
 
Did some 'slight' testing, and poorly at that, a few days ago on an up-to-date 14-STABLE with an i7-3820 (3 cores + threads enabled). The drive was a mechanical 2TB Western Digital Black (peak sustained rate maybe 140MB/s). Timings were manually recorded and various things (bloated or not) are in startup. I didn't take great notes and didn't repeat tests, so I can't say the differences weren't just drive seek latency or otherwise.
Timings (seconds unless noted), lz4 vs. zstd-9:

- Full boot, BIOS/UEFI: lz4 1:29/1:16; zstd-9 1:22 (thought UEFI; notes not clear)
- `find /var/cache/ccache` (25GB, ccache4 self-compressed): lz4 135.32; zstd-9 122.23
- Single user, BIOS: 15
- UEFI boot of an older -CURRENT (=15) from NVMe: 11.5
- Didn't note what I did, but had 11 and 14.86 noted
- `find` of ccache printing from ARC alone on a repeat run:
  - BIOS vt: 5.20
  - UEFI vt: 4.04 (4.05 on the zstd-9 run)
  - BIOS vt with hw.vga.textmode=1: 3.82
  - BIOS sc: 3.66
  - output of `find` redirected to /dev/null (don't remember terminal interface): 1.27
If I recall (didn't note it), lz4 came to about 1.57TB of data and zstd-18 on the backup got down to 1.53TB. I think zstd-9 was around 10GB larger than that. Some datasets had a horrible compression ratio while others were awesome with just LZ4, and only some get a good boost in ratio from zstd.

Compression overhead adds latency, but I don't have a practical sequence to measure that yet. Whatever compression ratio you achieve reduces disk I/O by that amount, at the cost of CPU+RAM plus the mentioned latency. If data doesn't store compressed, there is no negative impact reading it later; it was stored uncompressed.

I had started throwing together a script to try to measure overhead+throughput of geli and ZFS settings using a memory disk and a 10GB image used by some for compression testing. I didn't get back to it, but found geli was easy to crash the system with (probably rapid destroy+create), and zstd read performance did best around -12 to -15. You can further impact that by adjusting recordsize (I think my test data did best around 256K-512K). Not sure if ZFS lets us benefit from different compression per written record for the same file.
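A rough way to repeat that kind of sweep on your own data might look like the following. Everything here is hypothetical (pool name, sample path, chosen levels), it needs a live pool, and it creates and destroys scratch datasets, so only run something like this on a test pool:

```shell
# Copy a sample of real data into scratch datasets with different
# recordsize/zstd-level combinations and compare the achieved ratio.
# /data/sample stands in for whatever data you want to test.
for rs in 128K 256K 512K; do
  for lvl in 9 12 15; do
    ds="tank/bench-$rs-$lvl"
    zfs create -o recordsize=$rs -o compression=zstd-$lvl "$ds"
    cp -a /data/sample/. "/$ds/"
    echo "$rs zstd-$lvl: $(zfs get -H -o value compressratio "$ds")"
    zfs destroy "$ds"
  done
done
```

Read timings would need a separate pass (export/import or ARC eviction between runs) to keep the cache from skewing results.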

Maybe someday I will analyze performance and space more closely, which could lead to an optimized install: if each file gets written at its best compression + recordsize, a performance benchmark added to the installer could make for a more performant system.

Side note: it's kinda fun to watch ZFS replication reach ARC compression ratios of 10x and 15x, making a 4G ARC hold more data than my 32GB of RAM.
 