I did some 'slight' (and admittedly sloppy) testing a few days ago on up-to-date 14-STABLE with an i7-3820 (3 cores + threads enabled). The drive was a mechanical 2 TB Western Digital Black (peak sustained rate around 140 MB/s). Timings were recorded manually, and various things (bloated or not) run at startup. I didn't take great notes and didn't repeat tests, so I can't say the differences weren't just drive seek latency or other noise.
| Test | lz4 | zstd-9 |
| --- | --- | --- |
| Full boot | 1:29 (BIOS) / 1:16 (UEFI) | 1:22 (thought UEFI; notes not clear) |
| `find /var/cache/ccache` (25 GB, ccache4 self-compressed) | 135.32 | 122.23 |
| Single user (BIOS) | 15 | |
| UEFI boot of older -CURRENT (=15) from NVMe | 11.5 | didn't note what I did, but had 11 and 14.86 noted |
`find` on ccache, printing from ARC alone on a repeat run (seconds):
- BIOS vt: 5.20
- UEFI vt: 4.04 (4.05 on the zstd-9 run)
- BIOS vt with `hw.vga.textmode=1`: 3.82
- BIOS sc: 3.66
- output redirected to /dev/null (don't remember which terminal interface): 1.27
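For anyone who wants to repeat this, a minimal sketch of the repeat-run timing above. The path is the ccache tree from my tests; the fallback to the current directory is just so the snippet runs anywhere. The first run is cold (disk seeks + decompression); the second is served mostly from ARC, so the difference approximates the disk-side cost:

```shell
# Repeat-run timing of a directory walk. TARGET is the 25 GB ccache
# tree from the post; fall back to the current directory if absent.
TARGET=/var/cache/ccache
[ -d "$TARGET" ] || TARGET=.

time find "$TARGET" > /dev/null   # cold run: disk + decompression
time find "$TARGET" > /dev/null   # repeat run: mostly ARC hits
```

Redirecting to /dev/null, as in the 1.27 s figure above, removes the terminal's rendering cost from the measurement.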
If I recall correctly (I didn't note it), lz4 stored about 1.57 TB of data, and zstd-18 on the backup got that down to 1.53 TB; I think zstd-9 was around 10 GB larger than that. Some datasets had terrible compression ratios while others did great with just lz4, and only some got a meaningful ratio boost from zstd.
Compression adds latency, though I don't have a practical test sequence to measure it yet. Whatever the compression ratio is, each unit of disk I/O moves that much more data, at the cost of CPU+RAM plus the latency just mentioned. If data doesn't compress, there's no penalty reading it back later; it was stored uncompressed.
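To make the "ratio = extra effective I/O" point concrete with hedged numbers: take the drive's 140 MB/s peak sustained rate from above and an assumed illustrative 1.5x compressratio (not one of my measured figures), and the effective read rate is the product, provided the CPU can decompress at least that fast:

```shell
# Effective read throughput = raw disk rate x compression ratio,
# assuming CPU decompression keeps up.
# 140 MB/s: drive's peak sustained rate; 1.5x: assumed example ratio.
awk 'BEGIN { raw = 140; ratio = 1.5; printf "%.0f MB/s effective\n", raw * ratio }'
# prints: 210 MB/s effective
```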
I had started throwing together a script to measure the overhead and throughput of geli and ZFS settings using a memory disk and a 10 GB image that some people use for compression testing. I never got back to it, but I found geli was easy to crash the system with (probably from rapid destroy+create cycles), and zstd read performance did best around levels 12 to 15. You can further affect that by adjusting recordsize (I think my test data did best around 256k-512k). I'm not sure whether ZFS lets us benefit from different compression per written record of the same file.
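The script never got finished, but a sketch of the shape it was taking is below. This is FreeBSD-specific, must run as root, and the image path, pool name, sizes, and the settings swept are placeholders, not my actual script. Note the geli step: rapid create/destroy cycling is what I suspect crashed the system, so don't loop that part tightly.

```shell
#!/bin/sh
# Sketch: compression/recordsize sweep on a memory disk, so platter
# latency is out of the picture. Placeholder paths and names throughout.
set -e

IMG=/path/to/testdata.img          # ~10 GB sample file (placeholder)
MD=$(mdconfig -a -t swap -s 12g)   # memory-backed disk, e.g. md0

# Optional geli layer; avoid rapid destroy+create cycles here.
geli onetime -s 4096 "/dev/$MD"

zpool create -O atime=off testpool "/dev/$MD.eli"
for comp in lz4 zstd-9 zstd-12 zstd-15; do
    for rs in 128k 256k 512k; do
        zfs create -o compression=$comp -o recordsize=$rs testpool/t
        /usr/bin/time -h cp "$IMG" /testpool/t/data
        # export/import the pool between read timings to defeat ARC hits
        zfs destroy testpool/t
    done
done

zpool destroy testpool
geli detach "/dev/$MD.eli"
mdconfig -d -u "$MD"
```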
Maybe someday I'll analyze performance and space more closely. That could lead to an optimized install: if each file were written at its best compression + recordsize, and a performance benchmark were added to the installer, the result could be a more performant system.
Side note: it's kinda fun to watch ZFS replication reach ARC compression ratios of 10x and 15x, making a 4 GB ARC hold more data than my 32 GB of RAM.
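If anyone wants to watch the same thing, one way (I believe these arcstats sysctl names are current on FreeBSD's OpenZFS, but treat them as an assumption) is to divide the ARC's uncompressed byte counter by its compressed one:

```shell
# Overall ARC compression ratio from arcstats (both counters in bytes).
# Sysctl names assumed from OpenZFS arcstats; verify on your system.
sysctl -n kstat.zfs.misc.arcstats.uncompressed_size \
          kstat.zfs.misc.arcstats.compressed_size |
    awk 'NR==1 { u = $1 } NR==2 { c = $1 } END { if (c > 0) printf "%.1fx\n", u / c }'
```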