Auto (ZFS) Install - why is compression enabled by default?

Compression is at the very foundation of ZFS.
It is enabled by default because it enhances the efficiency and performance of file systems and storage.
Since it reduces the amount of disk space needed to store data, it can also improve read/write performance: less data has to be read from and written to disk, and when the time spent compressing and decompressing is smaller than the I/O time saved, the net result is a performance gain.

Given that modern CPUs are very efficient at this kind of task, the overhead of compression is negligible compared to the performance gain. It also saves bandwidth during network transfers.
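
For example, you can check what the installer set up and how much it actually saves (zroot is the installer's default pool name; adjust if yours differs):

  zfs get compression,compressratio zroot    # property value and the achieved ratio
  zfs get -r compressratio zroot | head      # per-dataset ratios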
 
Compression is at the very foundation of ZFS. It is enabled by default because it enhances the efficiency and performance of file systems and storage. [...]
File storage vs. CPU vs. memory: I'm not sure burning CPU cycles and RAM to decompress your OS base file system is such a good idea. I do see CPU and memory being used for this purpose even on a really tight, compact boot. The overhead is not negligible relative to the RAM and CPU consumed; it can be measured at well over 1%, and it is not and should not be required for an out-of-the-box setup.

For large data sets of actual data, not executable code, compression is useful.
 
I also was skeptical at first, but it should be the default for any machine with 4 or more CPUs. Compression is really cheap. That's why I enable it not just on filesystems but also on memory in Linux (zram as swap).
 
It might seem paradoxical at first, but the choice is well justified.

The decision to use ZFS indeed makes sense when managing significant datasets, saving space and extending the lifespan of storage media.
However, enabling compression by default is the result of a thoughtful compromise between resource utilization and the benefits in terms of performance and efficiency. It is always possible to customize and adjust according to specific needs, and ZFS offers a lot of possibilities in this respect.

Moreover, although compression is enabled by default, ZFS is intelligent about it: if data does not compress well (as is often the case with already compressed files or certain types of multimedia files), ZFS stores those blocks uncompressed, avoiding unnecessary overhead on later reads.

In the vast majority of cases, it is therefore recommended to leave compression enabled by default.

In my view, the cases where it is necessary to forego compression are less frequent today (low CPU, severe memory constraints, files already compressed...).
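
For example (dataset names below are just placeholders), you can keep the default and only opt out where it clearly doesn't pay:

  zfs set compression=off zroot/data/media         # opt out for one dataset
  zfs get -r compression,compressratio zroot/data  # verify the setting and its effect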
 
I also was skeptical at first, but it should be the default for any machine with 4 or more CPUs. Compression is really cheap. That's why I enable it not just on filesystems but also on memory in Linux (zram as swap).
It's not about skepticism. The fact is that FreeBSD runs out of the box in 100 MB or so of RAM, which is the comparison. Again, I would ask why the root executable code needs to be compressed and decompressed at all; it makes no sense and has no benefit for the base install.

ZFS may be 'intelligent', but it's only as intelligent as the implementation, and turning on compression for the root OS is not a good use or implementation of it. It fits in the realm of waste and bloat, which I'm guessing could be easily fixed by simply not having it on. It reminds me of different garbage collection strategies: they all have use cases, benefits, and also side effects, and applying one to everything is usually a bad idea.

Install FreeBSD, run top, and you will see the extra, unneeded memory in use. I'm not sure what purpose compression serves in that capacity. It is perhaps better left to home directories and/or data sets, but definitely not to boot code and processes.

ZFS defaults to leaving compression off for good reason; it comes down to choice: want it on, turn it on. The processes started at boot are in memory directly and more than likely never have to be read off the disk again, pulling from the resident (compressed) ZFS cache a second time over the lifecycle of the running OS; that wasted memory could easily be discarded or flushed, in my opinion, or not used to begin with.

It would be interesting to tweak the installer to turn off compression and see how differently the out-of-the-box default experience operates.
 
Bandwidth utilization. The pipe to/from the physical devices is fixed in size; compressing data before pushing it onto the pipe effectively gives you more utilization.
Put another way, a given amount of data goes across the pipe quicker, which increases overall system performance.
So while there may be a measurable increase in CPU and RAM utilization (a decrease in overall system performance), the overall system performance is often improved because reads and writes to the devices finish quicker.
The compression/decompression should only be happening on blocks written/read from the devices.
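
A rough way to see this in practice (pool name is just an example) is to compare the logical rate applications see with the physical rate at the devices:

  zfs get compressratio zroot     # logical-to-physical size ratio
  zpool iostat -v zroot 1         # physical bandwidth actually hitting the vdevs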
 
It fits in the realm of waste and bloat, which I'm guessing could be easily fixed by simply not having it on.
No. Even SSDs these days deliver data at a snail's pace when viewed from the CPU/memory bus. You may need a second to read, say, one GB of data. Now you read one GB of compressed data and decompress it in 0.1 seconds into two GB. That's a nice improvement and no waste at all.

What makes you so sure that everybody who handled this is clueless w.r.t. this matter and you know better?
 
The trade-off between extra resource use on one side, and transfer speed between secondary storage (HDDs and the like) and primary memory (i.e. RAM) plus disk space savings on the other, also depends on the compression algorithm used; this isn't hardcoded: see zfsprops(7). The current default, LZ4 compression on, works quite well: it has very little impact on CPU load and is almost always beneficial. More info:
From its conclusion:
The potential performance benefits of compression are significant, and the worst-case penalties are quite small. There are almost always better places to focus a sysadmin or storage architect’s attention than picking and choosing which datasets to disable compression on, so we recommend defaulting to LZ4 by setting “compress=lz4” at the pool root and leaving that value inheritable.

There are other interesting compression algorithms available, such as Zstandard (Zstd for short), added in OpenZFS 2.0. It has various compression levels that can be set; notably, the decompression speed is independent of the compression level, and in decompression speed it beats LZ4 (see #1 below). More info:
  1. Zstandard Compression in OpenZFS, FreeBSD Journal, March/April 2020, by Allan Jude
  2. ZSTD Compression, OpenZFS Developer Summit 2017, by Allan Jude - video - slides
  3. Implementing ZSTD in OpenZFS on FreeBSD, BSDCan 2018, by Allan Jude - video - slides
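
For example (pool and dataset names are only examples; the property values are the ones documented in zfsprops(7)):

  zfs set compression=lz4 zroot             # LZ4 at the pool root, inherited by children
  zfs set compression=zstd-3 zroot/var/log  # Zstandard level 3 on one dataset (OpenZFS 2.0+)
  zfs get -r compression zroot | head       # check what each dataset ended up with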



[...] The compression/decompression should only be happening on blocks written/read from the devices.
Some extra details: currently data in the ZFS ARC is stored compressed by default*; this can be disabled by setting the tunable compressed_arc_enabled=0. There is an open issue Should compressed ARC be mandatory? about this part of the "compression chain".
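
On FreeBSD these can be inspected via sysctl; the names below are how recent OpenZFS exposes them, double-check with sysctl -a if your version differs:

  sysctl vfs.zfs.compressed_arc_enabled             # 1 = ARC stores data compressed
  sysctl kstat.zfs.misc.arcstats.compressed_size    # bytes the compressed buffers occupy
  sysctl kstat.zfs.misc.arcstats.uncompressed_size  # logical size of the same data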

___
* that used not to be the case: ondisk/l2arc/arc compression

Edit: Zstandard info added
 
OK, so the boot process, which ends with about 100 MB of data, originally reads that data off the SSD into the ZFS cache (in memory), decompresses it in memory, and delivers it to memory to be executed. In the background the cache seems to shrink and grow and use CPU that isn't required. Following that pipeline, which is what I am talking about: I think compression is useful for a number of reasons, I just don't think it's useful for the core boot-up, where it's just waste. It doesn't matter how intelligent ZFS is; it has to track and scan those initial files, which will probably never be loaded again and won't benefit from compression.

It reminds me perhaps of different garbage collection schemes: some are useful, some are not. I don't think everyone is clueless; again, it would be interesting to test it out. The out-of-the-box experience, in my opinion, should be idle.

One of the amazing things I loved about FreeBSD is that it was the only OS (in comparison to Linux, Windows, macOS, etc. in general) that could sit idle and literally do nothing out of the box, with zero network connectivity even with network adapters enabled. From that perspective, I think it would be an added bonus :)
 
Erichans - I don't have much time to look through the articles, perhaps later tonight, but thank you for the specific information. I'm not a FreeBSD pro by any stretch, but I do enjoy how nicely and cohesively it is put together. It's my preference. This just happens to be one of my gripes with ZFS for root, given how minimalistic the install is, using little to no resources.
 
I just don't think it's useful for the core boot-up, where it's just waste.
I don't disagree with this but would point out:
I think it's a one-time waste amortized over the entire uptime of the system. The longer the system stays up, the less it matters.
Going from memory here:
Compression is set per dataset; you can turn it off at any time, but it will not affect any data written prior, so maybe one can clone the ZFS BE to a non-compressed dataset?
Or maybe there is a way to set the default compression property to off? This gets interesting when you ask "is this property inherited across datasets?"
Given that changing it doesn't touch anything already written, one can get wrapped up in "on or off for this dataset serving this application".
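
Going from those questions (an untested sketch assuming the default zroot layout): the property is inherited unless a child overrides it, zfs get shows where each value comes from, and old data is only rewritten by something like a send/receive into a dataset that has it off:

  zfs set compression=off zroot                      # becomes the inherited default for children
  zfs get -r -o name,value,source compression zroot  # shows local vs. inherited values
  # rewriting existing data uncompressed, e.g. for a copy of a boot environment:
  zfs snapshot zroot/ROOT/default@nocomp
  zfs send zroot/ROOT/default@nocomp | zfs receive -o compression=off zroot/ROOT/nocomp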

A minor detour, that is actually related:
On lots of embedded systems the root filesystem is compressed, mostly because of limited storage space, and the bootloader takes care of uncompressing and relocating things. Again, when one has uptimes of months or years, does an extra 30 seconds or 1 minute at boot matter? (Yes, because it takes 1 minute more to be back in service, but that's 3 sips of coffee.)
 
There's nothing new in compressing data on the fly before saving it to the media. It's so old that the first LTO drives offered it, and now it's also used in NAND, where the firmware compresses the data to save cell write cycles. Nowadays CPU cycles are cheap and you can compress data at very little cost.

Here's some info about this from SandForce, which uses such compression on its drives.
 
The uniform approach in managing compression seems to me a good way to balance the benefits and challenges.
Having compression enabled by default while allowing adjustments at the dataset level offers a wide margin for customization without introducing excessive complexity.
This allows for a coherent vision.
One should not view compression solely from the perspective of system boot, but rather in terms of efficiency and overall performance in the long term.
We are talking about servers that often benefit from extended periods of activity, and in that context the initial drawbacks seem to be completely outweighed by the long-term benefits.
 
[...] This just happens to be one of my gripes with ZFS for root, given how minimalistic the install is, using little to no resources.
It's undeniable that ZFS takes up extra resources*. However, if you're confronted with a minimalistic system that is based on reasonably current hardware, my take is that probably one of the most often encountered limiting resources is the amount of available RAM (not CPU power; note FreeBSD 32-bit support is diminishing). If RAM is an impeding factor, then you probably need to (heavily) tune your ZFS set-up; as a practical matter the obvious option to consider is not to use ZFS, but instead use the venerable UFS :)

___
* given its "founding design principles" I'd say that it's not squandering resources, but from its design point of view and implementation it is not aimed at minimalistic systems.
 
"This allows for a coherent vision."

That I could understand and, in principle, agree with.
 
My use is entirely 20 to 50 GB movies already compressed in H.264 or H.265 format, so I have compression disabled on those pools.
In such a special case, yep, compression doesn't buy anything. It's still very unlikely to hurt performance; decompression is done "on the fly" while reading. It's just a bit of unnecessary CPU work going on. Chances are the CPU will just idle a bit more when you disable compression ... but then, you might save a bit of energy (which is a good thing as well) ;)
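
For that kind of dataset that is simply (pool/dataset names are examples):

  zfs set compression=off tank/movies
  zfs get compressratio tank/movies   # will sit near 1.00x for such data anyway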
 
It's undeniable that ZFS takes up extra resources. [...] If RAM is an impeding factor, then you probably need to (heavily) tune your ZFS set-up; as a practical matter the obvious option to consider is not to use ZFS, but instead use the venerable UFS :)
 
I'm not concerned either way... my NAS is a big Xeon with lots of ECC ram and many processors.
Then again, I was an ASM programmer in the beginning, so thrifty consumption of CPU cycles is in my nature.
 
If the data wasn't compressible enough (ZFS requires 1/8th savings, if I recall correctly), then the records are stored uncompressed and nothing has to be decompressed on read, so there is no additional CPU or RAM overhead. ZFS compresses in chunks up to the recordsize, so if an 'already compressed' file contains enough easily compressible content (metadata, random luck, etc.), then some records may get compressed while others don't. When some records were compressible, the overhead comes into play only for decompressing those records.
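
A rough way to see this per file on FreeBSD is to compare the apparent size with the blocks actually allocated:

  du -Ah /boot/kernel/kernel   # apparent (logical) size
  du -h  /boot/kernel/kernel   # space actually allocated after compression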

My over-10-year-old i7-3820 gets to 3,814 MB/s for reads and 580 MB/s for writes in the lz4 test of benchmarks/lzbench, and although that is a better lz4 than ZFS currently bundles, it still means data is read from a 1,200 MB/s Samsung NVMe faster, without fully loading a CPU to do so, even if all records in a read were compressed. Both reads and writes to magnetic drives are accelerated as long as I have even a few CPU cycles to spare, since those drives likely max out at about 280 MB/s and more realistically fall around 0.1 MB/s to 20 MB/s for many workloads. Of course compression doesn't speed up seek times on those slow magnetic drives, but it can reduce how much of their total throughput we need.

zstd compression slows reads on that benchmark to 895-1,035 MB/s depending on the level chosen, and writes fall to 2.97-342 MB/s. For reads on magnetic drives that is still good; for NVMe I would have to look closer at the testing (whether threading was equally exercised, how many records were versus were not compressed, etc.) to see if there is any gain in total throughput given the high CPU use. Magnetic still always gains for reads at higher CPU use, while writes are likely bottlenecked by the CPU at high settings, so the high settings are really only good for archival.

The ZFS ARC benefits from keeping data compressed, taking less RAM at the expense of CPU when that data is read; look at the line under the "ARC:" line in top to see how much impact it currently has in terms of RAM saved. The ARC is designed to grow with most disk use and shrink as programs need the RAM, and there are limits that can be adjusted for it.

You can always toggle off compression in the installer, or later if you do not like it. Toggling it later only applies to newly written files, so your currently compressed content will not change until you 'replace' it through upgrades, manual replacement, etc. There is also zle as a compression option to just avoid writing strings of zeros to disk.
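
A small illustration of those last two points (dataset name is just an example):

  zfs set compression=off zroot/usr/ports
  zfs get compressratio zroot/usr/ports   # unchanged until the data is rewritten
  zfs set compression=zle zroot/usr/ports # only squeeze runs of zeros from now on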
 
You can always toggle off compression in the installer, or later if you do not like it. Toggling it later only applies to newly written files, so your currently compressed content will not change until you 'replace' it through upgrades, manual replacement, etc. There is also zle as a compression option to just avoid writing strings of zeros to disk.
That's what I remember: toggling only applies to newly written files; currently compressed files will not change. That would be a good enough reason for me to have it off by default, plus maybe some testing. It's been a while since I've done anything with ZFS ... Thanks for the information.
 
Can you provide actual numbers/benchmarks/flamegraphs/... showing that an uncompressed root dataset performs better at boot than one with compression turned on?

I'm sure that if this is really the case you can file a PR, and turning off compression will be considered if it really gives an advantage.
 
I've watched my memory use grow from 100 MB to 160 MB while sitting idle. Nothing is loaded, fresh install, no debug ... I haven't looked into it, but I've also watched the ARC grow as well. I'm sure there are tunable parameters. Relative to 64 GB of RAM I'm not sure of the relevance, or even relative to 8 GB or 4 GB. Perhaps the issue is not related to compression, perhaps it is; I haven't investigated it enough. Fresh install, only logging in and running top; the memory footprint increase comes at a later time, and I'm assuming it's ZFS related. Again, I will have to look at it. Anyway, thanks for the information! I'm sure for boot-up and an idle state some fine tuning could be done.
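
If that growth does turn out to be the ARC, it can be inspected and capped; the names below are as on recent FreeBSD, verify locally:

  sysctl kstat.zfs.misc.arcstats.size   # current ARC size in bytes
  sysctl vfs.zfs.arc_max                # 0 usually means auto-sized
  # to cap it, e.g. at 512 MB, add to /boot/loader.conf and reboot:
  # vfs.zfs.arc_max="512M"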
 
There are cron jobs which may run after a reboot. The locate.db rebuild, for example, crawls the entire file system.
 