ZFS Advice on what properties should be considered BEFORE creating a pool and datasets?

I am interested in ZFS properties that are problematic to redefine after the zpool/datasets have been created (such as the compression property, whose effect will not apply retroactively).

It's funny, but I've been using FreeBSD and ZFS for over 3 years now, creating zpools and datasets with the default properties, in the following way:
Code:
zpool create -m none test
zfs create test/test

I didn't even set the compression property, although it is known to have negligible CPU overhead and in some cases saves significant disk space. Currently I do sysctl vfs.zfs.min_auto_ashift=12 before creating a pool (this is another pool-related setting that cannot be changed later).
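Concretely, what I run now looks like this (the explicit -o ashift pool property should, as far as I understand, pin the same thing at creation time on OpenZFS 2.x; the device name is just an example):
Code:
# raise the floor for the automatically chosen ashift (FreeBSD sysctl)
sysctl vfs.zfs.min_auto_ashift=12
# or set it explicitly when creating the pool
zpool create -o ashift=12 -m none test /dev/da0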

And I create a dataset with the compression property:
Code:
zfs create -o compression=lz4 test/test

Please advise: which properties should I pay attention to BEFORE creating a pool and datasets, so that I don't have to recreate the pools or cause increased writing to the disks?
 
I usually unset properties that are not supported on Linux, so that the pool can be mounted on Linux if the need ever arises. Same for some too-new features.

Code:
zpool create \
    -o feature@log_spacemap=disabled \
    -o feature@vdev_zaps_v2=disabled \
    -o feature@head_errlog=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    xcarb3 /dev/da0s4
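To double-check what actually ended up enabled on the pool before writing to it, something like this should do (pool name from the example above):
Code:
zpool get all xcarb3 | grep feature@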
 
I think for 85% of the use cases the defaults are fine.

That sysctl: I believe 12 is the current default value; in the beginning (back around 9.x) I think the value was 9 (512-byte sectors), which degraded performance on devices with 4K sectors, especially if the initial partitioning did not align correctly.
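If you want to see what an existing pool actually ended up with, the pool property and zdb should both show it (pool name is an example; the property may read 0 if it was chosen automatically):
Code:
zpool get ashift test
zdb -C test | grep ashift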

I don't tune anything, but from what I've read:
  1. Can the dataset benefit from a large block/stripe size?
  2. Are you storing lots of little files or fewer big files?
  3. Is the dataset intended for use by a database application? MySQL or others.
  4. Is the use case something that gets performance boost by turning off compression?
  5. Do you need a minimal extra level of data safety by setting copies greater than 1?
  6. Do you want to encrypt the dataset?
Item 4 I think is one of the biggest cases to investigate.
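As a rough sketch of how those questions can map onto create-time properties (pool/dataset names and the chosen values are only examples, not recommendations):
Code:
# large sequential files (1, 2): bigger records
zfs create -o recordsize=1M tank/media
# database (3): match the page size, e.g. 16K for InnoDB
zfs create -o recordsize=16K -o logbias=throughput tank/db
# already-compressed data (4): skip compression
zfs create -o compression=off tank/video
# extra redundancy (5) and encryption (6)
zfs create -o copies=2 -o encryption=on -o keyformat=passphrase tank/private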

My opinions:
Tweaking datasets is one of the biggest advantages of ZFS over UFS. To do similar tuning on UFS, you'd wind up with a whole bunch of partitioning to do, then tuning the file system on each partition.

I agree with the approach by cracauer@ to explicitly disable problematic items.

Some of the discussions around compression/no compression/what algorithm are highly dependent on the data itself and the access patterns to it.
Create a dataset with the defaults, test your workload/use case. Record results.
Create a dataset with modified values, test your workload/use case. Record results.
Compare results, decide on what you want to do.
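A minimal sketch of that compare-the-two approach (the dataset names, the tuned values, and the simple timed copy are placeholders for your real workload, assuming the pool mounts under /tank):
Code:
zfs create tank/bench-default
zfs create -o compression=zstd -o recordsize=1M tank/bench-tuned
time cp -R /path/to/sample-data /tank/bench-default/
time cp -R /path/to/sample-data /tank/bench-tuned/
zfs get compressratio,used tank/bench-default tank/bench-tuned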
 
I - when I don't forget - do the opposite: I create the zpool using -d and/or "-o compatibility=...", and I only enable features that I know I will use OR that can be disabled afterward.
See [man 8]zpool-create[/man] and especially [man 7]zpool-features[/man].
Of course I do this before writing to the pool. Also, use "compression=on" unless you really need to specify the compression algorithm. If lz4 is enabled it will be used; if it's not, it falls back to an earlier algorithm. The default has changed in the past and probably will again in the future, but everything is governed by the features enabled on the pool.
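A sketch of what that can look like (the compatibility file name and the device are examples; see the files shipped under /usr/share/zfs/compatibility.d for what your OpenZFS version offers):
Code:
# start with every feature disabled, then enable only what you need
zpool create -d -o feature@lz4_compress=enabled test /dev/da0
# or let a compatibility preset pick a conservative feature set
zpool create -o compatibility=openzfs-2.1-linux test /dev/da0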
 
My sole use of ZFS is as a movie server.
Blu-Ray and 4k UHD create absolutely huge files, so I have their directories set to "no compression" to reduce serving overhead.

I'm wondering if the endless tinkering with ZFS is ill-founded... searching for a cure to a problem that doesn't exist.
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?
 
I'm wondering if the endless tinkering with ZFS is ill-founded... searching for a cure to a problem that doesn't exist.
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?

Well, the desire to keep your pool mountable by different systems in the future is real I suppose.

Compression, if you have compressible files, is a large speedup. Whether you can serve that over the network is a different matter.
 
Compression, if you have compressible files, is a large speedup. Whether you can serve that over the network is a different matter.
It always comes down to Bandwidth.
If you are ftp'ing or scp'ing a file from a remote server, do you take the uncompressed one or the compressed one?
Assuming the compressed one is smaller (fewer bytes), it should download quicker. But compare "time of the quicker download plus time to uncompress" vs "time to download uncompressed": which is greater, and by how much?
Compressed blocks read into system buffers, decompressed, and made available to the application: a similar bandwidth question.
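A back-of-envelope example (all numbers invented, just to show the trade-off):
Code:
# 10 GB file, 2:1 compressible, 1 Gbit/s link (~115 MB/s usable)
#   uncompressed transfer: 10240 MB / 115 MB/s ≈ 89 s
#   compressed transfer:    5120 MB / 115 MB/s ≈ 45 s
#   decompression at, say, 500 MB/s of output ≈ 20 s (and it largely overlaps the transfer)
# Here the compressed copy wins; a much faster link or a slower decompressor can flip it.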

Well, the desire to keep your pool mountable by different systems in the future is real I suppose.
This is critical if one has a reason to mount the pool across systems. User specific cases.

Blu-Ray and 4k UHD create absolutely huge files, so I have their directories set to "no compression" to reduce serving overhead.
This is an assumption of "read a block from dataset, decompress, send block over network", yes? I think that is a reasonable assumption, but maybe not guaranteed?
 
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?
For data like yours (already compressed video) the compression will abort while writing for being unable to compress. Once written, it will be identical to what would have been written without compression, so no impact when serving out.
 
For data like yours (already compressed video) the compression will abort while writing for being unable to compress.
And as a follow-up, if you have a dataset where it will only be holding pre-compressed (movies, pictures) data, disabling compression will give you a small performance improvement when writing data. (Doesn’t have to try & fail to compress.)

Most of the time, it’s not worth worrying about. I had a system with multiple TV tuners which would often be streaming multiple compressed (live off the tuners) streams to disk while simultaneously serving out a stream. For this setup, avoiding that throw-away overhead was noticeable.

EDIT: Reflecting on this, it must have been tweaking record sizes that was noticeable. I believe that once it decides a file is incompressible, it doesn’t try again for the rest of the file - and I think it was just continually appending while saving a stream. Moral of the story: everything depends on your workload. Making a change and seeing the results with your workload is the only performance that matters.
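Since record sizes came up: the kind of tweak meant here, for a dataset holding large append-only recordings, would be something like this (dataset name is made up; recordsize only affects blocks written after the change):
Code:
zfs set recordsize=1M tank/dvr
zfs get recordsize tank/dvr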
 
Quite a few settings are worth thinking about up front:
  • ashift needs to be set at creation time. Dataset encryption also needs to be set at creation time.
  • Compatibility options should be set early to keep certain features from being used, if compatibility with other systems is desired.
  • Some settings like recordsize cannot be altered (much) by zfs send+recv; you will have to rewrite the file contents to get them stored with a newly selected record size.
  • If it is a backup drive that should be read only, readonly should be set sooner rather than later.
  • copies, redundant_metadata, and checksum are a reliability thing you should consider setting at the start.
  • atime should be disabled if unneeded, to avoid an extra write per file read plus the fragmentation that comes with the scattered locations those writes go to.
  • Changing exec and setuid permissions later means you should review whether snapshots need to be removed for security reasons.
  • primarycache/secondarycache not being set as desired could pollute the ARC or cause extra I/O until set properly, but that will sort itself out.
  • refreservation, reservation, refquota, *_limit, and *quota* settings need to be set before the limits are exceeded.
That is a lot of settings, with more that are for compatibility-specific things outside FreeBSD, but you likely do not 'need' to adjust all of them.
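A quick pre-flight check of what a freshly created pool and dataset actually ended up with, before committing data, can look like this (names are the examples from earlier):
Code:
zpool get ashift,autotrim test
zfs get encryption,recordsize,atime,copies,redundant_metadata,checksum,primarycache test/test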

I haven't found good benchmarks, and I have tried unsuccessfully to put a script together to test various changes (sector size, geli encryption settings, zfs ashift and recordsizes, zfs compression, etc.), but so far it's not too useful/automated. I was starting with creating a couple of memory disks without ZFS caching, to try to measure the impact of changing those kinds of things without the influence ARC can have. So far I have found it is pretty easy to get errors on geli and to start locking up zfs commands with rapid create+use+destroy cycles. If anyone knows of good tests I can run to compare geli/gbde/ZFS encryption and other settings, I'd be glad to hear them, since it's awkward looking at `time cp stuff* destination` and seeing the same fraction-of-a-second runtime when it is actually stuck (it literally doesn't return that quickly) compressing and writing because zstd-18 takes many seconds. Otherwise I will continue to cobble together some results, but I can't get accurate figures for things like RAM or CPU time. What I would really like is to get it down to the point of being a benchmark that could be added to bsdinstall, to help users see the impact of options on their specific hardware's CPU/RAM/disk without getting out a stopwatch and spending weeks reinstalling over and over.
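A rough sketch of that md-backed scratch-pool idea, using FreeBSD's mdconfig (size, unit number, and pool name are examples):
Code:
# swap-backed memory disk for a throwaway test pool
mdconfig -a -t swap -s 2g -u 9
zpool create scratch /dev/md9
# keep the ARC out of the measurement as much as possible
zfs set primarycache=none secondarycache=none scratch
# ... run the timed workload here ...
zpool destroy scratch
mdconfig -d -u 9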

Compression is usually fast to abort if it doesn't appear to be successful enough. If a block couldn't be compressed, it won't be decompressed on read. The algorithm, the hardware, and the data being processed all change how fast it will skip compression, compress successfully, and decompress. lz4 should normally compress faster than some of the quicker single consumer drives can write, and zstd compression should beat a magnetic drive's throughput; decompression for both is faster than that. Any compressed bytes are basically free additional disk bandwidth if the transfer was otherwise still disk-bottlenecked, but at the cost of CPU+RAM. Testing should be done, as more is not always better: higher compression settings make some data bigger and some transfers slower, and larger recordsizes cause some data to increase in size while other data decreases. Already compressed data such as movies may have small parts that the codec happens to find a smaller match for, and some files have content such as metadata that can be squeezed down further. I also recall that disk metadata and the ARC are affected by the compression setting.
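To see how much a given dataset actually benefits, the ratio and the logical-vs-physical sizes are exposed as properties (dataset name is an example):
Code:
zfs get compressratio,logicalused,used tank/media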
 