ZFS: advice on which properties should be considered BEFORE creating a pool and datasets?

I am interested in ZFS properties that are problematic to change after the zpool/datasets have been created (such as the compression property, which does not apply retroactively to data that is already written).

It's funny, but I've been using FreeBSD and ZFS for over 3 years now, creating zpools and datasets with the default properties, like this:
Code:
zpool create -m none test
zfs create test/test

I didn't even set the compression property, although it is known to have negligible CPU overhead and can save significant disk space in some cases. Currently I run sysctl vfs.zfs.min_auto_ashift=12 before creating a pool (ashift is another pool-related setting that cannot be changed later).
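For reference, this is roughly how I check that the pool really ended up with ashift=12 (a minimal sketch; "test" is just my example pool, and zdb -C reads the cached pool configuration, so it assumes the pool was imported with the default cache file):
Code:
sysctl vfs.zfs.min_auto_ashift=12
zpool create -m none test
zdb -C test | grep ashift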

And I create the dataset with the compression property:
Code:
zfs create -o compression=lz4 test/test

Please advise: which properties should I pay attention to BEFORE creating a pool and datasets, so that I don't end up having to recreate the pools or causing extra writes to the disks?
 
I usually unset properties that are not supported on Linux, so that the pool can be mounted on Linux if the need ever arises. Same for some too-new features.

Code:
zpool create \
    -o feature@log_spacemap=disabled \
    -o feature@vdev_zaps_v2=disabled \
    -o feature@head_errlog=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@zilsaxattr=disabled \
    xcarb3 /dev/da0s4
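If you want to double-check what actually ended up enabled, listing all pool properties and filtering on the feature flags shows each one's state (enabled, active, or disabled):
Code:
zpool get all xcarb3 | grep feature@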
 
I think for 85% of the use cases the defaults are fine.

That sysctl: I believe 12 is the current default value; in the beginning (back around 9.x) I think the value was 9 (512-byte sectors), which degraded performance on devices with 4K sectors, especially if the initial partitioning was not aligned correctly.
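You can check what your system currently uses before creating anything; reading the sysctl without assigning a value just prints it:
Code:
sysctl vfs.zfs.min_auto_ashift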

I don't tune anything, but from what I've read:
  1. Can the dataset benefit from a large block/stripe size?
  2. Are you storing lots of little files or fewer big files?
  3. Is the dataset intended for use by a database application? MySQL or others.
  4. Is the use case something that gets performance boost by turning off compression?
  5. Do you need a minimal extra level of data safety by setting copies greater than 1?
  6. Do you want to encrypt the dataset?
Item 4 I think is one of the biggest cases to investigate. (A rough sketch of per-dataset settings for these questions follows below.)
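As a hedged sketch only (the pool/dataset names and values are made up, and the right recordsize, copies, and encryption choices really depend on your workload), the questions above map roughly to create-time options like these:
Code:
zfs create -o recordsize=1M tank/media            # 1/2: few large files, large records
zfs create -o recordsize=16K tank/db              # 3: match the database page size
zfs create -o compression=off tank/precompressed  # 4: data that will not compress anyway
zfs create -o copies=2 tank/important             # 5: store two copies of every block
zfs create -o encryption=on -o keyformat=passphrase tank/private  # 6: native encryption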

My opinions:
Tweaking datasets is one of the biggest advantages of ZFS over UFS. If you wanted to do similar tuning on UFS, you would wind up with a whole bunch of partitioning to do, then tuning the file system on each partition.

I agree with the approach by cracauer@ to explicitly disable problematic items.

Some of the discussions around compression/no compression/what algorithm are highly dependent on the data itself and the access patterns to it.
Create a dataset with the defaults, test your workload/use case. Record results.
Create a dataset with modified values, test your workload/use case. Record results.
Compare results, decide on what you want to do.
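For example (a rough sketch only; the dataset names, mountpoints, and sample data path are made-up stand-ins for your real workload):
Code:
zfs create -o compression=off test/bench-off
zfs create -o compression=lz4 test/bench-lz4
time cp -R /path/to/sample/data /test/bench-off/
time cp -R /path/to/sample/data /test/bench-lz4/
zfs get compressratio,used test/bench-lz4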
 
I - when I don't forget - do the opposite: I create the zpool using -d and/or "-o compatibility=..." and only enable features that I know I will use OR that can be disabled afterward.
See [man 8]zpool-create[/man] and especially [man 7]zfs-features[/man].
Of course I do this before writing to the pool. Also, use "compression=on" unless you really need a specific algorithm. If lz4 is enabled it will be used; if it's not, ZFS falls back to an earlier algorithm. The default has changed in the past and probably will again in the future, but everything is governed by the features enabled on the pool.
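A minimal sketch of that approach (the pool name, device, and feature-set file are just examples; on OpenZFS 2.1+ the available compatibility files live in /usr/share/zfs/compatibility.d):
Code:
zpool create -o compatibility=openzfs-2.0-linux mypool /dev/da1
zpool get compatibility mypool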
 
My sole use of ZFS is as a movie server.
Blu-Ray and 4k UHD create absolutely huge files, so I have their directories set to "no compression" to reduce serving overhead.
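In practice that is just something like this (dataset name made up; child datasets inherit the setting unless they override it):
Code:
zfs set compression=off tank/movies
zfs get -r compression tank/movies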

I'm wondering if the endless tinkering with ZFS is ill-founded... searching for a cure to a problem that doesn't exist.
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?
 
I'm wondering if the endless tinkering with ZFS is ill-founded... searching for a cure to a problem that doesn't exist.
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?

Well, the desire to keep your pool mountable by different systems in the future is real I suppose.

Compression, if you have compressible files, is a large speedup. Whether you can serve that over the network is a different matter.
 
Compression, if you have compressible files, is a large speedup. Whether you can serve that over the network is a different matter.
It always comes down to bandwidth.
If you fetch a file from a remote server with ftp or scp, do you take the uncompressed one or the compressed one?
Assuming the compressed one is smaller (fewer bytes), it should download more quickly. But which is greater, "time for the quicker download plus time to decompress" or "time to download the uncompressed file", and by how much?
Compressed blocks being read into system buffers, decompressed, and made available to the application raise a similar bandwidth question.
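A crude way to answer that for your own link (the host name and file paths here are purely hypothetical) is simply to time both paths:
Code:
time scp user@host:/data/dump.sql .
time sh -c 'scp user@host:/data/dump.sql.gz . && gunzip dump.sql.gz'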

Well, the desire to keep your pool mountable by different systems in the future is real I suppose.
This is critical if one has a reason to mount the pool across systems. User specific cases.

Blu-Ray and 4k UHD create absolutely huge files, so I have their directories set to "no compression" to reduce serving overhead.
This is an assumption of "read a block from dataset, decompress, send block over network", yes? I think that is a reasonable assumption, but maybe not guaranteed?
 
For example, are there any bona fide benchmarks of compression vs no-compression for actual file serving performance?
For data like yours (already-compressed video), the compression will abort while writing because it cannot shrink the blocks. Once written, the data is identical to what would have been written without compression, so there is no impact when serving it out.
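You can sanity-check this on an existing dataset (the name is hypothetical); a compressratio of roughly 1.00x means compression is effectively doing nothing:
Code:
zfs get compression,compressratio tank/movies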
 
For data like yours (already compressed video) the compression will abort while writing for being unable to compress.
And as a follow-up, if you have a dataset where it will only be holding pre-compressed (movies, pictures) data, disabling compression will give you a small performance improvement when writing data. (Doesn’t have to try & fail to compress.)

Most of the time, it’s not worth worrying about. I had a system with multiple TV tuners which would often be streaming multiple compressed (live off the tuners) streams to disk while simultaneously serving out a stream. For this setup, avoiding that throw-away overhead was noticeable.

EDIT: Reflecting on this, it must have been tweaking record sizes that was noticeable. I believe that once it decides a file is incompressible, it doesn't try again for the rest of the file - and I think it was just continually appending while saving a stream. Moral of the story: everything depends on your workload. Making a change and measuring the results with your workload is the only performance test that matters.
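If it was the record size, that kind of tweak is a one-liner at dataset creation (the name is made up; 1M records are a common choice for large, append-only recordings):
Code:
zfs create -o recordsize=1M -o compression=off tank/recordings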
 