I *think* what you are looking for are the various sysctls for dirty data, namely vfs.zfs.dirty* and vfs.zfs.vdev.async_write_active_[min|max]_dirty_percent.
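For reference, you can inspect them like this (a sketch; the exact sysctl names vary between FreeBSD/OpenZFS versions, so treat the list as something to verify on your own system):

```shell
# The tunables in question (FreeBSD sysctl names; the exact set varies
# by release -- verify with `sysctl -a | grep dirty` on your host):
tunables="vfs.zfs.dirty_data_max
vfs.zfs.dirty_data_max_percent
vfs.zfs.vdev.async_write_active_min_dirty_percent
vfs.zfs.vdev.async_write_active_max_dirty_percent"

# Print the current values on the affected host:
for t in $tunables; do
    sysctl "$t" 2>/dev/null || echo "$t: not available on this system"
done
```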
I suspect that deleting a large amount of data fills up the maximum amount of dirty data ("in flight" writes not yet committed to disk) that is allowed, so ZFS is constantly trying to get TXGs committed to disk.
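To put rough numbers on that ceiling: if I remember the OpenZFS defaults correctly (an assumption worth checking against your own sysctl output), dirty_data_max is derived as 10% of physical RAM, capped at 4 GiB, and the async_write_active_[min|max]_dirty_percent pair then controls how aggressively ZFS ramps up async writes as that ceiling fills (I believe the defaults are 30/60):

```shell
# Sketch: how the default dirty-data ceiling is derived (assumption:
# OpenZFS defaults of 10% of physical RAM, capped at 4 GiB -- verify
# against `sysctl vfs.zfs.dirty_data_max` on your own host).
physmem=$(( 32 * 1024 * 1024 * 1024 ))   # example host with 32 GiB RAM
cap=$(( 4 * 1024 * 1024 * 1024 ))        # assumed 4 GiB upper cap
dirty_max=$(( physmem / 10 ))
if [ "$dirty_max" -gt "$cap" ]; then
    dirty_max=$cap
fi
echo "dirty_data_max = $dirty_max bytes"   # prints: dirty_data_max = 3435973836 bytes
```

Once the pool's dirty data crosses the "min" percentage of that value, ZFS starts issuing more concurrent async writes; past the "max" percentage it writes as hard as it can, which is when you feel it.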
The default values usually work at least "well enough" except for some really extreme edge cases and should never cause the problems you are seeing. So I strongly suspect there is another root cause - e.g. a dying disk that degrades pool performance, or heavy memory pressure on the system.
I write and delete large disk images of well over 100GB (full-disk backups of some clients) on our storage server and have never seen anything like the behaviour you describe. Even my desktop machine, with much less RAM and fewer disks, never became unresponsive when I hammered its single ZFS pool with similar tasks.
If you still want/need to adjust some ZFS knobs, there is no "master recipe" for tuning these (or other ZFS-related) sysctls - you have to carefully monitor the system behaviour under load to understand where the bottleneck is. DTrace is your very best friend for this. I can *highly* recommend reading the sections on "Performance" and "Tuning" in "FreeBSD Mastery: Advanced ZFS" by Michael W. Lucas and Allan Jude. They provide a structured method for identifying performance bottlenecks as well as some example DTrace scripts that can be adapted to your needs. The DTrace toolkit (available from ports and packages) also has a lot of ZFS-, disk- and I/O-related scripts that can help narrow down the exact bottleneck.
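As a taste of what such a script looks like (this is my own minimal sketch, not one of the book's examples; the io provider is standard on FreeBSD):

```d
/* Minimal starting point: count block-I/O requests per process.
   Run as root, e.g. `dtrace -s io_by_proc.d` (a placeholder filename);
   press Ctrl-C to print the summary. */
io:::start
{
    @requests[execname] = count();
}
```

If deletes are really saturating the pool, you would expect the ZFS kernel threads to dominate that summary rather than your userland processes.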
That being said, I still suspect there is a much simpler explanation for your problem - so start from a high level and narrow down the true root cause.
What layout does the pool that shows this behaviour have? Does the pool's ashift match the drives' block size?
Any errors reported by zpool status? Any memory throttle counts reported by zfs-stats -A?
While deleting large files, try monitoring the pool with zpool iostat -v 1 - do the "operations" and "bandwidth" numbers look plausible, and are they relatively evenly distributed across all vdevs and providers? As I said, a single dying or misbehaving drive can send the performance of the whole pool into the abyss. SSDs that have reached their maximum wear level (or minimum wearout indicator for Intel) are notorious for this because they tend to throttle down to sub-1MiB/s throughput.
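To make "evenly distributed" concrete, here is a toy sketch of what an outlier looks like in zpool iostat -v style output (the device names and numbers are invented for illustration):

```shell
# Toy sketch: spotting a slow provider in `zpool iostat -v`-style output.
# The sample below is invented -- note how ada1's write bandwidth (last
# column) collapses while its mirror partner looks perfectly healthy.
sample='mirror-0  10.2G  5.8G  120  340  1.1M  40.2M
  ada0      -      -     60   170   550K  20.1M
  ada1      -      -     60   170   550K   150K'

# Flag leaf providers whose write-bandwidth column is stuck in the KiB range:
echo "$sample" | awk '/^  ada/ && $7 ~ /K$/ { print $1 " write bandwidth only " $7 }'
```

In a real capture you would watch a few dozen one-second samples; a provider that is consistently an order of magnitude behind its siblings is your prime suspect.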