ZFS ZIL tuning for a large storage system

Hi
My storage has been tuned to commit writes every 12 seconds with the following parameter:
vfs.zfs.txg.timeout: 12

However, during large writes, say 40 MB per second, the storage keeps committing data from the ZIL to disk every second instead of every 12 seconds.

For your information, committing more than 100 MB of data to the mirrored pool takes 1-2 seconds and significantly impacts read I/O. Previously this could be overcome by tuning vfs.zfs.write_limit_override so that commits were held back until a large write buffer had accumulated and was flushed at once. However, that parameter has been removed since FreeBSD 9.3 and 10.1.

My storage is running FreeBSD 10.1.
 
It's not meant to write every 12 seconds. That value is the maximum time before a write, as described in the handbook. By default, updates are written out when 64 MB accumulates, when the 5-second default timeout hits, or when an administrative command such as a snapshot is issued.

Handbook -- https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-advanced.html:
vfs.zfs.txg.timeout - Maximum number of seconds between transaction groups. The current transaction group will be written to the pool and a fresh transaction group started if this amount of time has elapsed since the previous transaction group. A transaction group may be triggered earlier if enough data is written. The default value is 5 seconds. A larger value may improve read performance by delaying asynchronous writes, but this may cause uneven performance when the transaction group is written. This value can be adjusted at any time with sysctl(8).

You may want to look into these tunables:
sysctl -a -d vfs.zfs | grep dirty_data
Code:
vfs.zfs.dirty_data_max: The maximum amount of dirty data in bytes after which new writes are halted until space becomes available
vfs.zfs.dirty_data_max_max: The absolute cap on dirty_data_max when auto calculating
vfs.zfs.dirty_data_max_percent: The percent of physical memory used to auto calculate dirty_data_max
vfs.zfs.dirty_data_sync: Force a txg if the number of dirty buffer bytes exceed this value
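
To compare those descriptions against what the box is actually running, something like the following can print the current values. This is only a minimal sketch: the helper name show_zfs_write_tunables is made up for illustration, and the fallback message is there so the loop still behaves on a system where one of the OIDs is missing.

```shell
# Hypothetical helper: print the txg timeout and the dirty-data tunables.
show_zfs_write_tunables() {
    for oid in vfs.zfs.txg.timeout vfs.zfs.dirty_data_sync vfs.zfs.dirty_data_max; do
        # sysctl prints "name: value"; fall back to a note if the OID is absent.
        sysctl "$oid" 2>/dev/null || echo "$oid: not available on this system"
    done
}

show_zfs_write_tunables
```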

You may also find the discussion in Thread 47224 relevant.
 
It works. I made the following change and it behaves as I expected.
Code:
vfs.zfs.dirty_data_sync=300000000
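
For anyone wondering why this value helps, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which a transaction group is committed as soon as dirty data reaches vfs.zfs.dirty_data_sync or vfs.zfs.txg.timeout expires, whichever comes first, with a steady write rate and 40 MB/s taken as 40,000,000 bytes per second. The function name txg_interval is hypothetical, and real behaviour will also depend on sync time and other write-throttle settings.

```shell
# Estimate seconds between txg commits under a steady write load.
# Simplified model (assumption): commit when dirty data reaches the sync
# threshold or when the txg timeout expires, whichever comes first.
# Usage: txg_interval SYNC_BYTES WRITE_BYTES_PER_SEC TIMEOUT_SEC
txg_interval() {
    awk -v thresh="$1" -v rate="$2" -v limit="$3" 'BEGIN {
        secs = thresh / rate            # time to fill the threshold
        if (secs > limit) secs = limit  # the timeout caps the interval
        printf "%.1f\n", secs
    }'
}

txg_interval 67108864 40000000 12     # 64 MB default threshold -> 1.7
txg_interval 300000000 40000000 12    # value used in this thread -> 7.5
```

Under those assumptions the default threshold fills in under two seconds at this write rate, which matches the frequent commits described in the first post, while the larger value stretches the interval to several seconds. To keep the setting across reboots, the same line can be added to /etc/sysctl.conf.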
 