ZFS: Reading on ZFS extremely slow

… OpenZFS now uses a connecting "dot" instead of a connecting underscore: …

Loosely speaking (it may be that some cases are not yet converted):
  • . for consistency
  • plus _ for legacy.
For example: vfs.zfs.arc.max is the current spelling, while vfs.zfs.arc_max is the legacy one.
Side note: FreeBSD bug 218538 – tuning(7) should either be removed or strictly maintained.
 
Sorry, I could have been clearer.
No problem. Documentation could indeed do with some updates, thanks for reporting that.

If the Handbook decides to document such tunables, a mention of what "0" means would be nice, especially since it is also used as an output value of the tunable. It would also be nice if it were documented somewhere that the ZFS ARC maximum is exposed through the (kernel) parameter zfs_arc_max and through the tunable vfs.zfs.arc.max; especially since it used to be vfs.zfs.arc_max.
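As a sketch, setting the tunable at boot might look like the following /boot/loader.conf fragment (the 4 GiB cap is an arbitrary illustration, not a recommendation):

```
# /boot/loader.conf
# Cap the ARC at 4 GiB; "0" means the built-in default.
vfs.zfs.arc.max="4294967296"
# The legacy spelling vfs.zfs.arc_max is reportedly still accepted
# as a compatibility alias.
```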

We have lost vfs.zfs.arc_free_target as a tunable. It seems to have gone underground: no tunable for arc_free_target (arc_os.c, lines 71-74). The Advanced ZFS book writes about it in the section about ARC tuning, where it is described as an alternative to tuning with arc_max (arc_min). From its description, I see it as a better starting point for tuning than arc_max because:
  • the enforcing mechanism is different
  • it can be adjusted at run time
vfs.zfs.arc_max & vfs.zfs.arc_min are reported not to be tunable on the fly in the Advanced ZFS book. That aspect seems to have changed somewhat.
 
See also: <https://forums.freebsd.org/posts/551519>



… We have lost vfs.zfs.arc_free_target as a tunable. It seems to have gone underground: …

Code:
% zfs version
zfs-2.1.99-FreeBSD_g17b2ae0b2
zfs-kmod-2.1.99-FreeBSD_g17b2ae0b2
% sysctl vfs.zfs.arc.sys_free
vfs.zfs.arc.sys_free: 0
% sysctl vfs.zfs.arc_free_target
vfs.zfs.arc_free_target: 86267
% sudo sysctl vfs.zfs.arc.sys_free=100000
grahamperrin's password:
vfs.zfs.arc.sys_free: 0 -> 100000
% sudo sysctl vfs.zfs.arc.sys_free=0
vfs.zfs.arc.sys_free: 100000 -> 0
% sudo sysctl vfs.zfs.arc_free_target=256000
vfs.zfs.arc_free_target: 86267 -> 256000
% sudo sysctl vfs.zfs.arc_free_target=86267
vfs.zfs.arc_free_target: 256000 -> 86267
% uname -aKU
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #5 main-n253627-25375b1415f-dirty: Sat Mar  5 14:21:40 GMT 2022     root@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 1400053 1400053
%

Cross-reference <https://discord.com/channels/727023752348434432/757305697527398481/949845873641267201>:

<https://github.com/openzfs/zfs/comm...89dd6d116ebd50071673f6e5146f1ac290882R71-R101> (2020-04-15) began:

Code:
/*
 * We don't have a tunable for arc_free_target due to the dependency on
 * pagedaemon initialisation.
 */

Allan or anyone: please, is that comment redundant?

<https://forums.freebsd.org/posts/558971> shows tuning, if I'm not mistaken.

Postscript

Allan Jude helped me to understand that what I tuned was not a tunable. In the context of the code comment, tunable is a noun; "… basically a special type of sysctl that gets its initial value from the kernel environment (set by loader)".
 
Thanks for looking into this, and for Allan Jude's explanation.

The documentation in the Handbook could do with some updating, especially where it introduces vfs.zfs.arc_max (& min) without describing that "0" represents the default value. Not mentioning that the tunable has changed name (because of the changed internal source code tree structure) from vfs.zfs.arc_max to vfs.zfs.arc.max doesn't help clarify things either.

Great to know that vfs.zfs.arc_free_target is still usable as a knob to turn. I wasn't able to fully appreciate the comment I quoted in message #53; I hadn't tried to look at the source code. The kernel parameter arc_free_target doesn't have a tunable; it is itself derived from other tunables, if I understand correctly. As such, the sysctl vfs.zfs.arc_free_target cannot be set by the loader (i.e. it cannot be set in /boot/loader.conf).
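A quick way to check which side of the divide a given sysctl falls on, assuming FreeBSD's base sysctl(8) (where -T lists variables settable via loader tunable, -W lists run-time writable ones, and -N prints names only):

```
% # Names under vfs.zfs.arc that are loader tunables (settable in loader.conf):
% sysctl -NT vfs.zfs.arc
% # Names that are writable at run time:
% sysctl -NW vfs.zfs.arc
% # arc_free_target should show up as writable, but not as a tunable:
% sysctl -NW vfs.zfs.arc_free_target
```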
 
… cannot be set by the loader (i.e. it cannot be set in /boot/loader.conf).

I stumbled across a bookmarked tutorial, from 2019, that led indirectly to this:

Is there any way to confirm whether they are kernel tunables or sysctl variables?

Yes, it is a question of flags; you could install sysutils/nsysctl (>= 1.1) [1]:

% nsysctl -aNG | grep elantech

You can read the comments in sys/sysctl.h for a description of the flags. (If you prefer a GUI: deskutils/sysctlview [2] has a window for the flags, and Help->Flags for a description.)

[1] nsysctl tutorial
[2] sysctlview screenshots

nsysctl(8)

So, for example, vfs.zfs.arc_free_target near the head of this list:

Code:
% nsysctl -NG vfs.zfs | grep -v \ TUN | sort
vfs.zfs.anon_data_esize:  RD MPSAFE
vfs.zfs.anon_metadata_esize:  RD MPSAFE
vfs.zfs.anon_size:  RD MPSAFE
vfs.zfs.arc_free_target:  RD WR RW MPSAFE
vfs.zfs.crypt_sessions:  RD MPSAFE
vfs.zfs.l2arc_feed_again:  RD WR RW MPSAFE
vfs.zfs.l2arc_feed_min_ms:  RD WR RW MPSAFE
vfs.zfs.l2arc_feed_secs:  RD WR RW MPSAFE
vfs.zfs.l2arc_headroom:  RD WR RW MPSAFE
vfs.zfs.l2arc_noprefetch:  RD WR RW MPSAFE
vfs.zfs.l2arc_norw:  RD WR RW MPSAFE
vfs.zfs.l2arc_write_boost:  RD WR RW MPSAFE
vfs.zfs.l2arc_write_max:  RD WR RW MPSAFE
vfs.zfs.l2c_only_size:  RD MPSAFE
vfs.zfs.mfu_data_esize:  RD MPSAFE
vfs.zfs.mfu_ghost_data_esize:  RD MPSAFE
vfs.zfs.mfu_ghost_metadata_esize:  RD MPSAFE
vfs.zfs.mfu_ghost_size:  RD MPSAFE
vfs.zfs.mfu_metadata_esize:  RD MPSAFE
vfs.zfs.mfu_size:  RD MPSAFE
vfs.zfs.mru_data_esize:  RD MPSAFE
vfs.zfs.mru_ghost_data_esize:  RD MPSAFE
vfs.zfs.mru_ghost_metadata_esize:  RD MPSAFE
vfs.zfs.mru_ghost_size:  RD MPSAFE
vfs.zfs.mru_metadata_esize:  RD MPSAFE
vfs.zfs.mru_size:  RD MPSAFE
vfs.zfs.super_owner:  RD WR RW MPSAFE
vfs.zfs.vdev.cache:  RD WR RW
vfs.zfs.version.acl:  RD MPSAFE
vfs.zfs.version.ioctl:  RD MPSAFE
vfs.zfs.version.module:  RD MPSAFE
vfs.zfs.version.spa:  RD MPSAFE
vfs.zfs.version.zpl:  RD MPSAFE
%
 
Interesting that it's been reported that performance recovers after recreating the pool; a likely explanation for that is reduced fragmentation.
 
I vacated my 10 TB tank, which was nearly 10 years old, by sending a snapshot to external media and then sending it back. I did re-configure and re-initialise the tank while the data were away -- it got an extra spindle in the RAID-Z1 set.

The scrub time came down to 5 hours. It was, to the best of my recollection, up around 12 hours.

So fragmentation probably matters...
 
… The scrub time came down to 5 hours. It was, to the best of my recollection, up around 12 hours.

So fragmentation probably matters…

I should not expect fragmentation of files, alone, to have so extreme an effect on scrub (of pool metadata and blocks).

From zpool-scrub.8 — OpenZFS documentation:

… A scrub is split into two parts: metadata scanning and block scrubbing. The metadata scanning sorts blocks into large sequential ranges which can then be read much more efficiently from disk when issuing the scrub I/O. …
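As a sketch of observing those two phases (the pool name tank is a placeholder), zpool-status(8) reports scanned versus issued progress while a scrub runs:

```
% # Start a scrub on the pool (name is illustrative):
% sudo zpool scrub tank
% # While it runs, the status output distinguishes the metadata-scanning
% # phase ("scanned") from the block-scrubbing phase ("issued"):
% zpool status tank
```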
 
Interesting that it's been reported that performance recovers after recreating the pool; a likely explanation for that is reduced fragmentation.
No, I had similar problems on the new pool. This is why I dug into other configuration options and came up with restricting the default arc size.
 
OK, there is another issue. Even though our machine has rather low write activity "on average", we sometimes have high write activity, creating many tens of thousands of rather small files.

This might create some kind of fragmentation due to the ZIL activity landing on the same device, as we don't have additional disks available; see https://thomas.gouverneur.name/2011/06/20110609zfs-fragmentation-issue-examining-the-zil/
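If an extra device ever becomes available, moving the ZIL off the main vdevs is a one-liner; as a sketch (pool name tank and device da8 are purely hypothetical):

```
% # Attach a dedicated log vdev (SLOG) so synchronous writes no longer
% # land on the main vdevs. tank and da8 are hypothetical names.
% sudo zpool add tank log da8
```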

This gives a hint at what happened, as (a) the 14 TB partition was running low on disk space and (b) very many files were created at one time, when we were very low on space. So maybe we had problems with massive fragmentation here.
 
I should not expect fragmentation of files, alone, to have so extreme an effect on scrub (of pool metadata and blocks).
Nor would I. However, the original tank was in continuous service for 8 years. It had 6 spindles in RAID-Z1 configuration. This is sub-optimal, and 7 spindles (which is what I went to) is technically better. But the general advice is that when you turn on compression (and I did), the spindle count advantage is diminished.

In any event, I can't go back, and have recently evolved again to use a 4-way stripe of 2-spindle mirrors with a separate ZIL.
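A layout like that can be sketched as follows (pool and device names are purely illustrative):

```
% # Four two-way mirrors striped together, plus a separate log device:
% sudo zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    mirror da6 da7 \
    log da8
```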
 