ZFS Memory usage way above arc_max

vfs.zfs.arc_max has been set to 4G in /boot/loader.conf:
Code:
vfs.zfs.arc_max="4G"
Yet memory usage is way above that and keeps growing.
Code:
ARC: 25G Total, 8461M MFU, 391M MRU, 24M Anon, 132M Header, 16G Other
     1823M Compressed, 7054M Uncompressed, 3.87:1 Ratio
Any reason for that? Some info that might be useful follows; please note how arc_meta_max is way above arc_meta_limit.
There are around 15 million files being served (it's an image caching server), but other servers don't exhibit this problem; their ARC usage stays within limits.
Code:
$ freebsd-version -uk
11.3-RELEASE-p3
11.3-RELEASE-p4
Code:
$ sysctl -a|fgrep -i arc
device  arcmsr
kern.supported_archs: amd64 i386
vfs.zfs.arc_min_prescient_prefetch_ms: 6
vfs.zfs.arc_min_prefetch_ms: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
[B]vfs.zfs.arc_meta_limit: 1073741824[/B]
vfs.zfs.arc_free_target: 339086
vfs.zfs.arc_kmem_cache_reap_retry_ms: 1000
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_grow_retry: 60
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_no_grow_shift: 5
vfs.zfs.arc_min: 4294967296
vfs.zfs.arc_max: 4294967296
vfs.ffs.maxclustersearch: 10
debug.adaptive_machine_arch: 1
hw.machine_arch: amd64
kstat.zfs.misc.arcstats.demand_hit_prescient_prefetch: 0
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 52697
kstat.zfs.misc.arcstats.async_upgrade_sync: 212
kstat.zfs.misc.arcstats.arc_meta_min: 2147483648
[B]kstat.zfs.misc.arcstats.arc_meta_max: 27683785944
kstat.zfs.misc.arcstats.arc_meta_limit: 1073741824
kstat.zfs.misc.arcstats.arc_meta_used: 26997506960[/B]
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 50295
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.l2_asize: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_evict_l1cached: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 284554752
kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 4057600
kstat.zfs.misc.arcstats.mfu_ghost_size: 288612352
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mfu_size: 8860957184
kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 3858164224
kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 12436992
kstat.zfs.misc.arcstats.mru_ghost_size: 3870601216
kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
kstat.zfs.misc.arcstats.mru_evictable_data: 0
kstat.zfs.misc.arcstats.mru_size: 414954496
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0
kstat.zfs.misc.arcstats.anon_size: 4942336
kstat.zfs.misc.arcstats.other_size: 17661001392
kstat.zfs.misc.arcstats.metadata_size: 9197769728
kstat.zfs.misc.arcstats.data_size: 83084288
kstat.zfs.misc.arcstats.hdr_size: 138735840
kstat.zfs.misc.arcstats.overhead_size: 7385242112
kstat.zfs.misc.arcstats.uncompressed_size: 7385242112
kstat.zfs.misc.arcstats.compressed_size: 1895620608
kstat.zfs.misc.arcstats.size: 27080591248
kstat.zfs.misc.arcstats.c_max: 4294967296
kstat.zfs.misc.arcstats.c_min: 4294967296
kstat.zfs.misc.arcstats.c: 4294967296
kstat.zfs.misc.arcstats.p: 3648431616
kstat.zfs.misc.arcstats.hash_chain_max: 3
kstat.zfs.misc.arcstats.hash_chains: 3643
kstat.zfs.misc.arcstats.hash_collisions: 579973
kstat.zfs.misc.arcstats.hash_elements_max: 519990
kstat.zfs.misc.arcstats.hash_elements: 493266
kstat.zfs.misc.arcstats.evict_l2_skip: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 858988544
kstat.zfs.misc.arcstats.evict_l2_eligible: 1462018226176
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_not_enough: 21387547
kstat.zfs.misc.arcstats.evict_skip: 1215126917
kstat.zfs.misc.arcstats.access_skip: 729118541
kstat.zfs.misc.arcstats.mutex_miss: 2682824
kstat.zfs.misc.arcstats.deleted: 17533423
kstat.zfs.misc.arcstats.allocated: 312640665
kstat.zfs.misc.arcstats.mfu_ghost_hits: 1102265
kstat.zfs.misc.arcstats.mfu_hits: 1043470003
kstat.zfs.misc.arcstats.mru_ghost_hits: 11899899
kstat.zfs.misc.arcstats.mru_hits: 43401992
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 558724
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 233008
kstat.zfs.misc.arcstats.prefetch_data_misses: 11803
kstat.zfs.misc.arcstats.prefetch_data_hits: 12
kstat.zfs.misc.arcstats.demand_metadata_misses: 31499035
kstat.zfs.misc.arcstats.demand_metadata_hits: 1084166851
kstat.zfs.misc.arcstats.demand_data_misses: 865029
kstat.zfs.misc.arcstats.demand_data_hits: 2581257
kstat.zfs.misc.arcstats.misses: 32934591
kstat.zfs.misc.arcstats.hits: 1086981128
 
Whoever did this, thanks for properly formatting my post; unfortunately, marking parts of the text in bold inside code blocks isn't supported.
 
I changed arc_max to 32GB on the live system using sysctl, hoping that the uncontrolled growth of total ARC usage (seen in top(1)) would stop, and it did stop after reaching 32G yesterday. So far so good. kstat.zfs.misc.arcstats.arc_meta_used still keeps growing, though, and is already at 30654257520.
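For reference, the live change amounts to something like a single sysctl call (the byte value below corresponds to 32 GiB; adjust as needed):
Code:
# 32 GiB = 34359738368 bytes
sysctl vfs.zfs.arc_max=34359738368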
 
Any reason for that?

Yes. This has often been brought up, but not often sufficiently explained. So let's try:

First off: the sysctl values are of two different kinds. Some of them are metrics, showing the actual situation (usually at this moment in time); others are advisory values. The latter are either set to an appropriate value at system startup, or dynamically adjusted (that's the A for adaptive in ARC), or configured by the operator.
For instance, the ARC size: c_max (aka arc_max) is the advisory value set at startup or configured by the operator. c is the dynamically adjusted advisory value (which gets placed somewhere between c_min and c_max as seems appropriate at the time), and size is the metric, i.e. the actual size of the ARC at this time.
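To see all three kinds side by side, they can be queried directly (these are the same sysctl names that appear in the output above):
Code:
# configured bound, adaptive target, and actual size
sysctl vfs.zfs.arc_max
sysctl kstat.zfs.misc.arcstats.c
sysctl kstat.zfs.misc.arcstats.size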

Then, whenever anything is read from disk, it must go into the ARC, because there is nowhere else it could go. Therefore the ARC must grow; the only other option would be to stop disk reads.
This means the ARC on its own only ever grows; it cannot shrink by itself.

In order to shrink the ARC, there is a separate job that runs asynchronously. That job looks for old data in the ARC that can be evicted. And only here do the advisory values come into play: they tell this evict job when to run and how much to evict. And that's how the size gets adjusted.

Now there are two special cases to consider:
  1. The evict job runs slower than the disk reads proceed.
  2. The evict job completes, but does not find anything to evict.
In the first case, when the ARC gets above the advisory size c, it will wait for the evict job to complete a round before accommodating new data. In the second case, waiting does not help, so nothing is done and the ARC continues to grow.

This is explained in the code (arc.c):
Code:
        /*
         * If arc_size is currently overflowing, and has grown past our
         * upper limit, we must be adding data faster than the evict
         * thread can evict. Thus, to ensure we don't compound the
         * problem by adding more data and forcing arc_size to grow even
         * further past it's target size, we halt and wait for the
         * eviction thread to catch up.
         *
         * It's also possible that the reclaim thread is unable to evict
         * enough buffers to get arc_size below the overflow limit (e.g.
         * due to buffers being un-evictable, or hash lock collisions).
         * In this case, we want to proceed regardless if we're
         * overflowing; thus we don't use a while loop here.
         */

We can check whether there is evictable data. It is shown in these metrics (from your output above):
Code:
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
kstat.zfs.misc.arcstats.mru_evictable_data: 0
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0

Now we have already reached the "works as designed" point. ;)

But what is happening here? You already pointed at the arc_meta values, and they are indeed the issue.
There are three values: arc_meta_used is the metric for the current moment. arc_meta_max, I suppose, keeps a record of the highest value reached during this uptime.
And arc_meta_limit is the advisory value, by default set to 25% of arc_max, but it can be adjusted by the operator. The only purpose this value serves is to decide whether the evict should currently address more data or more metadata. In every other regard the ARC will just read in as much metadata as is requested from the disk.
So arc_meta_limit by no means "limits" the metadata; it just declares when we have enough.
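If one wants the eviction logic to tolerate more metadata before preferring it for eviction, that threshold can be raised; this is only a sketch, the value is arbitrary, and whether the sysctl is writable at runtime depends on your release:
Code:
# 8 GiB as an example value (runtime, if writable on your release)
sysctl vfs.zfs.arc_meta_limit=8589934592
# persistent alternative in /boot/loader.conf:
# vfs.zfs.arc_meta_limit="8G"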

Now, looking at your values, we see:
Code:
kstat.zfs.misc.arcstats.c_max: 4294967296
kstat.zfs.misc.arcstats.c_min: 4294967296
kstat.zfs.misc.arcstats.c: 4294967296
kstat.zfs.misc.arcstats.size: 27080591248
kstat.zfs.misc.arcstats.arc_meta_used: 26997506960
  1. You appear to have set c_max and c_min to the same value, so the internal logic does not need to come up with an advisory value for c, as it is all the same.
  2. The actual ARC size is way above c.
  3. 100% of this is metadata.
  4. None of this is evictable.
The rule for the evict seems to be that evictable buffers must not be "referenced", i.e. not currently in use by anybody. In the case of metadata, whether it is in use is internal business of ZFS, and it appears that ZFS keeps enormous amounts of metadata in use.

I don't know why this happens, but I am almost certain that this is directly related to having more than just a few files on the system, and reading their directories.

This is very different from classic Unix behaviour, where a large number of files would basically slow things down but would otherwise not be a limit. Here, the ARC definitely can outgrow arc_max, it definitely can occupy all available memory, and it definitely can crash the system via OOM, given enough files.

I'm quite disappointed about that, and it seems it is not clearly stated anywhere (at least I didn't find it): a couple million files was already normal when Usenet was at its best, on machines that had a couple of megabytes of memory.
 
Thanks for the great explanation. I should note that this wasn't the case on FreeBSD 10.3 with the same number of files (around 30 million inodes): it could happily stay within just a 5GB ARC, and that machine had just 16GB RAM total, while this one has around 190GB (and runs FreeBSD 11.3).

real memory = 206158430208 (196608 MB)
avail memory = 199839551488 (190581 MB)

My guess is that the system sees there's enough "free" RAM currently (112GB as per top) and decides not to free anything yet.
 
You appear to have set c_max and c_min to the same value, so the internal logic does not need to come up with an advisory value for c, as it is all the same.

I did not. I only set vfs.zfs.arc_max="4G" in /boot/loader.conf. But it appeared to be too low for the calculations with respect to the total amount of available memory (190GB), so arc_min was set by the system to that value too, and not to one half of arc_max, as it normally would.

After noticing that, I set arc_max to 32GB to somehow put a brake on the uncontrolled growth. ARC total usage (in top) is currently 34GB, with 112GB of free RAM. arc_min stayed at 4GB.
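To keep the boot-time calculation from clamping arc_min up to arc_max in the first place, both tunables can be pinned explicitly in /boot/loader.conf (the values here are just an example, not a recommendation):
Code:
vfs.zfs.arc_max="32G"
vfs.zfs.arc_min="2G"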
 
It seems my guess was correct. On another machine (also 11.3), which has 16GB RAM and serves almost 12 million files, the ARC fits entirely within its 4GB limit (actually 4093M in top). There's only 983M of memory free.

See how meta_used is less than meta_max, which was probably reached at some point and then freed:

kstat.zfs.misc.arcstats.arc_meta_min: 268435456
kstat.zfs.misc.arcstats.arc_meta_max: 6384094656
kstat.zfs.misc.arcstats.arc_meta_limit: 1073741824
kstat.zfs.misc.arcstats.arc_meta_used: 3846783736
 
And on the first machine, which has a 34GB ARC over the 32GB limit:
kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 1187154944
kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 33025672704

So maybe freeing it is just a matter of time. It hasn't leaked or anything.
 
My guess is that the system sees there's enough "free" RAM currently (112GB as per top) and decides not to free anything yet.

No, that's certainly not the case. It does OOM kills; that's actually how I found out about it: I decided to put a couple of additional jails onto my really small household router, and for installation I just copied the /usr/src+obj+ports trees into each jail. These trees were not marked "nosuid", and in the night the periodic security run came along and started individual find jobs on each of them; then the OOM killer came and killed the biggest process, which happened to be named. :(
Looking at that (and not finding it all too funny), I recognized that the system had been a bit mistreated with all these find jobs, but nevertheless that's no reason to do OOM kills. And so I investigated further.

I'm indeed wondering why I never noticed this behaviour before, as I have been running ZFS in constrained environments for a long time already. It may actually be something that appeared somewhere in 11.X, along with whatever new features.

So maybe freeing it is just a matter of time. It hasn't leaked or anything.

No, it doesn't leak - when you stop reading the directories, the memory gets freed again.

Anyway, for now I have concluded to always mark my ports trees "nosuid", and in case I need a really large number of small files I'll put them on a UFS filesystem (I have always recommended not going the ZFS-for-root way and keeping both options open).
But if this was indeed not present in 10.X, I would consider it a bug.
 
No, it doesn't leak - when you stop reading the directories, the memory gets freed again.
Unlike the one above, this machine has 16GB RAM, also running FreeBSD 11.3. One would expect it to want at least a 12-14GB ARC, but here is its current memory use breakdown:

Code:
Mem: 2131M Active, 2157M Inact, 10G Wired, 779M Free
ARC: 4092M Total, 2059M MFU, 1104M MRU, 8528K Anon, 61M Header, 860M Other
     910M Compressed, 2632M Uncompressed, 2.89:1 Ratio
Swap: 16G Total, 16G Free

while handling around 12 million inodes. The machine does the same thing as the one I opened this thread for: caching images with nginx.

So it seems that the total amount of installed RAM and/or free RAM has some significance.

Code:
$ sysctl -a|fgrep zfs
1 PART da1p4 471268851712 512 i 4 o 8800698368 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
1 PART da0p4 471268851712 512 i 4 o 8800698368 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
z0xfffff8005f1e2800 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#3"];
            <type>freebsd-zfs</type>
            <label>zfs1</label>
            <type>freebsd-zfs</type>
            <label>zfs0</label>
      <name>zfs::vdev</name>
vfs.zfs.trim.max_interval: 1
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.enabled: 1
vfs.zfs.vol.immediate_write_sz: 32768
vfs.zfs.vol.unmap_sync_enabled: 0
vfs.zfs.vol.unmap_enabled: 1
vfs.zfs.vol.recursive: 0
vfs.zfs.vol.mode: 1
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 5000
vfs.zfs.version.acl: 1
vfs.zfs.version.ioctl: 7
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
vfs.zfs.immediate_write_sz: 32768
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.zil_slog_bulk: 786432
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.standard_sm_blksz: 131072
vfs.zfs.dtl_sm_blksz: 4096
vfs.zfs.min_auto_ashift: 12
vfs.zfs.max_auto_ashift: 13
vfs.zfs.vdev.trim_max_pending: 10000
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.def_queue_depth: 32
vfs.zfs.vdev.queue_depth_pct: 1000
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit_non_rotating: 131072
vfs.zfs.vdev.aggregation_limit: 1048576
vfs.zfs.vdev.initializing_max_active: 1
vfs.zfs.vdev.initializing_min_active: 1
vfs.zfs.vdev.removal_max_active: 2
vfs.zfs.vdev.removal_min_active: 1
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.async_write_max_active: 10
vfs.zfs.vdev.async_write_min_active: 1
vfs.zfs.vdev.async_read_max_active: 3
vfs.zfs.vdev.async_read_min_active: 1
vfs.zfs.vdev.sync_write_max_active: 10
vfs.zfs.vdev.sync_write_min_active: 10
vfs.zfs.vdev.sync_read_max_active: 10
vfs.zfs.vdev.sync_read_min_active: 10
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.async_write_active_max_dirty_percent: 60
vfs.zfs.vdev.async_write_active_min_dirty_percent: 30
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.trim_on_init: 1
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.validate_skip: 0
vfs.zfs.vdev.max_ms_shift: 38
vfs.zfs.vdev.default_ms_shift: 29
vfs.zfs.vdev.max_ms_count_limit: 131072
vfs.zfs.vdev.min_ms_count: 16
vfs.zfs.vdev.max_ms_count: 200
vfs.zfs.txg.timeout: 5
vfs.zfs.space_map_ibs: 14
vfs.zfs.spa_allocators: 4
vfs.zfs.spa_min_slop: 134217728
vfs.zfs.spa_slop_shift: 5
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.debug_flags: 0
vfs.zfs.debugflags: 0
vfs.zfs.recover: 0
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_maxinflight: 10000
vfs.zfs.max_missing_tvds_scan: 0
vfs.zfs.max_missing_tvds_cachefile: 2
vfs.zfs.max_missing_tvds: 0
vfs.zfs.spa_load_print_vdev_tree: 0
vfs.zfs.ccw_retry_interval: 300
vfs.zfs.check_hostid: 1
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.condense_pct: 200
vfs.zfs.metaslab_sm_blksz: 4096
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.min_alloc_size: 33554432
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.force_ganging: 16777217
vfs.zfs.free_bpobj_enabled: 1
vfs.zfs.free_max_blocks: 18446744073709551615
vfs.zfs.zfs_scan_checkpoint_interval: 7200
vfs.zfs.zfs_scan_legacy: 0
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.no_scrub_io: 0
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.scan_idle: 50
vfs.zfs.scrub_delay: 4
vfs.zfs.resilver_delay: 2
vfs.zfs.top_maxinflight: 32
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.max_idistance: 67108864
vfs.zfs.zfetch.max_distance: 8388608
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 0
vfs.zfs.delay_scale: 500000
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max: 1686953164
vfs.zfs.max_recordsize: 1048576
vfs.zfs.default_ibs: 17
vfs.zfs.default_bs: 9
vfs.zfs.send_holes_without_birth_time: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.per_txg_dirty_frees_percent: 30
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.dedup.prefetch: 1
vfs.zfs.dbuf_cache_lowater_pct: 10
vfs.zfs.dbuf_cache_hiwater_pct: 10
vfs.zfs.dbuf_metadata_cache_overflow: 0
vfs.zfs.dbuf_metadata_cache_shift: 6
vfs.zfs.dbuf_cache_shift: 5
vfs.zfs.dbuf_metadata_cache_max_bytes: 67108864
vfs.zfs.dbuf_cache_max_bytes: 134217728
vfs.zfs.arc_min_prescient_prefetch_ms: 6
vfs.zfs.arc_min_prefetch_ms: 1
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_esize: 0
vfs.zfs.mfu_ghost_metadata_esize: 1128455168
vfs.zfs.mfu_ghost_size: 1128455168
vfs.zfs.mfu_data_esize: 264017408
vfs.zfs.mfu_metadata_esize: 1343488
vfs.zfs.mfu_size: 2179298816
vfs.zfs.mru_ghost_data_esize: 19394048
vfs.zfs.mru_ghost_metadata_esize: 3145071616
vfs.zfs.mru_ghost_size: 3164465664
vfs.zfs.mru_data_esize: 6268416
vfs.zfs.mru_metadata_esize: 471040
vfs.zfs.mru_size: 1125251584
vfs.zfs.anon_data_esize: 0
vfs.zfs.anon_metadata_esize: 0
vfs.zfs.anon_size: 10964992
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 1073741824
vfs.zfs.arc_free_target: 27797
vfs.zfs.arc_kmem_cache_reap_retry_ms: 1000
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_grow_retry: 60
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_no_grow_shift: 5
vfs.zfs.arc_min: 536870912
vfs.zfs.arc_max: 4294967296
vfs.zfs.abd_chunk_size: 4096
vfs.zfs.abd_scatter_enabled: 1
kstat.zfs.misc.vdev_cache_stats.misses: 0
kstat.zfs.misc.vdev_cache_stats.hits: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 0
kstat.zfs.misc.arcstats.demand_hit_prescient_prefetch: 13909556
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch: 3620808
kstat.zfs.misc.arcstats.async_upgrade_sync: 43122
kstat.zfs.misc.arcstats.arc_meta_min: 268435456
kstat.zfs.misc.arcstats.arc_meta_max: 6384094656
kstat.zfs.misc.arcstats.arc_meta_limit: 1073741824
kstat.zfs.misc.arcstats.arc_meta_used: 3885823872
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 88298038
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.l2_asize: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_evict_l1cached: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_writes_lock_retry: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.mfu_ghost_evictable_metadata: 1129916416
kstat.zfs.misc.arcstats.mfu_ghost_evictable_data: 0
kstat.zfs.misc.arcstats.mfu_ghost_size: 1129916416
kstat.zfs.misc.arcstats.mfu_evictable_metadata: 4096
kstat.zfs.misc.arcstats.mfu_evictable_data: 262264832
kstat.zfs.misc.arcstats.mfu_size: 2185483776
kstat.zfs.misc.arcstats.mru_ghost_evictable_metadata: 3145487360
kstat.zfs.misc.arcstats.mru_ghost_evictable_data: 19462656
kstat.zfs.misc.arcstats.mru_ghost_size: 3164950016
kstat.zfs.misc.arcstats.mru_evictable_metadata: 106496
kstat.zfs.misc.arcstats.mru_evictable_data: 106496
kstat.zfs.misc.arcstats.mru_size: 1129751040
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0
kstat.zfs.misc.arcstats.anon_size: 10924032
kstat.zfs.misc.arcstats.other_size: 903666848
kstat.zfs.misc.arcstats.metadata_size: 2917170176
kstat.zfs.misc.arcstats.data_size: 408988672
kstat.zfs.misc.arcstats.hdr_size: 64986848
kstat.zfs.misc.arcstats.overhead_size: 2406711296
kstat.zfs.misc.arcstats.uncompressed_size: 2760562176
kstat.zfs.misc.arcstats.compressed_size: 918687744
kstat.zfs.misc.arcstats.size: 4294812544
kstat.zfs.misc.arcstats.c_max: 4294967296
kstat.zfs.misc.arcstats.c_min: 536870912
kstat.zfs.misc.arcstats.c: 4294967296
kstat.zfs.misc.arcstats.p: 2901519230
kstat.zfs.misc.arcstats.hash_chain_max: 6
kstat.zfs.misc.arcstats.hash_chains: 13830
kstat.zfs.misc.arcstats.hash_collisions: 1284123738
kstat.zfs.misc.arcstats.hash_elements_max: 471127
kstat.zfs.misc.arcstats.hash_elements: 252013
kstat.zfs.misc.arcstats.evict_l2_skip: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 3549507526656
kstat.zfs.misc.arcstats.evict_l2_eligible: 707209737388544
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_not_enough: 4105461814
kstat.zfs.misc.arcstats.evict_skip: 62483096249
kstat.zfs.misc.arcstats.access_skip: 3017943935
kstat.zfs.misc.arcstats.mutex_miss: 2782691933
kstat.zfs.misc.arcstats.deleted: 10015634622
kstat.zfs.misc.arcstats.allocated: 32983948752
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2613777302
kstat.zfs.misc.arcstats.mfu_hits: 49062084902
kstat.zfs.misc.arcstats.mru_ghost_hits: 1589821664
kstat.zfs.misc.arcstats.mru_hits: 12349896233
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 667593114
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 18388717
kstat.zfs.misc.arcstats.prefetch_data_misses: 3890614
kstat.zfs.misc.arcstats.prefetch_data_hits: 828570
kstat.zfs.misc.arcstats.demand_metadata_misses: 5589732197
kstat.zfs.misc.arcstats.demand_metadata_hits: 43664508920
kstat.zfs.misc.arcstats.demand_data_misses: 7917786625
kstat.zfs.misc.arcstats.demand_data_hits: 17741591719
kstat.zfs.misc.arcstats.misses: 14179002550
kstat.zfs.misc.arcstats.hits: 61425317927
kstat.zfs.misc.zcompstats.skipped_insufficient_gain: 65817741
kstat.zfs.misc.zcompstats.empty: 1594790
kstat.zfs.misc.zcompstats.attempts: 1085339257
kstat.zfs.misc.zfetchstats.max_streams: 10448706923
kstat.zfs.misc.zfetchstats.misses: 10642640929
kstat.zfs.misc.zfetchstats.hits: 88375481
kstat.zfs.misc.xuio_stats.write_buf_nocopy: 71849
kstat.zfs.misc.xuio_stats.write_buf_copied: 0
kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0
kstat.zfs.misc.xuio_stats.read_buf_copied: 0
kstat.zfs.misc.xuio_stats.onloan_write_buf: 0
kstat.zfs.misc.xuio_stats.onloan_read_buf: 0
kstat.zfs.misc.abdstats.linear_data_size: 0
kstat.zfs.misc.abdstats.linear_cnt: 0
kstat.zfs.misc.abdstats.scatter_chunk_waste: 9482752
kstat.zfs.misc.abdstats.scatter_data_size: 918715904
kstat.zfs.misc.abdstats.scatter_cnt: 138546
kstat.zfs.misc.abdstats.struct_size: 6246360
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 17513
kstat.zfs.misc.zio_trim.success: 0
kstat.zfs.misc.zio_trim.bytes: 0
kstat.zfs.misc.metaslab_trace_stats.metaslab_trace_over_limit: 0
security.jail.param.allow.mount.zfs: 0
security.jail.mount_zfs_allowed: 0
 
I decided to hunt this one down a bit further.

There is 2GB of physical memory, and the ARC is adjusted to arc_min=200MB, with arc_max reduced to 400MB for testing.
Code:
ARC Size:                               55.51%  222.04  MiB
        Target Size: (Adaptive)         100.00% 400.00  MiB
        Min Size (Hard Limit):          50.00%  200.00  MiB
        Max Size (High Water):          2:1     400.00  MiB

Now I do what periodic/security/100.chksetuid would do:
# for i in 1 3 4 5 6 7 8 9 10; do jexec $i find -sx /usr/ports -type f \( -perm -u+x -or -perm -g+x -or -perm -o+x \) \( -perm -u+s -or -perm -g+s \) -exec ls -liTd \{\} \+ > /dev/null & done
And soon after things look like this:
Code:
ARC Size:                               262.24% 1.02    GiB
        Target Size: (Adaptive)         50.00%  200.00  MiB
        Min Size (Hard Limit):          50.00%  200.00  MiB
        Max Size (High Water):          2:1     400.00  MiB

kstat.zfs.misc.arcstats.mfu_evictable_metadata: 147456
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mru_evictable_metadata: 36864
kstat.zfs.misc.arcstats.mru_evictable_data: 0
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0

I kill the find jobs at this point, but, surprisingly, only part of the memory gets freed; the ARC is still over the limit, and there is nothing to evict:
Code:
ARC Size:                               192.43% 769.73  MiB
        Target Size: (Adaptive)         50.00%  200.00  MiB
        Min Size (Hard Limit):          50.00%  200.00  MiB
        Max Size (High Water):          2:1     400.00  MiB

kstat.zfs.misc.arcstats.mfu_evictable_metadata: 0
kstat.zfs.misc.arcstats.mfu_evictable_data: 0
kstat.zfs.misc.arcstats.mru_evictable_metadata: 0
kstat.zfs.misc.arcstats.mru_evictable_data: 0
kstat.zfs.misc.arcstats.anon_evictable_metadata: 0
kstat.zfs.misc.arcstats.anon_evictable_data: 0

With the system inactive, this does not change over time. It does, however, disappear when unmounting the respective filesystems! This metadata, which is not evictable and therefore still referenced by something (though there is no process running anymore that might reference it), must somehow be associated with the mount.
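A quick way to see that association is to watch arc_meta_used around an unmount (the dataset name below is made up; substitute one that holds one of the trees):
Code:
sysctl kstat.zfs.misc.arcstats.arc_meta_used
zfs unmount pool/jails/1     # hypothetical dataset
sysctl kstat.zfs.misc.arcstats.arc_meta_used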

Thinking about this, it came to my mind that Unix typically has an inode cache in the kernel. So I started searching for that, and I came across this thread and the linked bug reports pointing to kern.maxvnodes.

In my case, the default kern.maxvnodes appears to be too high, and reducing it by 50% solves my problem. :)
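For reference, checking and lowering it can be done at runtime (the concrete number below is only illustrative; pick roughly half of whatever your system reports, and persisting it via /etc/sysctl.conf is my assumption about the setup):
Code:
# auto-tuned limit and how many vnodes are actually in use
sysctl kern.maxvnodes vfs.numvnodes
# set roughly half of the reported kern.maxvnodes (value is illustrative)
sysctl kern.maxvnodes=60000
# make it stick across reboots
echo 'kern.maxvnodes=60000' >> /etc/sysctl.conf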
Cross-checking on an amd64 with a reduced arc_max shows that while this is influential, it is not the only factor: there the ARC stays above 1GB even with a very low vfs.numvnodes. But I leave that one for the reader to investigate. ;)

Concerning the impact: the Handbook says kern.maxvnodes might be increased in order to reduce disk I/O. With ZFS this should not be entirely true, because we have the ARC between the vnode cache and the disk, and we might increase vfs.zfs.arc_meta_limit to keep more metadata in the ARC (as long as there is space available). Keeping the data in the innermost cache is certainly best for performance, but then it is also kept in the ARC, which has to be sized accordingly. On the other hand, keeping it only in the ARC allows it to adjust to memory pressure.

So it seems that the total amount of installed RAM and/or free RAM has some significance.

AFAIK the value of kern.maxvnodes is adjusted at boot time according to kern.maxusers and/or the installed RAM, so it may indeed depend on that.
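That dependency is easy to check by comparing the relevant values on two machines with different amounts of RAM:
Code:
sysctl hw.physmem kern.maxusers kern.maxvnodes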
 