FreeBSD 11.1 ZFS using too much wired memory

Certainly not. The arc_max and arc_min are not hard limits; they are rather recommendations to the eviction process. E.g. if you have more inodes open than fit into arc_max, and they cannot be evicted (because they are open), then the ARC will not even shrink down to the new arc_max.
And anyway, the wired memory will not shrink immediately. It will do its own optimisation work and occasionally shrink when there is other demand for RAM.
 
This is why:

Code:
last pid: 36059;  load averages:  0.54,  0.55,  0.54                                                                                                                                                                 up 16+10:56:31  09:26:45
49 processes:  1 running, 48 sleeping
CPU:  0.0% user,  0.0% nice,  0.6% system,  0.0% interrupt, 99.4% idle
Mem: 3633M Active, 8391M Inact, 7538M Laundry, 240G Wired, 104K Buf, 14G Free
ARC: 191G Total, 144G MFU, 46G MRU, 7148K Anon, 784M Header, 390M Other
     184G Compressed, 208G Uncompressed, 1.13:1 Ratio
Swap: 16G Total, 2302M Used, 14G Free, 14% Inuse


It appears that arc_max mistakenly wasn't set in /boot/loader.conf and was at ~380 GB (the ARC shown in top output had reached roughly 220G).
Yesterday I set vfs.zfs.arc_max to 192 GB to put an end to this. c_max and kstat.zfs.misc.arcstats.size were automatically adjusted based on that value.
But I'm afraid this extra wired memory won't be freed. On the contrary, it grew by an additional 3 GB (and free mem decreased by 3 GB) overnight.
Also, given so much essentially free memory and such low use of Active memory, the swap usage doesn't look all too promising. About 7 GB are waiting to be laundered.
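For reference, changing the limit at runtime is just a sysctl; presumably something along these lines was used (a sketch only; the value is in bytes, 192 GiB = 206158430208):

Code:
# raise/lower the ARC ceiling at runtime (value in bytes)
sysctl vfs.zfs.arc_max=206158430208
# verify that the new ceiling was picked up
sysctl kstat.zfs.misc.arcstats.c_max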
 
So what? Free memory is not useful, so the kernel tries to put it to some use.
And occupied but unaccessed pages are meant to be moved to swap. Give the machine enough swap that the swap size is in a sane relationship to the installed memory.
 
Setting that is not a dynamic event. You need to reboot for it to take effect.
I agree with PMc, but sometimes it's less stress to "just do something".
ZFS by default will try to use almost all of RAM; it leaves a little bit for the kernel.
ZFS is also designed to release memory when there is pressure; the downside is that the release is not immediate.

On a desktop, setting ZFS to not use all the RAM (say set to use half) has a slight positive impact: you will always have RAM for Firefox to use :)
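To make such a cap persistent across reboots, the usual place is /boot/loader.conf. A minimal sketch (the value is purely illustrative, given in bytes; 8 GiB = 8589934592):

Code:
# /boot/loader.conf -- cap the ARC (example value: 8 GiB)
vfs.zfs.arc_max="8589934592"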
 
Alas, rebooting isn't an option. What if I write a small program that calls calloc in small chunks enough times to cause some memory pressure, so that wired memory gets released to somewhere below the new arc_size? Then the program would exit, and voila: less wired mem, more free mem.
 
In theory I guess that would work. Worst case, the system crashes, reboots, and uses the new value. I'm assuming that this is a server of some kind? Can you do a hot failover to a backup system? If so, you could fail over, stop the service(s) on the one you are concerned about, restart the services, then fail back.

If you have installed zfs-stats you can take a look at things related to ARC.
zfs-stats -A and zfs-stats -E are the two most useful.

MFU and MRU are the two big pieces of ARC. MRU (Most Recently Used) is the first layer (like a traditional buffer cache); MFU (Most Frequently Used) is the next layer. Data lands in MRU first; if it gets used again it winds up in MFU, otherwise it eventually ages out. If you look, MFU + MRU is pretty much your ARC size. Look at what zfs-stats -E says. If you have high cache hit numbers, then yes, you have a lot of memory in use by ARC, but almost all requests are being served from cache. That means you are not going to the physical devices very much, which is pretty much always a good thing.
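If zfs-stats isn't installed, the same MRU/MFU split can be read straight from the arcstats sysctls; a small sketch (all sizes are in bytes):

Code:
# current ARC size and its MRU/MFU components
sysctl kstat.zfs.misc.arcstats.size \
       kstat.zfs.misc.arcstats.mru_size \
       kstat.zfs.misc.arcstats.mfu_size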
 
Alas, rebooting isn't an option. What if I write a small program that calls calloc in small chunks
No need for that. You can use awk to fill some array with strings. It's a one-liner (a rough sketch follows below).
enough times to cause some memory pressure, so that wired memory gets released to somewhere below the new arc_size. Then the program would exit, and voila - less wired mem, more free mem.
It won't go below. The whole arc is in wired mem, plus most other things in the kernel, plus some user applications that use wired mem, plus bhyve (depending on the configuration).
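For what it's worth, the awk one-liner alluded to above could look roughly like this (illustrative only; the element count and string width just determine how many GB get touched):

Code:
# grow an awk array to touch several GB of RSS, creating memory pressure;
# Ctrl+C (or letting it finish) releases it all at once
awk 'BEGIN { for (i = 0; i < 40000000; i++) a[i] = sprintf("x%0100d", i) }'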
 
Bingo. This simple C program did the trick:

Code:
#include <unistd.h>
#include <stdlib.h>
#include <strings.h>

int main(void) {
  /* Leak and touch 20 MB every second to build up memory pressure. */
  while (1) {
    void *p = malloc(20*1024*1024);
    if (p == NULL)          /* stop allocating once malloc starts failing */
      break;
    explicit_bzero(p, 20*1024*1024);  /* touch the pages so they are really backed */
    sleep(1);
  }

  return 0;
}

After watching about 1.5 GB of growth in active mem + the laundry queue in top (with free mem decreasing), I got bored and hit Ctrl+C in the program's window. And suddenly magic happened:

Code:
last pid: 43833;  load averages:  0.03,  0.29,  0.44                                                                                                                                                                 up 16+19:51:48  18:22:02
52 processes:  1 running, 51 sleeping
CPU:  0.1% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.7% idle
Mem: 3692M Active, 8417M Inact, 7539M Laundry, 202G Wired, 104K Buf, 52G Free
ARC: 190G Total, 143G MFU, 47G MRU, 12M Anon, 790M Header, 412M Other
     184G Compressed, 208G Uncompressed, 1.13:1 Ratio
Swap: 16G Total, 2302M Used, 14G Free, 14% Inuse

Much better.
 
Comparing against the stats you put up in #28, the major change I see is "Free" went from 14G to 52G, with "Wired" correspondingly going from 240G to 202G.
ARC is basically unchanged, Swap unchanged. That implies to me that "it wasn't ZFS".

Hitting Ctrl+C caused everything allocated in your program to be freed. Perhaps it triggered a forced coalescing of buffers or whatever was in Wired.

Before doing this, was Wired relatively static?
 
Yup, wired simply grew from 237 to 240G overnight (and correspondingly Free went down from 17 to 14 GB) and remained there.

I think it was the extra ZFS wired memory left over from when ARC was at about 220G.
Then I set arc_max to 192G dynamically and top (along with the relevant sysctls) reflected the change immediately. But wired remained at 240G.
 
This time I set arc_max to 16 GB (interestingly it can't be set to 8 GB; I mean, arc_max is set to that value, but top's ARC remains at 16 GB), and ran the program above allocating 500 MB every second. After around 15-20 GB, still running, wired started dropping rapidly in top with free mem increasing. Laundry didn't change at all. Wired stopped dropping at the 23G mark.

Code:
last pid: 44926;  load averages:  0.55,  0.19,  0.06                                                                                                                                                                 up 16+21:09:04  19:39:18
52 processes:  1 running, 51 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 3710M Active, 8417M Inact, 7525M Laundry, 23G Wired, 104K Buf, 230G Free
ARC: 16G Total, 8730M MFU, 7017M MRU, 8536K Anon, 79M Header, 399M Other
     15G Compressed, 20G Uncompressed, 1.37:1 Ratio
Swap: 16G Total, 2302M Used, 14G Free, 14% Inuse

Nice.
 
I have nothing more to say other than "Hmm. Interesting." Without knowing what exactly the Wired was tied to, other than that it seems to be related to ARC, it's just interesting.
I took a quick look back and couldn't find it, but what version of FreeBSD are you running?
 
I would like to get a usage breakdown for wired. (I think a lot of people would.) But it appears very difficult to obtain, due to the fragmentation issue:

Main memory is endangered by fragmentation; blocks of various sizes are allocated and released all the time, so after some uptime you may still have free memory, but it will be dispersed into little chunks which cannot be used contiguously.
This is a big problem when systems should stay up for a long time, and it is not easy to solve - you cannot just move memory around to de-fragment it (pointer arithmetic!).
Therefore algorithms have been implemented which pre-allocate chunks of equal size for similar use-cases, trying to anticipate the expected duration of use of these chunks, so that when the chunks are freed they can be re-joined to form bigger free blocks. (You can read about that; it's called UMA.)

Bottom line: that wired memory may not have been in use at all; the kernel might just have anticipated that it could be demanded in the future. And so it would be unwise to free it without actual need.
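As a partial answer to the breakdown wish: the per-zone UMA counters (and the kernel malloc(9) types) are at least visible, even if they don't add up to one tidy "wired" total. A sketch:

Code:
# UMA zone usage: item size, limit, used, free, requests, failures per zone
vmstat -z
# kernel malloc(9) usage by type
vmstat -m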
 
The amount of wired memory correlated with the amount of ARC in use (10-50 GB more than that). This is expected, as ARC is always wired by definition.
I just didn't want it to grow forever (up to the ~380 GB limit, which is the total RAM), as this unnecessary disk cache resulted in needlessly swapping out non-wired memory.
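A quick way to put numbers on that correlation is to compare the wired page count against the ARC size directly; a sketch (both values come out in bytes):

Code:
# wired memory in bytes (wired pages * page size)
echo $(( $(sysctl -n vm.stats.vm.v_wire_count) * $(sysctl -n hw.pagesize) ))
# current ARC size in bytes
sysctl -n kstat.zfs.misc.arcstats.size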

Here's current ARC breakdown shown by zfs-stats -A:
Code:
ARC Efficiency
        Cache Access Total:                     267629336
        Cache Hit Ratio:                99.68%  266790411
        Cache Miss Ratio:               0.31%   838925
        Actual Hit Ratio:               97.51%  260973203

So a 16 GB ARC looks like enough. The hit ratio did drop from 99.80% to 99.68%; I'll watch the dynamics more and increase ARC if necessary, just not to 380 GB )
 
PMc Fragmentation, always the "gotcha" for memory allocators. One reason garbage collection is hard to get right.
rihad how about zfs-stats -E output? By default, the caching is for both data and metadata, so the -E gives you the breakdown on what is in the cache.

That output is simply saying most requests for data or metadata are being served from ARC.
If you look at the "-E" output you can get a feel for how the cache info moves from list to list. Pay attention to the "Ghost" values. If they go up or are significant, then increase the size of ARC, because that implies data has fallen off MFU and MRU but is requested again. Think of it as: "I'm streaming a movie of size X in a loop, so it gets read sequentially and pushed out a socket. If my ARC is sized X/2, I completely flush the ARC and have to go back to the device. If my ARC is sized >= X, then after the first loop the whole thing is in ARC, so subsequent loops are satisfied from memory, with no need to go back to the disk."
A shared database is another example that may benefit from more ARC.

Without knowing what this machine is being used for (on a quick scan I couldn't find that info), it's hard to say what the appropriate size of ARC is.
But in the end, it's your asset; you have to feel comfortable with the performance.
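If you'd rather watch the ghost lists directly between zfs-stats runs, the raw cumulative counters are exposed via sysctl; a sketch:

Code:
# cumulative hits on the MRU/MFU ghost lists; steady growth suggests the ARC
# is smaller than the working set
sysctl kstat.zfs.misc.arcstats.mru_ghost_hits \
       kstat.zfs.misc.arcstats.mfu_ghost_hits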
 
Code:
$ zfs-stats -E

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Aug 24 12:07:19 2021
------------------------------------------------------------------------

ARC Efficiency:                                 268.59  m
        Cache Hit Ratio:                99.68%  267.73  m
        Cache Miss Ratio:               0.32%   856.24  k
        Actual Hit Ratio:               97.52%  261.91  m

        Data Demand Efficiency:         99.82%  169.18  m
        Data Prefetch Efficiency:       93.31%  7.19    m

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             1.97%   5.28    m
          Most Recently Used:           38.04%  101.84  m
          Most Frequently Used:         59.79%  160.07  m
          Most Recently Used Ghost:     0.07%   195.47  k
          Most Frequently Used Ghost:   0.13%   340.61  k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  63.07%  168.87  m
          Prefetch Data:                2.50%   6.71    m
          Demand Metadata:              34.41%  92.13   m
          Prefetch Metadata:            0.01%   24.67   k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  36.42%  311.81  k
          Prefetch Data:                56.15%  480.80  k
          Demand Metadata:              5.98%   51.19   k
          Prefetch Metadata:            1.45%   12.44   k

------------------------------------------------------------------------
 
Thanks.
I think that the "misses" section is implying a little more ARC would help by satisfying more requests from RAM.
But, that's just my opinion.
 
Does newly written data go through the ARC cache and stay there (this is a PostgreSQL replica + analytics server)? If not, then it's entirely possible that all newly written data gets read back and initially qualifies as a MISS. Other than that, "cache hits" is a nice metric to eyeball to get the general idea ) And it's pretty high - 99.68%. If it decreases, then yeah, that could imply that the cache size is insufficient.
 
This is probably unrelated to ZFS, but why doesn't top's memory breakdown sum up to the amount of physical memory?

Code:
Mem: 8450M Active, 27G Inact, 110G Wired, 104K Buf, 129G Free
ARC: 16G Total, 5055M MFU, 11G MRU, 3568K Anon, 95M Header, 412M Other
     15G Compressed, 26G Uncompressed, 1.77:1 Ratio
Swap: 16G Total, 16G Free
This sums up roughly to: 8.5+27+110+129 == 275g

while:

Code:
$ sysctl hw.physmem
hw.physmem: 410834587648


Code:
$ grep -E '^(real|avail)' /var/run/dmesg.boot 
real memory  = 412304277504 (393204 MB)
avail memory = 400211787776 (381671 MB)

Spotted on 2 machines, both running FreeBSD 13.0
 
L2ARC is exactly what it sounds like, but instead of being in RAM it's a device. So yes having fast SSD/NVME device holding L2ARC in front of spinning disks could be a performance boost.
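For completeness, attaching an L2ARC device is a single zpool operation; a sketch with placeholder pool/device names:

Code:
# add an SSD/NVMe device as a cache (L2ARC) vdev to an existing pool
zpool add tank cache nvd0
# a cache device can also be removed again at any time
zpool remove tank nvd0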
 