[Solved] FreeBSD Inactive memory

It is tunable - see sysctl -d vm.swap_idle_{enabled,threshold{1,2}}:
Code:
vm.swap_idle_enabled: Allow swapout on idle criteria
vm.swap_idle_threshold1: Guaranteed swapped in time for a process
vm.swap_idle_threshold2: Time before a process will be swapped out
From RTFM tuning(7):

The vm.swap_idle_enabled sysctl is useful in large multi-user systems where you have lots of users entering and leaving the system and lots of idle processes. Such systems tend to generate a great deal of continuous pressure on free memory reserves. Turning this feature on and adjusting the swapout hysteresis (in idle seconds) via vm.swap_idle_threshold1 and vm.swap_idle_threshold2 allows you to depress the priority of pages associated with idle processes more quickly than the normal pageout algorithm. This gives a helping hand to the pageout daemon. Do not turn this option on unless you need it, because the tradeoff you are making is to essentially pre-page memory sooner rather than later, eating more swap and disk bandwidth. In a small system this option will have a detrimental effect, but in a large system that is already doing moderate paging this option allows the VM system to stage whole processes into and out of memory more easily.
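For anyone who wants to try it, persisting these is just a couple of sysctl.conf(5) entries - a minimal sketch, and the threshold values below are only illustrative, not recommendations:

Code:
# /etc/sysctl.conf -- enable idle-based swapout (example values only)
vm.swap_idle_enabled=1
# seconds a process is guaranteed to stay resident after going idle
vm.swap_idle_threshold1=2
# idle seconds before a process becomes a swapout candidate
vm.swap_idle_threshold2=10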

Thanks, I will hold my hands up and say I didn't know about that one :)

Great news it exists. :)
 
From my reading (but not necessarily understanding!) of The Design and Implementation of the FreeBSD Operating System, 2nd Ed.:

p.292 Whenever an operation that uses pages causes the amount of free memory to fall below the minimum thresholds, the pageout daemon is awakened.

It tries to work out what to do, but if it can’t do enough, the swapout daemon gets involved:

p.295 Swapping occurs ... when the system becomes so short of memory that the paging process cannot free memory fast enough to satisfy the demand ... may happen when multiple large processes are run on a machine lacking enough memory for the minimum working sets of the processes.

So I think that might be what I'm seeing. But there's the distinction between processes getting swapped out (which I'm not seeing) and pages (which I think I am seeing).
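For reference, the "minimum thresholds" the book talks about can be inspected via sysctl; these names are from a stock 12.x box (check sysctl -d on yours):

Code:
# current free page count vs. the pageout daemon's targets (values in pages)
sysctl vm.stats.vm.v_free_count vm.v_free_min vm.v_free_target vm.v_inactive_target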

MySQL makes a big demand for memory, free RAM isn't available, and something decides it can't reclaim Inactive memory or launder pages fast enough - so the request ends up being satisfied from swap. Or maybe it kicks some other MySQL pages into swap and satisfies the immediate demand from something else (not sure what?)

I was only seeing this on 12.1, but now seeing the same on an 11.4 machine (also with MySQL 5.6).
 
On mysqldump and swap - there are posts going back 10+ years about mysqldump and swapping, including on Ubuntu etc. A more recent one is this: https://bugs.mysql.com/bug.php?id=85979

So something about MySQL and not specific to FreeBSD. I'll carry on tweaking and try tcmalloc.

EDIT: tcmalloc doesn't make much difference. It looks like I'm just trying to do too much: once the tables get to a certain size, you either need more RAM or you need to limit MySQL to a smaller portion of it than you might think. mysqldump is a very thin wrapper - it basically does SELECT * FROM each table and then iterates through the results and writes them out.

If you run mysql -e "SELECT * FROM a_big_table" > some_file.sql on a machine where mysqldump is already causing swapping, you'll hit problems VERY quickly. It's like running mysqldump without the --quick option (which keeps memory use down by writing rows out as it goes), except worse: here the RAM is used by the mysql client program itself, which buffers the whole result set - that's what can use up all the RAM and cause swapping. So it's not quite what mysqldump does, which sips RAM in comparison.
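To make that concrete, this is roughly the comparison I mean (the database and table names are made up):

Code:
# buffers the entire result set in the mysql client before writing -- this is what eats RAM
mysql -e "SELECT * FROM a_big_table" mydb > some_file.sql

# streams rows out as it goes (--quick is normally already on via --opt)
mysqldump --quick --single-transaction mydb a_big_table > a_big_table.sql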

Running mysqldump with --verbose in one window and watching top in another - having given MySQL almost all of the RAM in an 8GB machine - as soon as it hits a table that's 6GB (per Data_length in SHOW TABLE STATUS) it starts eating RAM, and once it's chewed through that, you end up swapping. I wonder whether this SQL (whether via the command line or mysqldump) is effectively loading the caches: because I've given MySQL too much RAM for caching on this machine, it fills the caches, starts to spill into swap, and MySQL doesn't want to free any of it - it's RAM I said it could have for its caches. If I throttle the InnoDB cache (buffer pool) size to a couple of GB, the mysqldump works.
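This is the sort of throttle that got the dump working again for me - just a sketch; the config path varies by install, and the 2G figure is simply what happened to fit on this 8GB box:

Code:
# my.cnf (e.g. /usr/local/etc/mysql/my.cnf on FreeBSD) -- cap the InnoDB buffer pool
[mysqld]
innodb_buffer_pool_size = 2G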

My SQL dump files are 20GB or so, and a couple of the big tables are ~5GB or ~6GB each - so I guess it makes some amount of sense. If you want to work with this much data you need a lot of physical RAM - just more than I thought. Or allocate far less of it to MySQL.
 
I too have wondered about the same sort of thing, seeing swap used and Inactive memory at the same time. I never brought it up on this forum because when my machines needed swap they took it, and Inactive was much lower when they did. I also read somewhere (can't remember where) that FreeBSD does a neat little thing: it can keep something both swapped out and in Inactive at the same time, so its next decision is cheap either way. When that decision comes, if it needs the RAM for something else, the data is already in swap; if instead that item needs to become active, it simply goes from Inactive to Active - which is fast - and is then removed from swap. I thought that was brilliant. I trust this is true, but if someone needs to correct or straighten out what I've posted, please do.
 
For backup runs I added a swapoff -a and swapon -a at the end to push everything back into RAM. My first backup run after upgrading to 12-STABLE took things to a new level: it used 22GB of swap on a machine with 32GB of RAM, even though there was 17GB of inactive/free RAM before the backup started. Yes, it was a database backup; usually it only swaps out a few hundred MB at most.
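For what it's worth, the tail end of my backup wrapper is just something along these lines (the backup command itself is a placeholder):

Code:
#!/bin/sh
# run the database backup, then cycle swap so swapped-out pages are pulled back into RAM
/usr/local/bin/db_backup.sh
# note: swapoff will refuse if there isn't enough free memory to absorb what's swapped out
swapoff -a && swapon -a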

However, the backup yesterday was absolutely fine, so I think that was a one-off.
 
Unmounting the NFS share immediately frees up the Inact memory.

The only references I can find to NFS caching seem to be related to writes. I can't see an obvious way to reduce the size of (or completely clear) an NFS cache, like you can with ZFS ARC. Perhaps I should ask this in a new thread?

Still having this problem. Noticed that my ARC was only 31GB (with 128GB physical RAM), and discovered that a file copy from an NFS mount a few days ago seems to have been persistently hogging 65GB+ of memory...

Before unmount + mount:
Code:
Mem: 1713M Active, 65G Inact, 11G Laundry, 43G Wired, 732M Buf, 3525M Free
ARC: 31G Total, 13G MFU, 14G MRU, 3280M Anon, 231M Header, 1033M Other

After unmount + mount:
Code:
Mem: 1713M Active, 604M Inact, 11G Laundry, 43G Wired, 68G Free
ARC: 31G Total, 12G MFU, 15G MRU, 1808M Anon, 231M Header, 1032M Other

I understand that caching is good, but this behaviour - caching huge amounts of a file last accessed via NFS days ago, to the detriment of other caching of local file systems - does not seem correct.

I guess I could set up a daily cron to refresh the NFS share, but that seems clunky...
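If I do go that route, it would just be an /etc/crontab entry along these lines (the mount point is hypothetical, and it only works when nothing is holding the share busy):

Code:
# remount the NFS share nightly so its cached pages get released
# minute hour mday month wday who  command
30       3    *    *     *    root umount /mnt/nfsshare && mount /mnt/nfsshare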

edit: Discovered similar on another machine, where NFS caching was consuming 13GB of 32GB. Why isn't it ageing the cached data, and releasing the memory back to the system?
 
caching huge amounts of a file last accessed via NFS days ago, to the detriment of other caching of local file systems - does not seem correct.
Unused memory is useless. It doesn't cache NFS to the "detriment" of caching other things. Apparently you have plenty of memory to cache both, so it's pointless to "clear" the cache just because you think you should have more "free" memory.

How did you determine ARC cached things to the "detriment" of the local filesystems? How did you measure that?
 
Unused memory is useless. It doesn't cache NFS to the "detriment" of caching other things. Apparently you have plenty of memory to cache both, so it's pointless to "clear" the cache just because you think you should have more "free" memory.

How did you determine ARC cached things to the "detriment" of the local filesystems? How did you measure that?
It's NFS caching that is the issue, not ARC. This is a busy database server, with lots of semi-random access, so I actually *want* as much memory as possible to be available for ARC to grab.

An NFS cache consuming more than half of physical RAM (65GB!) to cache parts of a file last accessed days ago - on a machine where other file systems are mounted and need their own cache - does not make sense to me. This isn't the same old "hey, where did my free memory go?" question; I'm pointing out that, in this particular instance, NFS using that much RAM to cache "not at all recently used" content is incredibly inefficient.

As per my message last year, I've had trouble figuring out how to limit the size of the NFS cache, or manually purge it (beyond an unmount and mount). I wonder if perhaps there's some hardcoded legacy limits, which seemed reasonable at the time they were imposed, but end up being way off with larger amounts of RAM.
 
It's NFS caching that is the issue, not ARC.
The NFS client code only caches attributes (not file contents), and these are purged by default after 60 seconds. So, NFS caching is certainly not the issue. NFS mounts use the standard VFS infrastructure, so they use the normal buffer cache, just like a local file system.
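For completeness, those attribute-cache timeouts are ordinary per-mount options, so you can see or shrink them in fstab(5) if you want - the server and paths below are just placeholders:

Code:
# /etc/fstab -- NFS mount with shorter attribute-cache timeouts (example values)
server:/export/data  /mnt/data  nfs  rw,acregmax=10,acdirmax=10  0  0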

Somehow I get the impression that something might be misconfigured on your machine. Do you have anything memory-related or filesystem-related in /etc/sysctl.conf, /boot/loader.conf, or in your kernel’s configuration file (if you use a custom-compiled kernel)?
 
Inactive memory is associated with the vnode/inode of the file it belongs to. It is reclaimed when free memory is needed. Why your ARC is not able to put pressure on it - no idea. That is what you get when two smart caching methods are used together: dumb things may come out of it.
 
Small update to my NFS/memory cache issue. I just noticed that if I attempt an unmount of the NFS filesystem, even one which fails, it appears to free up (some) Inact memory:

Code:
=== Server1 ===

# top | grep "^Mem:.*Inact" | awk -F " Inact," '{print $1}' | awk '{print $NF}'
938M
# umount /xxx
umount: unmount of /xxx failed: Device busy
# top | grep "^Mem:.*Inact" | awk -F " Inact," '{print $1}' | awk '{print $NF}'
853M

=== Server 2 ===

# top | grep "^Mem:.*Inact" | awk -F " Inact," '{print $1}' | awk '{print $NF}'
2306M
# umount /xxx
umount: unmount of /xxx failed: Device busy
# top | grep "^Mem:.*Inact" | awk -F " Inact," '{print $1}' | awk '{print $NF}'
2011M

So a periodic unmount (and re-mount if successful) is still worthwhile. Perhaps even deliberately opening a file during the unmount attempt to prevent it actually succeeding.
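If I end up scripting it, it'll probably be no more than this (the mount point matches the /xxx above; it only remounts if the unmount actually went through):

Code:
#!/bin/sh
# attempt to unmount the NFS share; if it succeeded, mount it again,
# otherwise even the failed attempt seems to trim some Inact
MNT=/xxx
if umount "$MNT" 2>/dev/null; then
    mount "$MNT"
fi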
 
The reasons behind this are being discussed from post #4 onward in this recent thread.
Just allocate an amount of memory that equals the size of the "inactive" memory.
Very easy to do with the swapstresser utility I mention in post #5 there.

Inactive memory is associated with the vnode/inode of the file it belongs to.
Now this is interesting.
Any idea how to list these mappings, e.g. file names/paths + the size of the "inactive" memory chunks reserved for these?
 
The NFS client code only caches attributes (not file contents), and these are purged by default after 60 seconds. So, NFS caching is certainly not the issue. NFS mounts use the standard VFS infrastructure, so they use the normal buffer cache, just like a local file system.

Somehow I get the impression that something might be misconfigured on your machine. Do you have anything memory-related or filesystem-related in /etc/sysctl.conf, /boot/loader.conf, or in your kernel’s configuration file (if you use a custom-compiled kernel)?

Sorry, I missed this the first time around.

I have more than one server which shows the behaviour, so I've picked the one with the simplest config.

FreeBSD 12.1-RELEASE r354535 amd64

loader.conf (complete)
Code:
kern.geom.label.gptid.enable="0"
zfs_load="YES"
geom_mirror_load="YES"
#vfs.zfs.arc_max="2000M"

sysctl (relevant lines from rc.local):
Code:
/sbin/sysctl vfs.zfs.l2arc_write_max=134217728
/sbin/sysctl vfs.zfs.l2arc_write_boost=536870912
/sbin/sysctl vfs.zfs.arc_meta_limit=`/sbin/sysctl -n vfs.zfs.arc_max`
/sbin/sysctl vfs.zfs.prefetch_disable=1

custom kernel config (diff'd against GENERIC)
Code:
+device          speaker         #Play IBM BASIC-style noises out your speaker
+options         IPFIREWALL
+options         IPFIREWALL_VERBOSE
+options         IPFIREWALL_DEFAULT_TO_ACCEPT
 
rowan194:
  • These vfs.zfs.* settings belong in sysctl.conf(5), don't they?
  • ipfw(4) can comfortably be loaded at boot time from rc.conf(5). It gets loaded automatically when you use sysrc firewall_script="/etc/my_firewall.ipfw" or any of the preconfigured packet filters. If you need it earlier, you can add ipfw_load="YES" to loader.conf(5), plus net.inet.ip.fw.default_to_accept="0" if you want a default-deny policy - but that's dangerous: you can lock yourself out of a remote machine.
  • The speaker module can be loaded at runtime, e.g. sysrc kld_list+=" speaker".
Your custom kernel config is completely unnecessary; you can get the same functionality with the GENERIC kernel.
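i.e. roughly this instead of rc.local and the custom kernel options - a sketch only, untested, adjust to your setup (the arc_meta_limit line needs a literal value, so it is omitted here):

Code:
# /etc/sysctl.conf
vfs.zfs.l2arc_write_max=134217728
vfs.zfs.l2arc_write_boost=536870912
vfs.zfs.prefetch_disable=1

# /etc/rc.conf
kld_list="speaker"
firewall_enable="YES"
firewall_type="open"       # accept everything, like IPFIREWALL_DEFAULT_TO_ACCEPT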

Did you try w/o the custom settings for vfs.zfs.*arc*?
 
And FreeBSD 12.1-RELEASE is EOL; you should update to 12.2-RELEASE-p4. You can do a binary upgrade with freebsd-update -r 12.2-RELEASE upgrade followed by freebsd-update install, given that you can use GENERIC & don't need to build a custom kernel. If you have ZFS, you can clone with bectl(8) or beadm(1) and update in a chroot(8); you'll know how to do that.
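The binary upgrade dance is basically this (see freebsd-update(8); the install step is run again after the reboot):

Code:
freebsd-update -r 12.2-RELEASE upgrade
freebsd-update install
shutdown -r now
# after the reboot:
freebsd-update install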
 
If you like to build your own custom kernel & world, consider going to either 12-STABLE or 13-STABLE. That's source-only, so you build the system yourself, but stick with the GENERIC config unless you're really, really sure you need tweaks. Read the Handbook on that (and e.g. subscribe to the mailing lists).
 
This seems to help in some situations.
Yes, the most interesting thing with this is to see whether it actually frees the inactive memory for some time.
It is not a "solution" anyway, as inactive+laundry grows permanently; it only helps to postpone the eventual reboot.

The issue is really annoying.
But for my part, I've decided to shelve the inactive memory issue until I move to FreeBSD 13.
 