Understanding memory management

Hi Everyone,

I'd like to learn more about how FreeBSD manages memory, specifically about the types of memory that top shows. I've read a few articles on the topic, as well as Chapter 7 of the Architecture Handbook. The following is my current understanding of the subject. Please correct me if I got something wrong:

FreeBSD manages memory in 4 kB units called pages. These pages exist either in RAM or on the swap device. Each page can be in one of the following states, shown in the output of top:

Wired: Pages that must be kept in RAM at all times. Kernel memory is always wired.
Active: Allocated memory which has been accessed recently by a process. Since a recent access makes future accesses more likely, the memory manager tries to keep Active pages in RAM, but they may be moved out to swap when RAM is filling up. Doing so will likely incur heavy performance penalties.
Inactive: Allocated memory which has not been accessed recently by any process. These pages may be moved out to swap, and the expected cost is much lower than for active pages.
Buffers: Pages that represent the content of a file on the hard disk or other non-volatile storage device. This redundant information exists only to speed up filesystem access if enough RAM is available. When RAM becomes scarce, buffers can be deallocated at any time, therefore they are regarded as free memory.
ARC: ARC replaces Buffers for systems using ZFS. The Total field shows the number of wired bytes in ARC. MRU stands for Most Recently Used, MFU stands for Most Frequently Used data. There are also the Anon, Header and Other fields, which I have no clue about.
Laundry: Pages assigned to a file whose content has been modified since it was read from disk, or has not yet been stored on disk. Laundry pages must be written back to the filesystem before they can be deallocated. They may count as free memory, though their availability may depend on how quickly they can be backed up.
Free: Free memory.
This description implies that determining which memory pages are considered free at any time is not that straightforward: inactive pages are allocated, but they may be moved out to swap, whereas laundry pages should become free soon, but they have to be written back to disk first. In general, we should regard Wired, Active, Inactive and ARC pages as used memory, whereas Buffers, Laundry and Free comprise free memory.
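To make that grouping concrete, here is a minimal Python sketch that parses a top(1) Mem: line and adds up the two groups. The sample line is taken from a top output shown elsewhere in this thread. The helper names (parse_mem_line, group_totals) are made up for illustration, and the grouping is only my current understanding, not an authoritative accounting. ARC is left out on purpose, since its Total is already counted in Wired.

```python
import re

# Multipliers to convert top(1) size suffixes to MiB.
UNIT_TO_MIB = {"K": 1 / 1024, "M": 1, "G": 1024}

# Grouping as proposed in this post (an assumption, not an official definition).
USED = ("Wired", "Active", "Inact")
RECLAIMABLE = ("Buf", "Laundry", "Free")


def parse_mem_line(line):
    """Return {label: size in MiB} for a top(1) 'Mem:' line."""
    sizes = {}
    for number, unit, label in re.findall(r"(\d+)([KMG]) (\w+)", line):
        sizes[label] = int(number) * UNIT_TO_MIB[unit]
    return sizes


def group_totals(sizes):
    """Sum the 'used' and 'reclaimable' groups, in MiB."""
    used = sum(sizes.get(label, 0) for label in USED)
    reclaimable = sum(sizes.get(label, 0) for label in RECLAIMABLE)
    return used, reclaimable


sample = "Mem: 409M Active, 5759M Inact, 789M Laundry, 12G Wired, 1174M Buf, 12G Free"
used, reclaimable = group_totals(parse_mem_line(sample))
print(f"used: {used} MiB, reclaimable: {reclaimable} MiB")
```

For the sample line this gives 18456 MiB used and 14251 MiB reclaimable. The figures are only approximate, because top rounds large values to whole gigabytes.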

Top also shows the allocated memory for each process. RES shows the number of bytes currently present in physical RAM on behalf of the process. SIZE shows the total amount of memory that the process is able to address. This memory does not need to reside in physical RAM, and it can be shared among different processes; for example, shared libraries are counted for each process that links to them.

Does this summary line up with reality? I'm particularly concerned about Buffers and ARC. My understanding is that Buffers and Laundry belong to the filesystem cache under UFS, both backed by vnodes; Laundry being pages whose content has changed in memory. But on my ZFS system I still have Laundry in addition to ARC. If ARC replaced Buffers, what is Laundry doing?

Please make any comments/corrections to the above. Thanks!
 
Yes, that's where I got the idea that a Laundry page refers to a sector in the filesystem or the swap, whose content has changed, and it needs to be written back before the page can be repurposed. I still don't know how this works with ARC, which has seemingly done away with Buffers.

It's also confusing that it says that dirty inactive pages are moved to the tail of the laundry queue. Based on the Handbook I thought inactive pages don't contain filesystem cache, only memory allocated by processes? Why would process memory need to be laundered, if it's not tied to the filesystem?
 
As far as I understand, Laundry is a short-term cache for modified data, which will be saved to a filesystem soon. It isn't tied to any filesystem; it is a part of memory management in FreeBSD, not of a filesystem. ARC is a long-term cache for any data read from a ZFS filesystem. And it is a part of ZFS, not of memory management.

By the way, as far as I can see, top reports that FreeBSD allocates a small buffer (40k) even with ZFS.
 
Yes, that's where I got the idea that a Laundry page refers to a sector in the filesystem or the swap, whose content has changed, and it needs to be written back before the page can be repurposed. I still don't know how this works with ARC, which has seemingly done away with Buffers.
That is a common misconception. The trick to understand it is:
In Berkeley-style Unix every memory page is considered to be some place in the filesystem, mmap()ed into memory. If the memory gets dirty, it points to the paging area, but the logic doesn't change: a memory page is either clean (then it has a place in the filesystem) or dirty (then it can be put into swapspace).
What the system does, is move these pages around between different queues to make best use of the memory, meanwhile keeping track of which pages can or should be updated to file or need to be fetched back from file.
And the ARC is just a cache in between.
 
Thanks, this is really interesting. But I'm getting even more confused. I thought putting pages into swapspace depended on the amount of memory and new allocation requests, and not on some inherent quality of each page, like being "dirty". So what does "dirty" mean exactly? What triggers this condition? Does it have to do anything with running out of free RAM?
 
Thanks, this is really interesting. But I'm getting even more confused. I thought putting pages into swapspace depended on the amount of memory and new allocation requests, and not on some inherent quality of each page
Yes and yes. Both are right. The question of *if* and *when* pageout happens depends on available memory. The question of *how* and *where* the pageout goes depends on the page.
Let's get to that:
, like being "dirty". So what does "dirty" mean exactly? What triggers this condition?
Dirty means changed. There are different cases:
If an executable is read from disk, it will usually not be changed in memory, so it does not need to be paged out, and can simply be done away with, as it can be loaded again on demand.
If your program creates a variable, that needs memory. This memory is certainly changed, so it must point to swapspace, and this is settled beforehand, so when memory gets low, it is already clear where to move the various pages and, most importantly, what cost is involved. This doesn't mean that the page will ever be paged out, but the system keeps track of the potential amount of swapspace needed (visible in sysctl vm.swap_reserved).
More interesting are other cases: imagine two applications copy the same data-file into memory. As long as they do not change it, the pages are clean and can simply be done away with when memory gets low (and reloaded when the application accesses them again). Since the pages point to the same file space, they do not need to exist twice in memory. But then one application changes the data in memory - now the page has to be copied into a dirty one and a clean one, and the dirty one must point to swapspace. Until the application decides to copy the data back to disk, then the page is clean again.
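The last case can be tried from userland. Here is a minimal sketch in Python, assuming a POSIX system: two private (copy-on-write) mappings of the same temporary file stand in for the two applications. The function name is made up for illustration, and the script cannot show where the kernel would page the modified copy out to. It only shows that the change stays out of the file and out of the other mapping.

```python
import mmap
import os
import tempfile


def private_mapping_demo():
    """Map one file twice with copy-on-write, modify one mapping, compare views."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"clean page data")
        # ACCESS_COPY requests a private mapping: writes go to a private
        # copy of the page and are never carried through to the file.
        app_a = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        app_b = mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)
        app_a[0:5] = b"DIRTY"  # first write: app_a now has its own dirty copy
        with open(path, "rb") as f:
            on_disk = f.read()
        views = (app_a[:], app_b[:], on_disk)
        app_a.close()
        app_b.close()
        return views
    finally:
        os.close(fd)
        os.unlink(path)


view_a, view_b, on_disk = private_mapping_demo()
print(view_a)   # the modified, private copy
print(view_b)   # still the clean file content
print(on_disk)  # the file itself is unchanged
```

Only the first mapping sees the modification. That modified copy no longer has a place in the file, which is why it would have to go to swapspace if it were paged out.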
Does it have to do anything with running out of free RAM?
Not necessarily. It is just proper housekeeping, and having a contingency plan for when no more RAM is available.
 
Memory management is quite a complicated topic. ZFS memory management even more so because ZFS extensively acts and interacts on so many levels of the OS. ZFS is a COW (Copy On Write) filesystem that combines a more or less traditional file system with a volume manager and provides extensive data protection. ZFS acts on part of main memory (=RAM) besides secondary memory (=disks) and, somewhere in between (L2ARC and SLOG).

When you're looking at the separate parts of memory management, a more detailed look at the basics and inner workings of memory management might be useful (without focussing on ZFS). For a good internal understanding of memory management in FreeBSD, I'd suggest you have a look at some of the main resources at hand and at where memory management interacts with or extends to the file system (secondary memory); that's where UFS (FFS in its origins) comes into play. The book The Design and Implementation of the FreeBSD Operating System, 2nd Edition has a whole chapter on it: Chapter 6. Memory Management (see: FreeBSD Development: Books, Papers, Slides). In addition to that fine chapter:
In the last entry, Mark Johnston gives detailed insight—without discussing it at the code level—of the operation of three queues (the active, inactive and the laundry queue) that are important for memory management where swapping is concerned. As programs are loaded, run and finish, memory pages are continuously created, deleted and moved from one queue to another. When pages need to be swapped out to disk however, they are taken from the laundry queue. This process is handled by the laundry thread. The laundry queue and its handler were both added to FreeBSD 12 as a better way to manage memory and more reliably predict the use of memory pages likely not needed in the future.
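As a rough mental model of the queue movement described above, here is a toy Python sketch. It is a simplification made up for illustration, not FreeBSD's actual data structures or policy: the class and method names are hypothetical, the active queue is left out, and "writing to swap" is just appending a name to a list.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    name: str
    dirty: bool = False


class ToyPageQueues:
    """Toy model: inactive pages are either freed or sent through laundry."""

    def __init__(self):
        self.inactive = deque()
        self.laundry = deque()
        self.free = []
        self.swapped = []  # names of pages "written out" (stand-in for real I/O)

    def scan_inactive(self):
        """Reclaim pass: free clean pages, queue dirty ones for laundering."""
        while self.inactive:
            page = self.inactive.popleft()
            if page.dirty:
                self.laundry.append(page)  # goes to the tail of the laundry queue
            else:
                self.free.append(page)     # clean: can be re-read from its file

    def launder(self):
        """Laundry pass: write each dirty page out, then it can be freed."""
        while self.laundry:
            page = self.laundry.popleft()
            self.swapped.append(page.name)
            page.dirty = False
            self.free.append(page)


queues = ToyPageQueues()
queues.inactive.extend([Page("libc text"), Page("heap", dirty=True)])
queues.scan_inactive()
print([p.name for p in queues.laundry])  # only the dirty page waits here
queues.launder()
print(queues.swapped)
```

The point of the model is only the ordering: a clean page can be dropped right away, while a dirty page has to pass through the laundry queue and be written out before its memory can be reused.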

As filesystems go, ZFS will take you to the next level. Where UFS extends somewhat into main memory management, ZFS has a prominent presence in main memory and control of its components there. When you look at how things interact in ZFS internally, it may help you to better understand the influence of individual memory (kernel) parameters:
Also, have a look at it from the top ;):
___
* unfortunately only slides; a small description of the presentation by Allan Jude in BSD in Taiwan | BSD Now 221 (from ca. 26:34 min).
** [edit: added] this article is also referred to in Explaining top(1) on FreeBSD but it deserves to be mentioned explicitly. grahamperrin had mentioned it explicitly before me.
 
Does this mean that swap is mandatory in FreeBSD? I don't like how slowly pages are loaded from swap, so I was thinking of investing more in RAM and disabling swap entirely. So when you say

If your program creates a variable, that needs memory. This memory is certainly changed, so it must point to swapspace

what happens if I either don't have swap at all, or it's much smaller than the actual RAM size, so not all pages in RAM can point to swap? Do I lose any functionality? Does this also mean that laundry exists only when swap exists?
 
Does this mean that swap is mandatory in FreeBSD?

No.

I don't like how slowly pages are loaded from swap, so I was thinking of investing more in RAM and disabling swap entirely. …

More RAM is good.

I don't have the usual links handy, but I'm fairly certain that it's usually recommended to enable swap. From Exploring Swap on FreeBSD | Klara Inc. (2021):

… Although swap is tremendously slower than RAM, it can still be a valuable tool. …

With or without swap enabled: it's commonplace to use the swap partition for crash dumps. dumpon(8).
 
Does this mean that swap is mandatory in FreeBSD?
It is not. After all, you can also disable/enable swap on the go. See swapctl(8).

I don't like how slowly pages are loaded from swap, so I was thinking of investing more in RAM and disabling swap entirely.
Don't forget that unused RAM is wasted RAM.
While more RAM is often better/desirable, personally, I'd recommend not spending money on hardware upgrades unless you're actually running into a performance issue/bottleneck. Unused RAM is wasted RAM.
Let's also not forget that modern non-volatile storage media (e.g. SSDs) are pretty fast. So swapping in and out isn't as "painful" as it used to be on a rotating disk, where you're severely I/O limited (compared to a modern NVMe SSD). The kernel's job is to make good decisions on when to swap in or out. Unless you're actually running into a bottleneck, you wouldn't expect the kernel to swap out data that is accessed frequently or will be accessed again soon.

I don't think that this is relevant here but let's not forget that in theory (and also in practice), more RAM makes a system less robust from a hardware perspective. This is especially true for non-ECC RAM.
 
If you like:


Code:
% date ; uptime ; uname -KU ; swapinfo
Sun 10 Apr 2022 20:17:19 BST
 8:17p.m.  up  1:16, 5 users, load averages: 4.34, 6.78, 6.64
1400056 1400056
Device          1M-blocks     Used    Avail Capacity
/dev/ada0p2.eli     16384      354    16029     2%
% sudo time /usr/home/grahamperrin/dev/swapflush/swapflush -a
grahamperrin's password:
        7.87 real         0.00 user         0.14 sys
% swapinfo
Device          1M-blocks     Used    Avail Capacity
/dev/ada0p2.eli     16384        0    16384     0%
%

Later:

Code:
% date ; uptime ; swapinfo ; sudo time /usr/home/grahamperrin/dev/swapflush/swapflush -a && swapinfo
Mon 11 Apr 2022 04:35:35 BST
 4:35a.m.  up  9:34, 5 users, load averages: 1.43, 1.75, 2.25
Device          1M-blocks     Used    Avail Capacity
/dev/ada0p2.eli     16384     1097    15286     7%
swapflush: swapoff: Cannot allocate memory
        0.00 real         0.00 user         0.00 sys
% date ; uptime ; swapinfo ; sudo time /usr/home/grahamperrin/dev/swapflush/swapflush -a && swapinfo
Mon 11 Apr 2022 04:37:07 BST
 4:37a.m.  up  9:36, 5 users, load averages: 1.83, 1.79, 2.21
Device          1M-blocks     Used    Avail Capacity
/dev/ada0p2.eli     16384     1081    15302     7%
       69.28 real         0.00 user         0.72 sys
Device          1M-blocks     Used    Avail Capacity
/dev/ada0p2.eli     16384        0    16384     0%
%
  • in response to swapoff: Cannot allocate memory I chose to close two Firefox windows, one of which included the FreeBSD Handbook.
 
Does this mean that swap is mandatory in FreeBSD?
Not anymore. Traditionally (until ~2000) it was mandatory to have at least as much swap as RAM.
But then memory kept getting bigger, and at such sizes most of the swap was no longer practically useful. So the systems were tuned to cope with the situation of no physical swap existing. They will nevertheless consider a dirty page as logically residing in swapspace, but if there is none, and as long as memory does not get seriously exhausted, this has no consequences.

what happens if I either don't have swap at all, or it's much smaller than the actual RAM size, so not all pages in RAM can point to swap? Do I lose any functionality?
Much smaller is not a problem, because the mapping is logical, not physical: the swap is filled as available. No swap at all will likely engage the OOM killer when things get too big, and it will simply kill the biggest processes. But this is the default nowadays anyway: if you run with little memory and want slow swapping to happen, you must explicitly tune the OOM killer down (sysctl vm.pageout_oom_seq=larger_value).

Does this also mean that laundry exists only when swap exists?
I don't know, I never tried that. I build base+ports in bhyves, and I throw a couple of them in as demand arises, and they can get very swappy when e.g. building llvm13, but it doesn't matter because the whole builds take a day or so anyway - but I definitely don't want things being killed away in midflight and the system idling and the stuff not ready to install when I come back in the morning.

The general best-practice rule is: give the system some amount of swap, say, 5-10 GB, on SSD. The advantage is: you can quickly see what is going on and when something is exhausting memory. The downside is, if you have a bunch of browser tabs lingering unused for a day, the system will consider them idle and may page them out, e.g. in the night when the periodic daily runs and needs lots of inode space. So the next access to these tabs will then be noticeably slower.
 
Have a swap device at least as large as the RAM in your machine in case you want to save and later poke at kernel crash dumps. But in general any swapping will slow things down (much less so on an NVMe device, but still noticeably), so put in more RAM. With swap, when the system suddenly seems to slow down, you can use ps to check whether your firefox or some other program has been using up memory! Otherwise the kernel will kill a random process when all physical memory is used up. Here you at least have a chance to do something more useful.
 
… thinking of investing more in RAM and disabling swap entirely. …

… I don't have the usual links handy, …

I still can't find the reference that I want, the closest I can find is this (from Mark Johnston's article):

… many applications allocate RAM in advance which they never actually use. If you have swap available, those mallocate requests are initially fulfilled by simply reserving swap space. Without swap, on a system without free RAM, the kernel must evict potentially valuable page cache to honor the mallocate call—even if that allocation is never actually used by the app which requested it. …

Discussion of the article: <https://news.ycombinator.com/item?id=25789809>



More …

tuning(7)

<https://old.reddit.com/r/freebsd/comments/2k6hq0/swap_virtual_machines/clj2ic1/> is old (2014), but covers some of what's in the manual page.

<https://unix.stackexchange.com/a/146166/13260> is similarly old, but helped to remind me of these two things:
FreeBSD bug 263212 – Author (Matthew Dillon) and date of publication (2000) for 'Design elements of the FreeBSD VM system'

Chapter 8: Beyond Processes in Forensic Discovery (2004)
 
I still can't find the reference that I want, the closest I can find is this (from Mark Johnston's article):
Well, elaborate on that. Explain your point. :)

Or, let people just do what they want. There is a tendency towards simple explanations - so if over time swap shows some usage, like here:

Code:
470 processes: 1 running, 469 sleeping
CPU:  0.1% user,  0.0% nice,  1.2% system,  0.0% interrupt, 98.7% idle
Mem: 409M Active, 5759M Inact, 789M Laundry, 12G Wired, 1174M Buf, 12G Free
ARC: 6543M Total, 5351M MFU, 587M MRU, 3660K Anon, 415M Header, 178M Other
     5514M Compressed, 10G Uncompressed, 1.87:1 Ratio
Swap: 36G Total, 17M Used, 36G Free

... then that must be the obvious reason why the machine gets slow.
 
(Nit: I don't imagine perceptible slowness from just 17 M use of swap.)
I agree on that. After all, the system also reports 12G of free memory.
Unfortunately, I don't yet have an in-depth understanding of FreeBSD's memory management design, but in general (i.e. from an academic/abstract point of view), a kernel is allowed to use swap space when available, even if there's plenty of primary memory available. I have no idea what exactly these 17M of swapped data are, but the fact that they are swapped out while there is still plenty (!) of available primary memory would suggest that the kernel decided that those 17M are just not "needed" in primary memory. In general, this is not an indication of a badly designed memory management system, nor that the machine would benefit from additional RAM being installed.
 
I agree on that. After all, the system also reports 12G of free memory.
Unfortunately, I don't yet have an in-depth understanding of FreeBSD's memory management design, but in general (i.e. from an academic/abstract point of view), a kernel is allowed to use swap space when available, even if there's plenty of primary memory available. I have no idea what exactly these 17M of swapped data are, but the fact that they are swapped out while there is still plenty (!) of available primary memory would suggest that the kernel decided that those 17M are just not "needed" in primary memory.
There may have been an earlier time when memory was filled by something. That something disappeared again, memory was freed, but a couple unused pages had gone into swap - and were never fetched back from there because these pages weren't touched.
There are quite a lot of pages that are used once at process start for some initialization, and then not again - because you rarely use all the options of some command.

Swap is not like a railway train where the swap wagons come at the end and everything is filled (and later freed again) in sequence. With earlier systems (say Linux as of 1994) you could observe memory getting full, and only then would the machine come to a halt while moving memory to swap - you could not even log in, but you could watch the drive light. It is easy to understand what is happening in that case - but it is not very useful for networked systems and multiprocessor systems. So we need an approach that acts earlier and more probabilistically.
 
you can get info vmstat -m vmstat -z
Yes, but they never lined up.
And I had this large lump of "Wired", half of the installed mem or more, and it has given me headaches ever since I started using ZFS. I know what it is, most of it comes from the kernel but it can in part also be user processes, and there is no means to sort it out and get a precise breakdown of what is actually in there right now.
Then at some point my ZFS stalled at arc_min after a day or so because of mem fragmentation (still i386), and then UMA appeared, and I began to understand what it is all about - what the problem actually is with mem allocation, what the solution is, etc. And then I figured why the "Wired" may bloat into more than 60% of installed mem (a lot more than the actual ARC size), but then may magically shrink again when memory gets scarce - and this works by signalling - there is not a code function I might find within arc.c (yes, I searched for such).

And now You come along and explain all this, just as if it were the daily news. Yeah, I had great fun reading that article. ;)
 
In my experience (with swap enabled), killings are not always so blunt.
It depends on how fast your swap is.
And on sysctl vm.pageout_oom_seq.

(The linked text is verbatim, not me agreeing about poor choices – I don't know enough.)
Yes, I have perceived exactly this as described here, when building my ports. I searched for solutions and found the hint to pageout_oom_seq. This can be set to 1000 or higher, and then the oom-killer will not engage, even on slow swapspace. (I don't know what happens when actually running out of swap - maybe this does then not get detected - but should never happen anyway.)
But then, building llvm13, with the fortran stuff, at some point creates compile processes of around 4 GB each. So running this in a 10-core VM, one needs 40 GB of memory. Forcing this to swap will then not crash, but will never come to an end, because these 10 processes continuously compete with each other, and none gets enough RSS to make real progress.
That means: swapping, which worked reasonably well on single-CPU systems, may not work at all on multiprocessor systems when multiple big processes are created at the same time.
 