Solved: How much swap?

In the handbook it still says "As a rule of thumb, the swap partition should be about double the size of physical memory."

Is this still practical?

I am considering (dreaming of) getting a nice new machine with lots of RAM and an NVMe drive (or two, mirrored) for the system disk.
Swap at double the RAM would use up a significant proportion of the system disk.

What is the current thinking?
 
Last time I set up a new machine I gave it 4 GB swap. That’s large enough to hold a minidump in case the machine crashes.
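For reference, the crash-dump side of that sizing: FreeBSD writes kernel dumps to the dump device, which is normally the swap partition, so a 4 GB swap comfortably covers a minidump. A minimal setup (standard rc.conf knob, nothing exotic):
Code:
# /etc/rc.conf -- let FreeBSD pick a suitable swap device for crash dumps
dumpdev="AUTO"
On the next boot after a panic, savecore(8) copies the dump out of swap into /var/crash.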
 
Is this still practical?
Up to a swap size of around 8GB, yes. Often it's not useful to have anything more than that. There are some edge cases where a lot more swap would be beneficial, but overall I keep swap at 4 or 8GB, even if the RAM size is 380GB or more.
 
Warning: sometimes the ARC eats RAM, so as a rule of thumb I find 4GB too small.

I choose 16GB for my servers (heavily used machines).
 
ARC doesn't use swap (that would defeat its purpose) and will give up (or should give up) its used memory in favor of more pressing memory needs.
 
ARC doesn't use swap (that would defeat its purpose) and will give up (or should give up) its used memory in favor of more pressing memory needs.
ARC eats up RAM and almost never gives it up, at least on FreeBSD < 13.

So it is normal for other memory-intensive processes (VirtualBox, archiving programs for backups) to end up swapping until they crash.

This is THE problem to solve immediately (limit the ARC) when you have VirtualBox machines running, with something like
Code:
sysctl vfs.zfs.arc_meta_limit=2000000000
sysctl vfs.zfs.arc_max=8000000000
or whatever
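To make such limits survive a reboot, the same tunables can also go in /etc/sysctl.conf (the values below are just the example numbers from above; size them for your own workload):
Code:
# /etc/sysctl.conf -- persist the ARC limits across reboots
vfs.zfs.arc_meta_limit=2000000000
vfs.zfs.arc_max=8000000000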
 
ARC eats up RAM and almost never gives it up, at least on FreeBSD < 13.
Limit vfs.zfs.arc_max. By default it will try to use all memory minus 1GB. On pre-13.0, the ARC was kind of reluctant, or at least not quick enough, to release its used memory, which sometimes led to things spiraling out of control, with other applications (notably MySQL/MariaDB and virtual machines) and the ARC all fighting over the same memory. It might still be an issue, I haven't tested this extensively yet, but I hear from other people that the situation is much improved with 13.0. Still a good idea to limit ARC to remove any potential contention.
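A quick way to see whether the cap is actually in effect is to compare the ARC's current size against the configured maximum (both are standard sysctls on a ZFS system):
Code:
# current ARC size vs. configured maximum
sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max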

Note that a 16GB swap would fill up just as surely as an 8GB swap when this happens; it just takes a little longer. Been there, done that.
 
In such a situation it is much better to configure your ARC usage properly (arc_max) instead of wasting space and time with a larger swap. In general, a system should be configured to not start swapping or paging under normal conditions, not even under high load. The system should use swap only as a last resort under unusual (unexpected) conditions, just to avoid crashing the machine or killing important processes. If a system starts swapping, the admin should be alerted and look after it immediately in order to remedy the situation.
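A minimal sketch of that "alert the admin" idea, using swapinfo(8) from a cron job (the 100MB threshold is an arbitrary example; tune it to your system):
Code:
#!/bin/sh
# mail root when swap usage crosses a threshold
USED=$(swapinfo -m | awk 'END { print $3 }')   # MB used, last line of output
[ "$USED" -gt 100 ] && echo "swap in use: ${USED} MB" | mail -s "swap alert" root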

Note that old rules of thumb don’t apply today anymore. That old “swap = 2 × RAM” rule originated from a time when systems had little RAM, and hard disks were much faster in relation to RAM (over the past decades, RAM bandwidth grew orders of magnitude more than the speed of hard disks). 30 years ago it was pretty common that a machine used its swap space during normal operation, and it wasn’t a big deal. Today, you don’t want machines to swap at all.
 
In my experience, 4GB of swap is too little: if the RAM is full, it usually means you are on the way to thrashing (which is not really a big problem with solid-state drives).

So 4GB, at least for my workloads, leads to sudden VirtualBox crashes (it has happened many times over the years).

With 8GB there are usually fewer problems, except when starting deduplicated restore-from-backup procedures, which take up large amounts of RAM to maintain block hashes.

That's why I use 16GB, which I think is a good compromise for my servers: paradoxically, with 16GB of swap, FreeBSD < 13 is generally able to complete even heavy jobs, waiting for some RAM to become free.
With 4GB, definitely not (been there, done that, hundreds of times).

Of course, not everyone launches multi-terabyte deduplicated restores every night while simultaneously making five more copies over two NICs and having the virtual machines start their internal backups (of MS SQL on Windows), etc.

Today, you don’t want machines to swap at all.
I agree that a server shouldn't use swap, but that needs to be explained to FreeBSD, especially to the really, really "voracious" ARC.
And sometimes ... it swaps just to swap.

Code:
ARC: 24G Total, 4647M MFU, 18G MRU, 6936K Anon, 205M Header, 1462M Other
     20G Compressed, 71G Uncompressed, 3.58:1 Ratio
Swap: 16G Total, 325M Used, 16G Free, 1% Inuse
Yes, 325MB.
Why? A FreeBSD mystery.
Code:
 764 mysql            1  52    0  7064K     0K  2432K wait   11   0:00   0.00% <sh>
 
ARC eats up RAM and almost never gives it up, at least on FreeBSD < 13.
Then configure it properly, according to your application's memory demand. And limit your vnode cache accordingly. (Yes, there is a bit of math involved.)
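For context, on FreeBSD the vnode cache is capped with the kern.maxvnodes sysctl; a hedged example of the kind of tuning meant here (the value is arbitrary and should come out of the math mentioned above):
Code:
# cap the vnode cache; pick a value that fits the RAM budget
# left over after arc_max and your applications' demand
sysctl kern.maxvnodes=500000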
So it is normal for other memory-intensive processes (VirtualBox, archiving programs for backups) to end up swapping until they crash.
Swapping doesn't crash programs, it just makes them slow (and wears out SSD drives). Unless they are broken anyway.
This is THE problem to solve immediately (limit the ARC) when you have VirtualBox machines running, with something like
Code:
sysctl vfs.zfs.arc_meta_limit=2000000000
sysctl vfs.zfs.arc_max=8000000000
or whatever
arc_meta_limit is advisory only. It doesn't hard-limit the metadata; it just gives a preference when evicting.


Concerning the original question: I would recommend providing as much swap as the machine can move out within a realistic time. For a standard SSD with a practical throughput of 250MB/sec, that means it could fill 15GB within a minute. So I would say that is a realistic size that leaves you some chance to catch a runaway process before the OOM killer kicks in. Faster devices, accordingly bigger.
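The arithmetic behind that suggestion, as a tiny sketch (THRUPUT_MB is an assumed name for the sustained write speed of your swap device):
Code:
#!/bin/sh
# swap sized to what the device can write out in about a minute
THRUPUT_MB=250                                 # MB/s, adjust per device
echo "suggested swap: $((THRUPUT_MB * 60 / 1000)) GB"   # 250 -> 15 GB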
 
It really depends on the use-case. If you are using a laptop and intend to hibernate to the swap, then you should follow the old rule of thumb about swap size.

For a desktop you intend to leave on all the time, 4GB is probably fine. For a server, you don't need much, if any.
 
Then configure it properly, according to your application's memory demand. And limit your vnode cache accordingly. (Yes, there is a bit of math involved.)
I have never had to do this, whereas, as mentioned, limiting the cache is indispensable for me.

Swapping doesn't crash programs, it just makes them slow (and wears out SSD drives). Unless they are broken anyway.
In theory.
In practice, swapping can crash programs, at least on FreeBSD; it doesn't just make them slower.
A fact that has been established over years, on dozens of servers, the hard way.

It essentially depends on WHICH programs: as indicated above, I am referring specifically to VirtualBox and, to a lesser extent, MySQL.
They may well be broken, but there isn't much of an alternative.

In fact, it happens that, for some inexplicable reason, MySQL (MariaDB) (or rather FreeBSD) decides to do some swapping, a few tens of MB, even when its memory use is limited and there are at least a hundred GB of free RAM.
Why? I don't know; it's too hard to spot on busy servers.
I should stop one and monitor it thoroughly, but the effort is not worth the result.

arc_meta_limit is advisory only. It doesn't hard-limit the metadata; it just gives a preference when evicting.
In fact, I set both settings, to be on the safe side.
Always because of inexplicable VirtualBox crashes, which were then solved this way.
Concerning the original question: I would recommend providing as much swap as the machine can move out within a realistic time. For a standard SSD with a practical throughput of 250MB/sec, that means it could fill 15GB within a minute. So I would say that is a realistic size that leaves you some chance to catch a runaway process before the OOM killer kicks in. Faster devices, accordingly bigger.
In my experience (with drives much faster than 250MB/s, at least 10 times faster, if not 30×), I have never needed more than 16GB to have stable systems capable of unlimited uptime.
Of course I am referring to my servers, running the jobs of my clients.
I certainly cannot give advice for "any" server and "any" use.
All of them on 11 or 12, none on 13+.

Short version: I don't suggest 4GB of swap under any circumstances.
Certainly not 2× the RAM (that would mean way more than 1TB of swap in my case!).

On the other hand, it costs so little to increase it during the installation phase that I don't see any real reason not to (on normal machines, not embedded systems etc.).
For different scenarios (desktop, SOHO) I don't know; I never use them.

For a server, you don't need much, if any.
As in the example above, there is a MySQL process that, for some reason, decided to force some swapping on a machine with 768GB of RAM.
I really don't recommend FreeBSD servers without swap.
Certainly nothing huge; for example, I never, ever use hibernation.
But certainly not zero.
 
I think it's more than MySQL; I've had way-too-long-running rsync processes given swap too.

It seems to be - in my experience - long-running processes that cause sudden memory pressure (e.g. importing 10s of millions of rows in MySQL's case) that are given swap instead of inactive memory. But I got lost in a rabbit-hole of tcmalloc versus jemalloc and the new version of jemalloc on 13.0 seems to have helped my use case (in brief testing).

4GB swap has saved my bacon, 8GB might have been better, but as others have said as soon as swap is seriously in use, there's been something worth investigating (time permitting!)
 
I think it's more than MySQL; I've had way-too-long-running rsync processes given swap too.
Nightly backups can create just the same problems.
Especially on Monday mornings :)
Glad to see I'm not the only one having unpleasant surprises with large-scale backups, RAM, and swap.
and the new version of jemalloc on 13.0 seems to have helped my use case
I've also read about improvements in memory interaction, partly due to OpenZFS, in 13.
However, I don't think I'll install such a server for a year, maybe 18 months.
Just today I'm waiting for the CPU for a physical 13+ test machine, which I will use as a "backup cruncher".

I'm not entirely convinced, we'll see.
 
If you are having trouble with MySQL, it's worth looking at using tcmalloc. It definitely works on my test machines - I haven't been brave enough to try it in production yet!
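For anyone wanting to try the same: mysqld_safe can preload an alternative allocator via its malloc-lib option. A hedged sketch; the library path assumes the google-perftools port and may differ on your system:
Code:
# my.cnf -- preload tcmalloc for mysqld (path is an assumption)
[mysqld_safe]
malloc-lib=/usr/local/lib/libtcmalloc.so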
 
I hear from other people that the situation is much improved with 13.0.
Just to add "anecdotal proof", this is how it looks on my 13.0-RELEASE machine with the ARC limited to 32GB(!) during a large poudriere build requiring a lot of RAM:
[screenshot: top(1) memory header showing ARC, Free/Inactive and Swap during the build]

I guess I can finally drop the ARC limitation altogether. It does a good job on 13 making room as soon as there's a need :)
 
It has still chewed up 3G of swap, even though there is 1GB Free and 11G Inactive - that's the bit I'm trying to understand. But I think it is trying to protect the Inactive memory because it thinks it might be more important (called back into use) and decides to use swap. But I'm not sure!
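One way to chase this: FreeBSD's top(1) can display approximate per-process swap usage, which helps identify whose pages those 3G actually are:
Code:
# -w adds an approximate per-process SWAP column; -b is batch mode
top -b -w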
 
These 3G were in swap long before this build even started; IIRC they ended up there after another huge build. The fact they're still sitting there is a hint the OS chose wisely (they weren't accessed in days!).

And yes, swapping out usually starts before the very last pages are allocated. But ARC unwilling (or just too slow) to give back RAM definitely isn't a reason to start swapping any more on 13.

Edit: BTW, proactive swapping as soon as running out of RAM can be anticipated makes a lot of sense to improve overall system performance. The swapper chooses pages that are rarely accessed, so the performance impact is minimal. But if it waited until there was no more physical RAM, every single request for a new page (initial mappings for starting a new process as well as dynamic allocations from running processes) would have to wait for the swapper to find a page that can be swapped out and write it to disk, which would slow down everything.
 
So if you had stuck with the default 4GB or so of swap, and you did a few more builds, you'd very likely end up out of swap?

Your machine has already got to 3GB of swap used - that would be 75% of the default 4GB.

I'm just interested in the OP's question and my experience with the default 4GB or so of swap - importing data into MySQL has a few times caused OOM for me, so the next time I set up a server I was going to try going for 8GB instead of taking the defaults, but maybe 16GB would be better. Just gives me more wriggle room and time to notice and sort things out.
 
So if you had stuck with the default 4GB or so of swap, and you did a few more builds, you'd very likely end up out of swap?
I'd call it very unlikely. Why should more builds add more to swap? These are pages of other, inactive processes, most probably other services on my host that are rarely used. As soon as they are used, the pages will be swapped back in.
I'm just interested in the OP's question and my experience with the default 4GB or so of swap - importing data into MySQL has a few times caused OOM for me, so the next time I set up a server I was going to try going for 8GB instead of taking the defaults, but maybe 16GB would be better. Just gives me more wriggle room and time to notice and sort things out.
You have an OOM situation when demand for memory is higher than physical memory plus swap. Of course, adding swap will "help", but as soon as you have to swap somewhat active pages, they'll have to be constantly swapped in and out, rendering the whole system slow. You only want inactive pages in swap. So, if you have 16GB worth of more-or-less inactive memory, it makes sense to have that much swap. Otherwise, what you really want is more RAM.

Note 1: I have that much only because there was room on the HDDs after removing an old "bootpool". I've never seen my system consume more than 8G swap since.

Note 2: Someone on here had a very special use case with a single application operating on a huge set of data, backing that with a huge amount of swap. Things like that might make sense if swap is on a really fast device, but they don't match your typical usage patterns at all.
 
Frankly, the use of swap (< 12; for 13 I don't know) has always left me quite confused.
There is no precise and reproducible logic, or at least I don't know of one.

The theory of how virtual memory works, and swap in particular, as we read it in books or even implement it ourselves (it has happened to me), is very far from the practice of a complex system, especially (I speak always of my own case) where there is "something" (VirtualBox and MariaDB) that is itself extremely complex and interacts with the operating system in complicated ways.

Short version: in theory the swap could very well not exist at all; it could even be limited to 1MB.

In practice, therefore empirically: since I started using 16GB (probably 8 would be enough, but it costs me nothing to be on the safe side), I have no longer had inexplicable crashes of virtual machines, which usually happened on Monday mornings (large backups of all kinds run over the weekend).

Is it possible that, for some reason, pages that could have (should have) stayed in RAM are placed in swap?
Maybe yes.
I don't think it makes any sense to have a machine with more than 700GB of RAM and maybe 3MB of swap used.

Why is even a single page written to swap?

Perhaps, and I mean perhaps, when there is "congestion" in RAM requests, the virtual memory manager "gets scared" and sometimes starts swapping out, not because RAM is already exhausted, but because it is running out.

It's just a guess; I don't really want to read the FreeBSD memory-management source.
 
Frankly, I have doubts I understand any of your text, but I'll try. First, please stop talking about a "swap file". FreeBSD uses swap devices, possibly many of them, in which case it tries to stripe accesses.
In practice, therefore empirically: since I started using 16GB (probably 8 would be enough, but it costs me nothing to be on the safe side), I have no longer had inexplicable crashes of virtual machines, which usually happened on Monday mornings (large backups of all kinds run over the weekend).
Simple explanation: OOM happened. And before FreeBSD 13, ARC can contribute to that by not releasing RAM fast enough (that's why you often see recommendations to limit the maximum ARC size).
Why is even a single page written to swap?
Not sure what to make of that. A page is the unit of virtual memory management, it's mapped to a physical RAM page, or written to swap. As a side note, on amd64, a page is 4K, a "super-page" is 2M. When RAM is running out, FreeBSD attempts to split up super-pages into normal ones, so at least smaller requests can be satisfied.
Perhaps, and I mean perhaps, when there is "congestion" in RAM requests, the virtual memory manager "gets scared" and sometimes starts swapping out, not because RAM is already exhausted, but because it is running out.
And if I get that correctly, you're still wondering why the swapper decides to swap out some pages before physical RAM is completely exhausted? Then please read my explanation from above:
Edit: BTW, proactive swapping as soon as running out of RAM can be anticipated makes a lot of sense to improve overall system performance. The swapper chooses pages that are rarely accessed, so the performance impact is minimal. But if it waited until there was no more physical RAM, every single request for a new page (initial mappings for starting a new process as well as dynamic allocations from running processes) would have to wait for the swapper to find a page that can be swapped out and write it to disk, which would slow down everything.
 
Frankly, I have doubts I understand any of your text, but I'll try. First, please stop talking about a "swap file". FreeBSD uses swap devices, possibly many of them, in which case it tries to stripe accesses.
I don't care too much about terminology; I'm not giving a university lecture, I'm writing on a forum, in a language that is neither my own nor one I'm interested in mastering, while administering a hundred Unix servers.

FreeBSD can use whatever it likes as a swap area.
Simple explanation: OOM happened. And before FreeBSD 13, ARC can contribute to that by not releasing RAM fast enough (that's why you often see recommendations to limit the maximum ARC size).
By "OOM" do you mean design and / or implementation flaws happen too?
Sure.
It's not physical memory, it's virtual memory.
And no, OOM shouldn't happen if the system is well designed.
For example in my software, those written by me, it does not happen.
Not sure what to make of that. A page is the unit of virtual memory management, it's mapped to a physical RAM page, or written to swap. As a side note, on amd64, a page is 4K, a "super-page" is 2M. When RAM is running out, FreeBSD attempts to split up super-pages into normal ones, so at least smaller requests can be satisfied.
What relevance does all this have? None.
We can take the general theory of virtual memory for granted; it's first-year university material.
But it has nothing to do with FreeBSD's problems, which make it unreliable.
Which is perhaps the worst thing that can be said about an operating system: unreliable.
Sometimes it works, sometimes it doesn't.
And if I get that correctly, you're still wondering why the swapper decides to swap out some pages before physical RAM is completely exhausted? Then please read my explanation from above:
No explanation is needed, thank you.
It would be useful to read the source, and maybe understand it, but frankly it wouldn't change the result much, and I don't have time to fix it myself.

That is, mistakenly, a part of the virtual memory is (sometimes) written to swap (swap file, swap device, swap whatever).
Why, on a system with 200GB (yes, gigabytes) of free physical RAM, does FreeBSD sometimes decide to write 2MB to swap?
Is physical memory about to be exhausted?
No.
Running low?
No.

So why?
If you can explain it based on the source code, I would be very curious.
If it's hypotheses, like mine, I'm less interested.

Maybe it's a "feature" (just like for Apple).

All of this, for example, does not happen on other systems, even other UNIX systems, and it doesn't lead to catastrophic crashes like on FreeBSD.

Thanks anyway for your interest.
Useless, but welcome.
 