Out of memory

Look... I know that FreeBSD installations consume a minimum of 2GB. But you mentioned that you have a few programs... look: Xorg alone consumes 1GB. I know it because I've used it.
In theory, yes. I have a GUI system with an XFCE desktop on FreeBSD 12.0 which runs just fine on 512 MB RAM, with a 60 GB HDD and a ZFS file system.

Edit: The system is quite new though, and it does not have a lot of snapshots. Maybe the massive snapshot count mentioned below is a hint?
 
The system I was running the other day had 12GB RAM and 2TB of storage (4×1TB disks in mirrors). I currently have ARC limited to 10GB and it's been fine since. Of course it shouldn't really matter that much. Filling a system with 10TB disks shouldn't mean you suddenly need 128GB of RAM to stop it from falling over. ARC should always leave some memory free and release some if the system starts to run out.
I have a system with 16 GB RAM and 4x4TB disks and I have not yet run out of memory. It's used mostly as a light server though.

Edit: I think it actually DID run out of memory. I've been having HDD problems when scrubbing for a while, but I thought it was a hardware problem. I looked deeper into it and it might actually be a memory problem.
 
Much as I like ZFS, I've had memory issues with it since day 1, and always limit ARC manually.

Agreed, that is a funny beast and I don't grasp it fully.
Lately I reduced the desktop to a single disk (for noise reasons) and implemented remote mirroring with zfs send. Then I watched it: 8GB mem installed, a single 500G (spinning) disk, ZFS sitting there with 4.5G ARC and 6.8G(!) wired mem, Firefox pushed into swap(!), the machine mostly idle and the disk 100% busy at 2MB/sec throughput (the daily chksetuid not getting anywhere), and as if that weren't already enough:
Code:
ARC Summary: (THROTTLED)
        Memory Throttle Count:                  3
That is not how it is supposed to function, but I couldn't figure out what problem the machine had (it seems to have to do with massive snapshot usage).

I'm inclined not to recommend ZFS for single-disk desktop use. I once tried to add an SSD L2ARC, but didn't perceive any improvement. It probably helps when the whole thing runs from SSD - but then there may not be much point in a large ARC; the OS can buffer as well.
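
For reference, those ARC counters can also be read directly with sysctl (OID names from the 12.x-era legacy ZFS port; they may be spelled differently on newer OpenZFS):
Code:
# current ARC size and the throttle counter from the summary above
sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.memory_throttle_count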

I tried a new server a few weeks ago without a manual limit to see if it was any better, but it started killing processes and the NFS service (which is what it was being used for) went down within 24 hours. I actually posted on the mailing list about it, as I thought there had been improvements made over the years, but only got a few "me too" responses.

I would like to contradict the stance that "ZFS needs a lot of RAM". One can probably overcome most issues with an extreme amount of RAM, but otherwise the matter needs planning or some good educated guesses: databases can cache, the OS can cache, and putting it all together and hoping it will settle into a nice interplay on its own may not be enough.

Not trying to knock FreeBSD too much, I've used it since 3.x and much prefer it to everything else; It's just frustrating that after more than 10 years, ZFS still seems, in my experience at least, to need draconian memory limits to stop the entire system starving itself to death.

It seems ZFS deals with memory like some people deal with money: they never have enough, and if perchance they win the lottery, it only leads to them getting deeply into debt. (The solution is to not give them more than they actually need.)
ZFS is mostly just a cache - and running a (big) server with 90% of memory used as a filesystem cache sounds suboptimal to me; in such a case I would prefer the application to decide what it actually needs to cache. The "adaptiveness" of the ZFS cache is just a workaround for the lack of this, and may or may not perform well, depending on the kind of workload.
I would rather start with a small ARC and then increase it as long as that brings an actual performance improvement - and always prefer the application's ability to do its own caching.
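
One way to sanity-check whether a bigger ARC actually helps is to watch the cumulative hit/miss counters while the workload runs - if the hit rate is already high, more ARC buys little. On the legacy ZFS sysctl tree that would be something like:
Code:
# sample twice and compare the deltas
sysctl kstat.zfs.misc.arcstats.hits
sysctl kstat.zfs.misc.arcstats.misses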
 
The system I was running the other day had 12GB RAM and 2TB of storage (4×1TB disks in mirrors). I currently have ARC limited to 10GB and it's been fine since. Of course it shouldn't really matter that much. Filling a system with 10TB disks shouldn't mean you suddenly need 128GB of RAM to stop it from falling over. ARC should always leave some memory free and release some if the system starts to run out.

Early in the story (~2008?), when I wanted to use ZFS for data integrity reasons (not for performance), I made it run on significantly less than 1G of RAM. I had to go into the code and change the adjustments that are made there. There were three parts to them:
  1. Some very simple math to initially come up with the adjustable values: arc_max and some limits.
  2. Ways for ZFS to continuously receive the state of the OS (free mem etc.) and react on it.
  3. The internal mechanics of ZFS to adjust ARC size and usage.
There are probably few people who would understand item 3, and that part comes from the developers of ZFS, while items 1 and 2 were done for the FreeBSD integration. Item 2 is what I had to adjust, and it has evolved over time (but I haven't looked into it more recently).
But your concern is with item 1, and that is just a simple best guess that hopefully works for the majority of users. You should definitely adjust these values, as you know your workload and will therefore almost always make better choices.

There is no rule that X TB of disk requires Y GB of RAM. (There is such a rule for using deduplication, and also for the use of L2ARC - the latter depends on block and file sizes. And there also seems to be an issue with snapshots, but I still have to figure that one out.) The more interesting point is the application's disk access patterns, and therefore how it can benefit from caching.
Also, caching behaviour can be further adjusted by switching it on or off for each individual filesystem, as shown below.
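
For example (the dataset name is hypothetical), to keep only metadata in the ARC for a dataset whose payload doesn't re-read well, and to keep its data out of the L2ARC entirely:
Code:
# cache only metadata in the ARC for this dataset
zfs set primarycache=metadata tank/backups
# and do not feed its data into the L2ARC either
zfs set secondarycache=none tank/backups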
 
I run ZFS with a single hard drive on a server and a laptop, both of which have 4GB of memory. The ARC is limited to 1GB on both, and despite the free memory quite often being close to zero, I've never had a problem. They rarely use swap, and if they do, it's usually things which have been idle for hours rather than active processes.

However, with the default settings, where the ARC is not limited, then yes, it rapidly consumes all available memory and swaps like crazy. The out-of-the-box configuration for ZFS memory management really needs to be looked at, as people shouldn't have to limit the ARC size just to get it to behave.
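
For anyone wondering, a cap like that is a one-liner in /boot/loader.conf (this is the pre-OpenZFS spelling; newer systems also accept vfs.zfs.arc.max):
Code:
# limit the ARC to 1GB
vfs.zfs.arc_max="1G"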
 
Agree: I run ZFS on a 32-bit machine with 3GB of memory, with 4 disks under ZFS control, 4 pools ranging in size from 1TB to 4TB. No memory problem at all, just have to set the well-known parameters in /boot/loader.conf:
Code:
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="64M"
vfs.zfs.vdev.cache.size="8M"
vfs.zfs.prefetch_disable=0
Note that this is a server machine, without X Windows, without a GUI, not running memory-intensive applications (like web browsers). I have not tuned the above settings for ideal performance, since performance is good enough for my needs, and the time investment involved in tuning isn't worth it.

I agree it would be nice if ZFS memory usage could be preconfigured at installation time, or auto-tuned. But that's not the world we live in.
 
Here it is 2G of RAM, 3-4 pools, no graphics, and
Kernel options:
Code:
options         KVA_PAGES=512
options         KSTACK_PAGES=8   # I've seen a "double fault" with 4

in loader.conf:
Code:
vm.kmem_size="1408M"
vm.kmem_size_max="1408M"
vfs.zfs.arc_max="800M"
vfs.zfs.arc_min="200M"
vfs.zfs.prefetch_disable="1"

in sysctl.conf
Code:
vfs.zfs.l2arc_norw="0"
vfs.zfs.l2arc_noprefetch="0"
vfs.zfs.arc_meta_limit=471859200  # arc is for metadata, payload goes in l2arc
vfs.zfs.min_auto_ashift=12

This is for the case where a scrub runs on the SSD while booting - otherwise the machine will not reach multiuser. In multiuser these get reverted and auto-adjusted to the current load (a sketch of that follows below).
Code:
vfs.zfs.scan_idle=86399999
vfs.zfs.scrub_delay=1

Beware: this is not meant for cut&paste. It is crafted for my specific needs; e.g. the machine does some telephony routing and other housekeeping, like collecting backups - performance is not important, but responsiveness must be maintained.
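
The reverting in multiuser is nothing fancy, by the way - a couple of sysctl calls from an rc script will do. A minimal sketch, assuming the stock defaults of 50 and 4 are what should come back:
Code:
#!/bin/sh
# /etc/rc.local (sketch): undo the boot-time scrub crutch once we are up
sysctl vfs.zfs.scan_idle=50     # assumed stock default
sysctl vfs.zfs.scrub_delay=4    # assumed stock default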
 
I started to have some problems with memory usage after updating to 11.2...
... 12.0 works a little bit better, but anyway, 11.0 used to use memory much more reasonably for me.
I mostly agree with that. Now I'm on 12.0, and today I got this:
Code:
....
swap_pager_getswapspace(32): failed
pid 29726 (firefox), uid 1001, was killed: out of swap space
pid 38208 (chrome), uid 1001, was killed: out of swap space
Firefox had only one tab open, and Chromium about 15.
All other programs were urxvt terminals in DWM and 1 spreadsheet in LibreOffice.

I have 16G of RAM, 4G is for bhyve's Win2019, ARC is limited to 2G, and I have 10G of swap.
How could that happen? It's hard to reproduce now...
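
In case it does happen again, here is where I would start looking (all stock tools):
Code:
swapinfo -h               # how much of the 10G swap is really in use
top -o res                # largest resident processes first
sysctl vfs.zfs.arc_max    # confirm the 2G ARC cap actually took effect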
 
Agree: I run ZFS on a 32-bit machine with 3GB of memory, with 4 disks under ZFS control, 4 pools ranging in size from 1TB to 4TB. No memory problem at all, just have to set the well-known parameters in /boot/loader.conf:
Code:
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="64M"
vfs.zfs.vdev.cache.size="8M"
vfs.zfs.prefetch_disable=0
Note that this is a server machine, without X Windows, without a GUI, not running memory-intensive applications (like web browsers). I have not tuned the above settings for ideal performance, since performance is good enough for my needs, and the time investment involved in tuning isn't worth it.

I agree it would be nice if ZFS memory usage could be preconfigured at installation time, or auto-tuned. But that's not the world we live in.
How many snapshots do you have on this machine?
I ran into issues, and my machine has 16 GB of RAM and ~8000 snapshots so far. The disks are 4×4TB.
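
For comparison, counting them is a one-liner:
Code:
# total number of snapshots on the machine
zfs list -H -t snapshot | wc -l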
 
Snapshots? Fewer than fingers on one hand. I hardly ever use snapshots. I have a backup system that was built before I had a file system with snapshots available, so I don't use them for backups. Also, no compression or dedup (although the backup system has all that built in).
 
Alright, that now figures: snapshots are a problem. One can do the following: run send/receive on two pools of the local machine (or get the machine i/o bound by other means), then run some script that works with snapshots - and then the fun starts: performance degrades and programs are pushed into swap, for whatever reason.
It seems if ZFS cannot put the snapshot (creation or deletion) out to disk immediately, it starts to do ugly things.

One can do many interesting things with snapshots. For instance, I use them for port building: this is faster than doing make clean. So one may have some script-based stuff that does massive create/destroy actions following some logic. In that way, the snapshot becomes a programming primitive - and obviously, there can then be lots of them. (A sketch of the pattern follows below.)
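
A minimal sketch of that pattern (dataset and port are just examples):
Code:
# snapshot a clean work area once...
zfs snapshot tank/obj@clean
# ...build something in it...
cd /usr/ports/editors/vim && make
# ...then roll back instead of doing 'make clean'
zfs rollback tank/obj@clean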
 
Alright, that now figures: snapshots are a problem. One can do the following: run send/receive on two pools of the local machine (or get the machine i/o bound by other means), then run some script that works with snapshots - and then the fun starts: performance degrades and programs are pushed into swap, for whatever reason.
It seems if ZFS cannot put the snapshot (creation or deletion) out to disk immediately, it starts to do ugly things.

One can do many interesting things with snapshots. For instance, I use them for port building: this is faster than doing make clean. So one may have some script-based stuff that does massive create/destroy actions following some logic. In that way, the snapshot becomes a programming primitive - and obviously, there can then be lots of them.
Yeah, I also think that has something to do with resource usage, and massive snapshots seem to bloat the necessary cache. Eventually the system runs out of memory.
In my case it happens only when I do a scrub. When not scrubbing, everything is peachy.
 
Extra memory being used while scrubbing, enough for every scrub thread to keep a complete state of what it is looking at, makes sense. But after scrubbing is done, the memory usage should go back down. My machine often stays up for a month at a time (longer in the summer when there are fewer power outages), and it scrubs every 3 days, and memory usage doesn't creep up. Could this be a bug in scrubbing, with the way your ZFS is being used (snapshots and such)?

If yes, it's hard to report this; a developer would need way more debug information than just telling them "memory leaks".
 
It is most likely not a memory demand of scrubbing itself (I don't observe such a demand); it is rather the fact that the scrubbing floods the i/o queues. I see similar effects without scrubbing, too, when the disk is busy enough.

Snapshots need to care about data integrity while a filesystem is active. So certain data has to be put to disk, and quickly so, or otherwise some of the regular ongoing i/o activity has to be held back for some time - and this, although I do not know the implementation details, seems likely to create urgent memory demands.

Anyway, I cannot confirm a memory leak. I would say it's a (temporary) overcommit. We should check whether it can be contained by limiting arc_max appropriately - and continue to complain ;) about what I would call a design weakness.
 
Well, I don't know what is practical for you. And I do not yet see the full picture: is it the number of snapshots that is problematic, is it the frequency with which they get created/deleted, is it the size of the filesystems? Dunno.
Currently, I have remote disk mirroring active - that uses only a dozen snapshots, but they span all data and get created/deleted every few minutes. And the behaviour is far from troublesome, but still remarkably ugly.

For now, I would consider snapshots a valuable resource that brings along certain expenses. ;)

I am currently experimenting with limiting arc_max, but then, my snapshots only reach into the hundreds at most, and I never had an actual out-of-memory situation. OTOH, I still have early swapping enabled (vm.swap_idle_enabled=1, from earlier times with scarce memory), and that makes the OS push data out to swap before memory gets low. That might also be worth trying - but I will switch it off now; I don't like the browser having to climb out of swap first when I come back after an hour or so.
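
(Switching the early swapping off is just a sysctl, should anyone want to run the same experiment:)
Code:
sysctl vm.swap_idle_enabled=0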

And then, there is something I don't understand (from top(1) output):
Code:
Mem: 963M Active, 353M Inact, 827M Laundry, 5385M Wired, 102M Buf, 317M Free
ARC: 3110M Total, 431M MFU, 2326M MRU, 3432K Anon, 102M Header, 253M Other
     2350M Compressed, 4676M Uncompressed, 1.99:1 Ratio
who accounts for the difference of 5385M - 3110M?

It currently appears to me that limiting arc_max may not even tackle the actual issue, because memory gets claimed outside of the ARC (but still within wired kernel memory). This memory gets reclaimed when user processes are in need of memory, but only then.
One could probably figure out who is doing that, but, uuh, that looks like work. ;)
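
If someone does feel like doing that work, the usual starting points for wired-but-not-ARC memory are the kernel allocator statistics:
Code:
vmstat -m    # kernel malloc consumers; watch the MemUse column
vmstat -z    # UMA zones - much of ZFS's non-ARC buffering shows up here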
 
Maybe something in sys/vm/swap_pager.c and/or sys/vm/vm_pageout.c can be tuned?

For what it's worth, there are a few tunable parameters, but I have zero understanding of how they work. I now have the following line in /etc/sysctl.conf:
Code:
# Fuck you, Firefox.
vm.disable_swapspace_pageouts=1

That indeed seems to make FreeBSD much more enthusiastic about murdering processes.
 