KDE crashing on 2GB RAM, kernel doesn't handle OOM right

It's a bare metal FreeBSD 15 desktop with 2GB RAM and a dual core overclocked to 3.05GHz.
Running KDE and Firefox puts RAM under pressure, obviously. KDE starts flickering: the taskbar disappears and restarts, and the start menu doesn't draw correctly. Then KDE often crashes completely, back to the terminal.

This is seen on both UFS and ZFS, although much worse on ZFS.

Setting vm.pageout_oom_seq higher than the default (from 12 to 1200) actually improves things on UFS to the point of usability, it seems, but doesn't help much on ZFS. I also tried lowering the ZFS ARC memory limit to 500MB or to 1500MB, but I'm seeing the same problems.

Interestingly, at the default vm.pageout_oom_seq of 12, swap is NOT used at all when the problems happen. At higher settings of 120 and 1200, swap does start to get used, but only to maybe 10-12%, and the problems begin even when there's a lot of swap space available.
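
To make the effect of that tunable concrete, here is a minimal Python sketch of how I understand vm.pageout_oom_seq: the page daemon counts consecutive passes that fail to reduce the page shortage, and OOM handling only starts once that count reaches the tunable. This is a simplified model and an assumption on my part, not the kernel's actual code (the real logic is per memory domain and has more conditions). The function name and the workload are hypothetical.

```python
def passes_until_oom(progress_per_pass, oom_seq):
    """Return the 1-based pass at which OOM would be declared, or None.

    progress_per_pass: iterable of bools, True if that page daemon pass
    reduced the page shortage, False if it made no progress.
    oom_seq: stand-in for vm.pageout_oom_seq (simplified model).
    """
    stalled = 0
    for index, made_progress in enumerate(progress_per_pass, start=1):
        if made_progress:
            stalled = 0      # any progress resets the streak
        else:
            stalled += 1     # another back-to-back stalled pass
        if stalled >= oom_seq:
            return index
    return None


# Hypothetical workload: reclaim stalls for 20 passes, then recovers.
burst = [False] * 20 + [True] * 5
print(passes_until_oom(burst, 12))    # 12
print(passes_until_oom(burst, 1200))  # None
```

In this model a short stall trips the default of 12 but never reaches 1200, which would be consistent with the higher setting giving the system time to start swapping.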

It seems the kernel is not handling OOM conditions properly. There's some major screwing up going on that can crash stuff. It's consistently reproducible on this older desktop, while the same setup works great on my 32GB RAM machine (which usually isn't using all of its RAM for now). This is not reliable! There's gotta be a way to never kill anything when there's a lot of unused swap space available or something.
 
The Supreme Being ("AI") has some interesting ideas about this, can I please run this by the experts?

KDE + Firefox creates a scenario where memory pressure spikes faster than the default paging daemon (pagedaemon) can react.

Is the solution to this to increase the pageout wakeup threshold so that the pagedaemon starts looking for pages to free when there's still a decent amount of free RAM left? And to increase minimum free RAM? Maybe KDE needs more free physical memory available when the KDE compositor asks for wired RAM for texture buffers? It seems that GPU is denied new allocations when we're running at, like, 30MB of free RAM available.

vm.v_free_min=50000?
vm.v_free_target=200000?
vm.v_free_reserved=100000?
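
For scale, these values are page counts, so it helps to convert them to memory sizes. A rough Python sketch, assuming 4 KiB pages (check `sysctl hw.pagesize`) and the 2 GB of RAM from the first post; the helper name is made up for illustration, and nothing here says whether these sysctls can actually be set this way on FreeBSD 15.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages, verify with hw.pagesize
TOTAL_RAM = 2 * 1024**3   # the 2 GB machine discussed in this thread

# Values proposed above, in pages.
proposed = {
    "vm.v_free_min": 50000,
    "vm.v_free_target": 200000,
    "vm.v_free_reserved": 100000,
}


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB."""
    return pages * page_size / 1024**2


for name, pages in proposed.items():
    mib = pages_to_mib(pages)
    share = 100 * pages * PAGE_SIZE / TOTAL_RAM
    print(f"{name}={pages}: {mib:.0f} MiB, {share:.0f}% of RAM")
```

Under those assumptions the three values come to roughly 195 MiB, 781 MiB and 391 MiB, so a free target of 200000 pages would ask the page daemon to keep well over a third of this machine's RAM free.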
 
Do you have to run KDE on a machine with 2 GB RAM?
I have to run anything on anything to know what the specific limits are. I'm worried I will run into problems when I run into memory pressure problems on 32GB of RAM and crash something. And I read some anecdotal evidence online already that something similar might happen on bigger RAM machines.

I ran Gnome with Linux on that machine without problems, so I am fleshing out what FreeBSD does/does not do to make it work.
 
This doesn't scale. Your specific problem probably is that the kernel itself is too tight. On the 32 GB machine that wouldn't happen.

The Linux situation is more complex. In general, Linux is even more trigger-happy with OOM killing. But I speculate that it is paging more kernel memory.

If you want to know more we need the dmesg messages from FreeBSD from around the kill.
 
Ok, I'm happy to try to produce more info on this. I'm playing with tunables right now, and I'm also looking in BIOS settings...the graphics memory was set to Maximum available...so I'm changing that right now to see if that was an issue.
 
Here's the dmesg when the entire KDE crashes, cracauer@ . It usually crashes when I try to do something in KDE, like opening a menu. Before the crash, the start menu and taskbar menus flicker, fail to draw, or draw with artifacts for a little while. It almost feels like some sort of shared GPU memory issue. Shared GPU memory should be wired; maybe it doesn't get wired status? Or maybe the KDE GUI must get wired status? Can I maybe use protect on the plasma-something process? Or is the fact that the GPU uses shared memory causing the issues?

Code:
drmn0: [drm] GPU HANG: ecode 4:1:f7ffffef, in QSGRenderThread [101159]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] QSGRenderThread[101159] context reset due to GPU hang
pid 5659 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 6430 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 3850 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 6327 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 1870 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 2320 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 3732 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 5628 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 1031 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)
pid 13040 (xdg-desktop-portal-), jid 0, uid 1001: exited on signal 6 (core dumped)
Dec 29 20:20:01 freebsddesk pulseaudio[33068]: [] core-util.c: Failed to create secure directory (/var/run/user/1001/pulse): No such file or directory
drmn0: [drm] GPU HANG: ecode 4:1:85fffffd, in plasmashell [100903]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] plasmashell[100903] context reset due to GPU hang
Dec 29 21:06:44 freebsddesk dbus-daemon[75022]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.63" (uid=1001 pid=16984 comm="") interface="org.freedesktop.ConsoleKit.Seat" member="Inhibit" error name="(unset)" requested_reply="0" destination="org.freedesktop.ConsoleKit" (uid=0 pid=22559 comm="")
Dec 29 21:06:49 freebsddesk kscreenlocker_greet[65139]: in _pam_exec(): pam_sm_setcred: pam_get_authtok(): authentication token not available
drmn0: [drm] GPU HANG: ecode 4:1:9ffdfeff, in plasmashell [100903]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] plasmashell[100903] context reset due to GPU hang
drmn0: [drm] GPU HANG: ecode 4:1:87effffd, in plasmashell [100903]
drmn0: [drm] Resetting chip for stopped heartbeat on rcs0
drmn0: [drm] plasmashell[100903] context reset due to GPU hang
pid 24192 (plasmashell), jid 0, uid 1001: exited on signal 6 (core dumped)
 
And I *was* stress testing it a little bit by opening many KDE apps and 5 Firefox tabs, and clicking through all the KDE menus.
 
Hmm, that looks more like firefox crashes on its own. I don't see oom kills.
I don't know what's going on. I tried launching plasmashell with mlock - to make sure plasmashell launches as wired memory and would never be killed or swapped out (I compiled a .so and prepended it to LD_LIBRARY and can see plasmashell hog up some MBs in wired memory when launched). But plasmashell still crashed in the same way - which is now a little weird because it was running in wired memory, and it's unclear why plasmashell would hang GPU.
 
I think firefox is the problem, not kde. Firefox is a huge memory hog. Does it happen if you try perhaps waterfox? Or something not based on firefox at all? Chromium? Browsers are such bloatware... I bet it doesn't crash if you use 'links' !

You can disable some of the stuff in kde that you're not using too.

If you are hoping that the oom killer will choose the correct process to kill... well, ymmv. Oom killer is a last-ditch 'try to stop myself falling over' measure, not a way to keep a system up and running reliably. An oom is always an error condition that should be rectified such that it can never occur, it means the box is running outside of its 'safe area of operation', like an assert firing in code, or a segfault.
 
What happens when you use a different browser?
I kind of want to stick with firefox because I have all the preferences tuned for it in user.js...and also I guess it works on another machine.

or if you don't overclock?
I will try that, that's a good idea, thanks.

Firefox is a huge memory hog
So? I have 12GB of swap, and it's barely used through all of this.
An oom is always an error condition that should be rectified such that it can never occur
100%, so I'm not sure why pageout_oom_seq is a tunable that helps when swap is barely used.
You can disable some of the stuff in kde that you're not using too.
I disabled some stuff.
If you are hoping that the oom killer will choose the correct process to kill... well, ymmv.
The good thing should be that it only kills stuff that's not in the wired memory, right? I guess that's not what's happening because GPU gets hung.
You could try re-seating the dimms
Ah, I actually ended up doing that earlier today!
I mean... is this any surprise?
Yes because something that works should work on more restricted resources, maybe just be sluggish and slow. That's what I'm saying, need to know the minimum requirements, like, set a cutoff, this isn't a guessing game, like what the heck.
 
This is mostly about a gpu hang and reset issue. Nothing pointing to a memory issue.
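
One way to check that reading of the log is to count what kinds of events it contains. The Python sketch below is only an illustration: the first two patterns match lines quoted earlier in this thread, while the third is my assumption about what an OOM kill line looks like on FreeBSD ("pid ... was killed: <reason>"), so verify it against a real OOM event before relying on it. The function name and the sample list are made up; the sample lines are copied from the dmesg posted above.

```python
import re
from collections import Counter

PATTERNS = {
    # Matches e.g. "[drm] GPU HANG: ecode ..., in plasmashell [100903]"
    "gpu_hang": re.compile(r"\[drm\] GPU HANG: .* in (?P<name>\S+) \["),
    # Matches e.g. "pid 5659 (firefox), ...: exited on signal 11 (...)"
    "signal_exit": re.compile(
        r"pid \d+ \((?P<name>[^)]+)\), .* exited on signal \d+"
    ),
    # ASSUMPTION: format of an OOM kill message, not seen in this thread.
    "oom_kill": re.compile(r"pid \d+ \((?P<name>[^)]+)\), .* was killed: "),
}


def classify(lines):
    """Count (event kind, process name) pairs found in dmesg lines."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, match.group("name"))] += 1
                break
    return counts


# A few lines copied from the dmesg posted in this thread.
sample = [
    "drmn0: [drm] GPU HANG: ecode 4:1:f7ffffef, in QSGRenderThread [101159]",
    "pid 5659 (firefox), jid 1, uid 1001: exited on signal 11 (no core dump - coredumpsize limit is 0)",
    "pid 13040 (xdg-desktop-portal-), jid 0, uid 1001: exited on signal 6 (core dumped)",
    "drmn0: [drm] GPU HANG: ecode 4:1:85fffffd, in plasmashell [100903]",
    "pid 24192 (plasmashell), jid 0, uid 1001: exited on signal 6 (core dumped)",
]

for (kind, name), count in sorted(classify(sample).items()):
    print(kind, name, count)
```

On these sample lines it finds GPU hangs and signal exits but nothing in the oom_kill bucket, which fits the point that the log shows a GPU hang and reset rather than the kernel killing processes for memory.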
 
It can run for a long time if I just use Firefox (long enough that I haven't even seen it crash yet), but the moment I start opening KDE menus and apps and stuff, it starts faltering.
That sounds more like a GPU bug, as monwarez said from his log analysis. Can you switch off all the fancy plasma graphics effects, or maybe you've already tried that? Does it happen if firefox isn't running at all and you open kde menus? Or if you disable the 'usual gpu performance enhancements' in firefox?

Or maybe try a different graphics adapter in the machine?
 
... or if you don't overclock?
Well, this is so bizarre now. I changed the bus speed back to baseline, but CPU-x reports I'm still at a higher clock with some weird fractional multiplier. dev.cpu.0.freq_levels now gives me these weird levels 2603 2003 1603 1203...whereas they should all be round numbers (like 2600 2000 1600 1200)? From what I read, this could let all sorts of corruption creep onto the hard drives, which is what I could be seeing. But why does FreeBSD not go back to the original frequency of the CPU after I stop overclocking? I am so confused, I thought it'd just go back to whatever the CPU is reporting. 📣🔊🔔🔔🔔
 
And sysctl machdep.tsc_freq reports the old overclocked frequencies and just weird frequencies (for example, it reports 2.611GHz at a freq level of 1203).
 
From: freebsd-announce@FreeBSD.org
Re: FreeBSD overclocking no-no
>> Attention, attention. Do NOT use KDE if you overclocked your machine. Repeat, do NOT use KDE with an overclocked FreeBSD.
>> Alert, alert. Please delete your FreeBSD installation if you want to go back to baseline CPU frequencies after you overclocked your CPU. Repeat, do not, just do not, ok?
 