Post the conditions under which your kernel crashes, and why.

About the poudriere crash issue: My educated guess is that this is not a kernel or software problem, but a hardware problem. For example, the RAM in the machine might be marginal, and it is not ECC. Under normal light usage, it works fine. Under extreme load, it overheats, or it uses too much power and voltages drop, or errors that are normally rare start piling up. Eventually, memory errors corrupt kernel data structures, and the kernel can't do anything other than crash. Why are kernel data structures such a good target? Because all free memory is usually used for file system buffer cache, and the data structures that describe that buffer cache are the biggest target for memory corruption.

In the mid-90s, lots of people built cheap i386 and i486 systems and ran Linux on them. There was a whole cottage industry of white-box computer assemblers. Many of these systems were cheaply built, with little quality control. Most ran just fine under Windows 3.1 or DOS, but had problems under the more intense workloads that Linux could put on them. Memory tests typically didn't find the problem (they were too simple-minded, and didn't stress the rest of the system, like disks, which use power too), so the best stress test for the system was doing Linux kernel compiles. I used to run them overnight on my system, and if it survived from midnight to 7am without a kernel crash, it was good enough to use in production.
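For anyone who wants to try the same kind of overnight soak test today, here is a minimal sketch in Python. It is only an illustration of the idea, not the script I used back then: the function name `soak_test` and its parameters are made up for this example, and the build command is whatever heavy job you choose. Keep in mind that if the kernel itself crashes, the script dies with it, so the real result is whether the machine is still up in the morning.

```python
import subprocess
import time


def soak_test(command, duration_s, run=subprocess.run, clock=time.monotonic):
    """Rerun `command` until it fails or `duration_s` seconds have elapsed.

    Returns (passes, failed): the number of runs that exited with status 0,
    and whether the loop stopped because a run exited non-zero.
    `run` and `clock` are injectable only to make the loop easy to test.
    """
    deadline = clock() + duration_s
    passes = 0
    while clock() < deadline:
        result = run(command)
        if result.returncode != 0:
            # A build that fails on unchanged sources is itself a warning
            # sign (compilers are deterministic; flaky hardware is not).
            return passes, True
        passes += 1
    return passes, False


if __name__ == "__main__":
    # Hypothetical example: seven hours of back-to-back kernel builds.
    # Adjust the command and the working directory to your own system.
    passes, failed = soak_test(["make", "-j8", "buildkernel"], 7 * 3600)
    print(f"{passes} clean builds, failure seen: {failed}")
```

The loop only checks the deadline between builds, so the last build may run past the requested duration.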

This is one of the reasons I swear by ECC memory ... except this is a case of "do as I say, not as I do": My little server at home does not have ECC. Shame on me.
 
If poudriere is making the kernel crash, this sounds like someone needs to file a PR. But no one has, as far as I know, which makes me suspicious as to where the problem actually lies.

Fiddling with speed settings to make a system stable tells me the system isn't stable, which has nothing to do with FreeBSD. What were the settings beforehand? Were they changed and then the system started crashing? And now one is blaming FreeBSD's kernel?

Inquiring minds want to know. Or not. Fiddling with clock timing requires informed minds. When I designed motherboards, a lot of thought and testing was put into what would work reliably, and you didn't touch what I put in there. Any changes were just guesswork on the part of the user (not that you could, because it wasn't an option).
I used the stock settings (without XMP enabled), and the system kept crashing. Disabled the E-cores (Alder Lake). Still crashing. Asus enables "Intel Adaptive Boost" by default, which is not supported by my CPU. Disabled that. XMP was unstable even with 2T enabled. Most of the time I was not using any FreeBSD power-saving features (when they were enabled, it crashed again). But with the RAM speed set manually and 2T enabled, it seems much more stable. Only time will tell. The main issue is that there is more than one problem at a time, and part of that is FreeBSD's incomplete support of Alder Lake at this time.
 
Hi again

I did check the RAM modules. It seems the firm where I ordered the computer took the liberty of using two separate RAM kits, four modules in total, rather than one kit of four modules together. I looked at the manufacturer's homepage. I found what looks like the same kit with four modules running at 2666 MHz and 1.2 V, instead of the 3200 MHz, 1.35 V kit.

So I have the choice of running with only two modules, or four modules at a lower speed. I am trying out the latter first, using manual settings. Let's see how it goes ;)
 
I found what looks like the same kit with four modules running at 2666 MHz and 1.2 V, instead of the 3200 MHz, 1.35 V kit.
And this is the kind of real-world problem that causes crashes. Much more common than software bugs in the kernel.

Anecdote: My first "IBM PC" at home (meaning x86 architecture machine) was in about 1992, and was an AMD 386-40. So a full 32-bit machine, with 4 MB of RAM, and running very fast: At the time Intel only had CPUs up to 33 MHz, and by going with AMD instead, I got a significant speed boost at no additional cost. A while later (see below), I needed to add an FPU, both because I was doing floating-point intensive physics calculations, and because I was using X Windows, which does font rendering using floating point. The problem was that getting a 40 MHz FPU was really hard; there was only one manufacturer (Cyrix), and they were rarely used, since most people using computers in that price range (a few thousand $) didn't need numeric performance.

This was before the internet (instead, local stores had phonebook-sized catalogs), and I spent a few afternoons going from computer store to computer store, and nobody had the Cyrix 387-40 in stock. Eventually I ended up at a small Chinese-owned computer store in a back alley of Palo Alto. When I asked about that FPU, the owner (and only employee) opened his desk drawer, and pulled out a chip: No antistatic protection, pins a little bit bent, but it did say that it was a Cyrix 387-40 on it. I asked how much it would be, and he didn't really know, so I handed him $20 in cash, and he was happy. My thinking was: This chip is very unlikely to work, given that it has been stored in a desk drawer, but it's cheap.

Turned out it worked perfectly. That machine continued functioning for about 6 or 7 years, and was my main workhorse at home. After a while, I added a second computer (a 486-25), so my wife and I could both use X at the same time (before that, one of us had to log in via a serial-connected VT200 terminal and work in text mode).

The other funny anecdote was the OS I ran on it. Initially, I had wanted to run BSD, but it hadn't been ported to 386 yet (Bill Jolitz' 386BSD wasn't available yet; it came a year later). But BSDI was selling a version of BSD that ran on 386 at the time, for about $1000. The only problem was: I really needed graphics (for data analysis). And while BSDI was working on porting X Windows to their OS, they only supported one video card (the Tseng ET4000), which was de facto unavailable; and their X Windows port was known to be so broken that you could de facto not run it. I didn't feel like spending several hundred $ on a rare video card plus $1000 on the OS, just to end up with a train wreck. But then, this crazy toy OS named "Linux" became available, and while it had no graphics yet, at least CLI mode was rumored to work, and it had a working C compiler (Fortran I could do myself, by using AT&T's f2c). So I downloaded the ~30 floppies of the SLS or Slackware distribution, and got it to work. When I bought the second computer, I initially connected them via PLIP (special cable to go printer port to printer port), since ISA bus Ethernet cards were still very expensive. And a few months later, the X Windows port on Linux started working, and by 1994 Linux was widely supported in the science community, and the rest is history. I only went back to the *BSDs in the early 2000s, when I decided that Linux was just too sloppy and insecure for a dedicated firewall machine.
 
I had crashes after installing an experimental SD card reader driver. After a few iterations, it stopped crashing, and it eventually became part of 13.0-RELEASE.
 
My trash laptop with a Celeron and 4 GB of RAM ran out of memory when compiling stuff and froze; I had to restart it.
 
complete support of alderlake
Can second that. After all the stable years before it, switching to Alder Lake has brought with it a forgotten instability and constant fear of crashes. I have the same experience on OpenBSD though, so I don't think it's purely a FreeBSD thing.
A lot of the crashes I'm seeing recently are due to inteldrm, so I'm not holding my breath that this is going to be a smooth ride any time soon on FreeBSD either.

Intel's recent introduction of performance and efficiency cores has brought a new bag of computer science problems, which will take a few years to get settled, i.e. what load to run on which cores under which circumstances, and how to switch between them whenever factors shift.

So I expect things are going to get worse before they get any better.
 
So I expect things are going to get worse before they get any better.
Exactly the reason why I decided to purchase a new i7 11th gen laptop last year. This way I still got a reasonable hardware upgrade, but also don't have to deal with all this pain until somebody else has fixed it for us :p
 