Intel N100 Alder Lake: random hangs plus reboots

Wow... that's too hot, I think. I don't like the sound of that. I wonder if they have used paste or a thermal pad. It might still benefit from an improved thermal junction between heatsink and CPU. That's very hot. What happens if you point a fan at the top of the case?
Yes, that's what I thought, and I don't want it zapping the CPU due to insufficient thermal paste, so I will take the cautious approach of just observing whether there are any more crashes. There have been none yet, which is great.
 
I doubt it will break anything, because the chip will underclock itself when it gets too hot... of course the box might crash. From running that test on other N100s I would expect it to go up to around 80C and then sit there stable, provided they have got the cooling right, which they probably have. It looks like a better quality build than the cheapest mini PCs, so the thermal paste or heat transfer pad or whatever they've used might well be fine. Anyway, it should certainly cope with that stress test if it's any good.
 
Of course you could always give it a helping hand to start off by blowing a hairdryer on cold setting at the heatsink on top of the case.
Lol you tempter...I ran that stress test and it ramped up from 36C to 62C and then after a minute stabilised at about 55C.
I expect it will gradually increase but I am quite impressed so far - it looks like they got the thermals right.
I checked in Task Manager (the XFCE utility) and confirmed all cores are running at 100%.
Ambient temperature in the room is 21C.

I stopped the stress test now and after a minute or so the CPU temp is showing 41C, so cooling is OK.
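
For anyone who wants to log temperatures during a stress test like this rather than eyeball them, here is a minimal sketch. It assumes FreeBSD with the coretemp(4) module loaded (`kldload coretemp`), which exposes per-core readings such as `55.0C` under `dev.cpu.N.temperature`. The function names and the 80C default limit are just illustrative choices, not anything used in this thread.

```sh
#!/bin/sh
# Sketch only: assumes FreeBSD with coretemp(4) loaded, so that
# "sysctl -n dev.cpu.0.temperature" prints a value such as "55.0C".

# Convert a reading like "55.0C" to whole degrees ("55").
temp_to_int() {
    t=${1%C}          # drop the trailing unit
    echo "${t%%.*}"   # drop the fractional part
}

# Print each core's temperature once and flag any core at or above
# the limit given as $1 (hypothetical default: 80).
check_temps() {
    limit=${1:-80}
    ncpu=$(sysctl -n hw.ncpu)
    i=0
    while [ "$i" -lt "$ncpu" ]; do
        raw=$(sysctl -n "dev.cpu.$i.temperature")
        deg=$(temp_to_int "$raw")
        if [ "$deg" -ge "$limit" ]; then
            echo "cpu$i: $raw (at or above ${limit}C)"
        else
            echo "cpu$i: $raw"
        fi
        i=$((i + 1))
    done
}
```

With the functions loaded, something like `while :; do check_temps 80; sleep 5; done` in a second terminal gives a running log while the stress test is going.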

I also updated the microcode using the thread you posted above.

So far this evening there have been no more freezes, crashes or reboots, so I think your fix has nailed it, thanks a lot! :cool:
 
That doesn't sound bad at all. Stable at 55C sounds pretty good. I wouldn't worry about the heatsink compound on that basis, the heatsink and thermal design are clearly doing the job. I'm quite impressed by that result.

So, we're back to microcode update... or software. The $64000 question is, of course: can you reproduce the bug, or is it truly crashing at random?

Well, the microcode problem is a known bug, so I guess you could just use the machine for a couple of days and see if it crashes again. Of course it's much better if you can reliably recreate the bug and prove that you've fixed it with a test case.
 
Yes I think those are very impressive numbers and so far, touch wood, no more crashes!

I edited the post after you read it - microcode update also done.
 
If it's using DDR5: run a memtest.

It seems DDR5 is *extremely* picky and fragile. Not since the early days of DDR1/2 RAM have I had to fiddle around with different memory modules and vendors as much as with DDR5. And don't get me wrong - I'm not using some weird off-brand crap. It's usually either Micron or Kingston.
For those Nxxx platforms I therefore always go for the DDR4 variants if available - for one, they will accept pretty much any module that's in spec (as they should), and secondly, DDR4 is much cheaper than DDR5. I couldn't care less about the (theoretical) performance penalty - those N1xx/2xx single-board PCs are pretty much bottom-end performance anyway, so I use them accordingly.
 
Perhaps when you ran Linux on that box it was set up to apply the microcode update by default, whereas on FreeBSD it wasn't applied until you installed it. Just guessing.
Yes, that is possible. To be honest, I think the first thing I did that eliminated the hangs and crashes was adding that 'nuclear_page' line to /boot/loader.conf, as the system has not crashed since that line was added.

However, the firmware update should definitely bring more stability.
 
According to the manufacturer's website the RAM is DDR4, so that avoids any DDR5 shenanigans.

One of my other systems uses DDR5 and, whilst it is stable now, there were certainly some oddities while it was running the old motherboard BIOS.
 
This was the line that brought immediate stability. I haven't read what it does yet, but it seems to have worked. Thanks so much! How/where did you find this fix?
I look through options for stuff to tweak with my GPU and normally set that to 1 thinking it might do something interesting :p

I saw these lines in an above log and it sounded like that option could be related (I saw nuclear mentioned with atomic somewhere a while back):

Code:
#10 0xffffffff83843dc6 at intel_atomic_commit+0xd6
#11 0xffffffff83a19719 at drm_atomic_helper_page_flip+0x59
#12 0xffffffff83a4ad51 at drm_mode_page_flip_ioctl+0x421
 
Well it sorted this issue out, so thanks a lot.
 
Hmm, nuclear_pageflip is kind of interesting. I think 'nuclear' here is a reference to 'atomic' 😁
It looks like there have been some recent changes in this area: https://lists.freedesktop.org/archives/intel-gfx/2023-October/338230.html
Maybe something got broken. Or maybe it ended up being enabled by default on hardware that doesn't want it enabled.
Setting 'hw.i915kms.nuclear_pageflip=0' disables it.

Here is what the doc says it does-
nuclear_pageflip:Force enable atomic functionality on platforms that don't have full support yet. (bool)

So it's there to force atomic page flipping on hardware that doesn't fully support it yet, most likely by implementing it in software; I didn't go and trawl through the code to see exactly what it does. Since the N100 is a fairly recent chip, and nuclear pageflip was introduced around 2015, it seems reasonable to suppose that the N100 will support it natively and needs the option switched off.

Background info here: https://www.x.org/wiki/Events/XDC2012/XDC2012AbstractRobClark-KMS/xdc2012-atomic-pagefilp.pdf
and here https://lwn.net/Articles/569680/ . It's been around a while.

On my X220 running FreeBSD 14.3R it's set to zero by default:-
# sysctl hw.i915kms.nuclear_pageflip
hw.i915kms.nuclear_pageflip: 0

So I'm a bit surprised it wouldn't be 0 by default on an N100. Unfortunately I don't have one running FreeBSD at the moment so I can't check.
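
To make the setting stick across reboots it has to go into /boot/loader.conf, as mentioned earlier in the thread. Below is a rough sketch of a helper that appends a tunable only if it isn't already set in the file. `set_tunable` is a made-up name for illustration, and the value 0 is the "disabled" setting described above; check what `sysctl hw.i915kms.nuclear_pageflip` reports on your own machine before and after the reboot.

```sh
#!/bin/sh
# set_tunable FILE NAME VALUE
# Hypothetical helper: appends NAME=VALUE to FILE unless a line starting
# with NAME= is already present, in which case it refuses and returns 1.
set_tunable() {
    file=$1
    name=$2
    value=$3
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "$name is already set in $file; edit it by hand" >&2
        return 1
    fi
    printf '%s=%s\n' "$name" "$value" >> "$file"
}
```

Run as root, `set_tunable /boot/loader.conf hw.i915kms.nuclear_pageflip 0` would add the line once; a reboot is needed before the new value takes effect.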
 