The computer keeps crashing during the compilation process of chromium-123.0.6312.58_1, but never crashed when I was building version 123.0.6312.58!

Oleg_NYC · Mar 30, 2024

astyle said:
I think you can safely ignore those RAM errors - unless you enjoy hunting them down.

Wait, you think memtest86+ didn't find anything that would cause real problems? Why are you so sure of it?

astyle · Mar 30, 2024

Oleg_NYC said:
Wait, you think memtest86+ didn't find anything that would cause real problems? Why are you so sure of it?

Well, the first pass did not find a problem, period.

Second - you stated that just telling the CPU to not operate above 3.7 GHz made crashes go away. That alone eliminated RAM as a potential problem area.

Third, you did mention that you have 64GB of RAM - that's frankly an adequate amount for a monster compilation like Chromium. And RAM is one of those things where the available number of gigabytes matters more than a corrupted area somewhere.

Remember what RAM stands for - Random Access Memory. And it's dynamically managed - if one area of RAM is problematic, the OS will know to stay away from it. If RAM were in fact problematic, you'd find out way sooner than from testing or a compilation crash. And it's faster to just replace RAM with a brand-new stick than to hunt down the exact address of RAM corruption. :/

cracauer@ · Mar 31, 2024

Oleg_NYC , may I ask what your objective is now?

Your CPU cooler is too small. Is it a consideration at all to get a proper one?

You will spend months tinkering in the BIOS and verifying what's stable. Unless there is a direct power limit setting, but I think that is newer generations only.

Cath O'Deray · Mar 31, 2024

cracauer@ said:
… what your objective is now? …

Oleg_NYC said:
… I solved the problem of not being able to save core.txt files …

Oleg_NYC said:
… There were no more panics after I disabled Turbo Boost and "Overclock TVB" in the UEFI. …

Oleg_NYC said:
… My goal is to simply figure out what caused crashes while I was compiling stuff.

Oleg_NYC said:
… I am still not entirely sure I found out the cause of these crashes. …

1. Re-enable Turbo Boost and Overclock TVB
2. push the computer as hard as possible until the kernel panics
3. interpret the resulting core.txt.⋯ file.

If you cannot reproduce a panic after step (1), then again broaden your thinking (and know that memory errors may occur).

Oleg_NYC · Mar 31, 2024

cracauer@ said:
Oleg_NYC , may I ask what your objective is now?

Your CPU cooler is too small. Is it a consideration at all to get a proper one?

You will spend months tinkering in the BIOS and verifying what's stable. Unless there is a direct power limit setting, but I think that is newer generations only.

I am a computer n00b; that's why it's hard for me to figure out what I want... But can you please explain to me how this cooler is capable of keeping the cores' temperatures below 65 degrees no matter how intensive the process of compiling software becomes? I am talking specifically about the temperatures that were observed when the clock frequency was set to 3.7 GHz. I know that you guys implied this could be due to thermal throttling, but you also implied that even throttle_log can't tell me whether thermal throttling actually took place... I don't want to unscrew screws attached to the cooler just to find out its exact model name. I am just too afraid of damaging something... And why are you saying I will spend months verifying what is stable? It's very likely I already know what the stable solution is: not raising the clock frequency above 3.7 GHz. As I said before, I have never crashed again after disabling Turbo Boost in the UEFI. I wonder why such an evil feature was enabled by default in the UEFI.

Cath O'Deray · Mar 31, 2024

Oleg_NYC said:
… Turbo Boost … evil …

From What Is Intel® Turbo Boost Technology? - Intel, with added emphasis:

"… potentially increase CPU speeds up to the Max Turbo Frequency while staying within safe temperature and power limits. …"

astyle · Mar 31, 2024

Oleg_NYC said:
But can you please explain to me how this cooler is capable of keeping the cores' temperatures below 65 degrees no matter how intensive the process of compiling software becomes?

I pointed out earlier in the thread - even an underpowered cooler is better than nothing. And, that cooler works in tandem with thermal throttling.

Oleg_NYC said:
I don't want to unscrew screws attached to the cooler just to find out its exact model name. I am just too afraid of damaging something...

Sounds like you have an OEM cooler that came with the entire computer. Those coolers tend to be underpowered, esp. if the OEM machine has a nice processor. If you know the brand/model name of your machine, it's pretty easy to figure out the specs of the cooler used by the OEM.

Oleg_NYC said:
And why are you saying I will spend months verifying what is stable? It's very likely I already know what the stable solution is: not raising the clock frequency above 3.7 GHz

We talked about thermal throttling and looking at RAM errors. If you want to take your own measurements, then yeah, it will take months to learn how to do them correctly, and how to interpret the results you get. Or, you can go to the Forums, ask around, and figure out within hours (if not days) that all it takes to avoid crashes is to avoid Turbo Boost on your processor. Somebody on the Forums will point out an obscure technical bit of info that you can play with and see if it helps. That's what the Forums are for...

And BTW: That kind of valuable but obscure information is called an Easter egg, because it takes some determined research to find.

Oleg_NYC said:
As I said before, I have never crashed again after disabling Turbo Boost in the UEFI. I wonder why such an evil feature was enabled by default in the UEFI.

Sounds like a stupid decision by the computer's OEM. Well, once you discover that the processor gets too hot, and start looking for solutions, that is a valuable learning experience.

Crivens · Mar 31, 2024

astyle said:
the OS will know to stay away from it

Nope. How would it?
And these random bit errors are why you use ECC memory. You may end up with a corrupted file system if the write cache gets hit.

astyle · Mar 31, 2024

Crivens said:
Nope. How would it?

Read a textbook by AST (Andrew Tanenbaum) sometime. The kernel does have data structures that keep track of available addresses of RAM. And yes, that is in addition to keeping track of addresses of the HDD/SSD. If an address is corrupted, it will return incorrect data. Considering the kernel needs the returned data to be correct, it will try to relaunch the process to a different area of the RAM. That's what I mean by 'OS will know to stay away from bad spots on the RAM'.

Crivens · Mar 31, 2024

astyle said:
Read a textbook by AST (Andrew Tanenbaum) sometime. The kernel does have data structures that keep track of available addresses of RAM. And yes, that is in addition to keeping track of addresses of the HDD/SSD. If an address is corrupted, it will return incorrect data. Considering the kernel needs the returned data to be correct, it will try to relaunch the process to a different area of the RAM. That's what I mean by 'OS will know to stay away from bad spots on the RAM'.

And how would it know the data is bad? It does not checksum all memory pages.

T-Aoki · Mar 31, 2024

Crivens said:
And how would it know the data is bad? It does not checksum all memory pages.

So it would malfunction or crash without any alert. With ECC or something similar, it would be able to keep running, report the error, and suggest replacing the module.
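
As a rough illustration of the ECC point, here is a minimal Python sketch of a Hamming(7,4) code, which can locate and correct a single flipped bit in a 7-bit codeword. It is only a toy: real ECC memory typically uses wider SECDED codes over 64-bit words, implemented in hardware, and the function names below are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword.

    Codeword layout by position 1..7: p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position).

    error_position is 0 when no error is detected, otherwise the
    1-based position of the single bit that was flipped and repaired.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate one bit flipping in memory
    print(hamming74_decode(word))  # prints ([1, 0, 1, 1], 5)
```

Without the parity bits there is nothing to compare against, which is why plain non-ECC memory returns a flipped bit silently.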

Oleg_NYC · Mar 31, 2024

tingo said:
Pass is the number of passes (rounds) that have run. One pass includes all the tests selected.
Pass 2, Test 6: one error detected
Pass 3, Test 6: one error detected
Pass 4 is still ongoing in the picture you posted, hard to say if it will detect any errors in the future.

Wait, I just noticed a contradiction in your text. You said "have run", as in finished running. But then you said Pass 4 "is still ongoing". If, for example, the first round of tests were still in progress, memtest86+ would print "Pass: ". There would be no number next to it. But the photo that I posted shows "Pass: 4". It means Pass 4 is not still ongoing, but has concluded, and now the fifth round of the same tests is in progress. If "Pass 2" was written in the description of one error and "Pass 3" was written in the description of the other error, that means the first error was found before the second round of tests concluded, and the second error was found before the third round of the same tests concluded. The fourth round of the same tests didn't find any errors at all. The photo seems to suggest round 5 is in progress. But there is no such thing as "Pass 0", right? If an error had been found during the first round of tests, it wouldn't have printed "Pass 0" in the description of the error?

cracauer@ · Mar 31, 2024

grahamperrin said:
1. Re-enable Turbo Boost and Overclock TVB
2. push the computer as hard as possible until the kernel panics
3. interpret the resulting core.txt.⋯ file.

If you cannot reproduce a panic after step (1), then again broaden your thinking (and know that memory errors may occur).

The panics are not always the same, I think that was established on page 1 or 2.

As far as the memory errors are concerned, they aren't necessarily caused by bad RAM cells. I think it is more likely the CPU flipping some bits. Again, a strong hint here is that the specific errors vary.

cracauer@ · Mar 31, 2024

Oleg_NYC said:
I am a computer n00b; that's why it's hard for me to figure out what I want... But can you please explain to me how this cooler is capable of keeping the cores' temperatures below 65 degrees no matter how intensive the process of compiling software becomes? I am talking specifically about the temperatures that were observed when the clock frequency was set to 3.7 GHz. I know that you guys implied this could be due to thermal throttling, but you also implied that even throttle_log can't tell me whether thermal throttling actually took place... I don't want to unscrew screws attached to the cooler just to find out its exact model name. I am just too afraid of damaging something... And why are you saying I will spend months verifying what is stable? It's very likely I already know what the stable solution is: not raising the clock frequency above 3.7 GHz. As I said before, I have never crashed again after disabling Turbo Boost in the UEFI. I wonder why such an evil feature was enabled by default in the UEFI.

The thermal throttling is obviously not working as advertised, otherwise you wouldn't have errors. That is why it is kind of hopeless to chase a stable underclock, which is effectively what you are doing. The temperature in your room goes up a degree, and you are unstable again.

Also, I think your method of determining current CPU frequency is sub-standard. You should really be looking at sysutils/i7z.

I once again urge you to just slap a real CPU cooler on there. If I might give a piece of advice I usually reserve for ziomario - you are spending a lot of time and energy on knowledge that you will not re-use. If you spent it on reusable knowledge, you would be far better off in the future.

astyle · Mar 31, 2024

Oleg_NYC said:
Wait, I just noticed a contradiction in your text. You said "have run", as in finished running. But then you said Pass 4 "is still ongoing". If, for example, the first round of tests were still in progress, memtest86+ would print "Pass: ". There would be no number next to it. But the photo that I posted shows "Pass: 4". It means Pass 4 is not still ongoing, but has concluded, and now the fifth round of the same tests is in progress. If "Pass 2" was written in the description of one error and "Pass 3" was written in the description of the other error, that means the first error was found before the second round of tests concluded, and the second error was found before the third round of the same tests concluded. The fourth round of the same tests didn't find any errors at all. The photo seems to suggest round 5 is in progress. But there is no such thing as "Pass 0", right? If an error had been found during the first round of tests, it wouldn't have printed "Pass 0" in the description of the error?

Uhhh... in your photo, "Pass" means "Round of testing". You "Pass" through a round of testing like you'd "Pass" through a street. In your photo, pass 4 (not 5) is still going - meaning the RAM was scanned several times over, and is now being scanned for the 4th time.

Oleg_NYC · Mar 31, 2024

cracauer@ said:
The panics are not always the same, I think that was established on page 1 or 2.

As I said, those were either "double fault", "privileged instruction fault", or "page fault" panics... Are you implying that it would be wrong for me to assume these panics happened due to the fact Turbo Boost was enabled? But I keep saying that after I disabled Turbo Boost, the computer hasn't crashed even once!

grahamperrin said:
Re-enable Turbo Boost

No, I want to stay away from this evil feature.

cracauer@ said:
The thermal throttling is obviously not working as advertised, otherwise you wouldn't have errors.

So, you are saying that if I had a more powerful cooler, I wouldn't experience panics even with Turbo Boost enabled? But someone else said that the thermal throttling mechanism simply gets shut off if the computer is operating at Turbo Boost frequencies. I guess this doesn't matter if a powerful cooler is capable of keeping temperatures low even when Turbo Boost frequencies are utilized.

cracauer@ · Mar 31, 2024

Oleg_NYC said:
So, you are saying that if I had a more powerful cooler, I wouldn't experience panics even with Turbo Boost enabled? But someone else said that the thermal throttling mechanism simply gets shut off if the computer is operating at Turbo Boost frequencies. I guess this doesn't matter if a powerful cooler is capable of keeping temperatures low even when Turbo Boost frequencies are utilized.

You have general instability, with no exact reproducibility. It is not possible to predict with a guarantee what will make it go away. But given that you have a CPU cooler too small for your high-end CPU, that is clearly the first thing to try.

I don't believe that those assumptions about thermal throttling and Turbo Boost are correct. But I am not that familiar with 11th-gen Intel. Either way, of course your computer is supposed to be 100% stable with default settings, Turbo Boost on (but no manual overclocking). It would also be irrational to first spend the money on an i9 CPU and then run it at half speed for a Chrome compile.

Oleg_NYC · Mar 31, 2024

astyle said:
Uhhh... in your photo, "Pass" means "Round of testing". You "Pass" through a round of testing like you'd "Pass" through a street. In your photo, pass 4 (not 5) is still going - meaning the RAM was scanned several times over, and is now being scanned for the 4th time.

Let me repeat. As soon as the memtest86+ program starts, the first round of testing begins. It displays the message "Pass: ". There is no number next to "Pass: "; it is simply "Pass: ". After tests from the first round finish running, a new message is displayed: "Pass: 1". The second round of testing is now in progress: memtest86+ performs the same tests that were performed during the first round. When it concludes, it displays "Pass: 2". And so on. Doesn't it mean that when "Pass: 4" is displayed, the fourth round during which the same tests were performed has concluded and the fifth round is in progress now?
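
The counting model described in this post can be written down as a small Python sketch. It encodes only the reading given above (the counter shows completed rounds and is blank during the first one); it is not taken from the memtest86+ source, and the function names are hypothetical.

```python
def status_line(completed_rounds):
    """Status text under the assumed model: blank until one round has finished."""
    if completed_rounds == 0:
        return "Pass: "
    return f"Pass: {completed_rounds}"


def round_in_progress(status):
    """Given a status line, return the number of the round currently running."""
    shown = status.split(":", 1)[1].strip()
    completed = int(shown) if shown else 0
    return completed + 1  # the round after the last completed one


if __name__ == "__main__":
    for done in range(5):
        line = status_line(done)
        print(repr(line), "-> round", round_in_progress(line), "in progress")
```

Under the other reading in this thread, where the number shown is the round currently running, `round_in_progress` would simply return the displayed number.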

Crivens · Mar 31, 2024

Let me reiterate: you have a lack of cooling and a surplus of blinky. Get that sorted out. Then we will look at the next problem.

Oleg_NYC · Mar 31, 2024

Crivens said:
Let me reiterate: you have a lack of cooling and a surplus of blinky. Get that sorted out. Then we will look at the next problem.

I might have to ask questions about the type of cooler that would fit into my tower... So please don't get annoyed by me just yet...

cracauer@ said:
As far as the memory errors are concerned, they aren't necessarily caused by bad RAM cells. I think it is more likely the CPU flipping some bits.

I decided to run memtest86+ again and had these results: https://ibb.co/JnbFV4K . As you can see, unlike what the earlier results show, there were no "pass 2" and "pass 3" errors, but there was a "pass 4" error. If you think it's okay to ignore it, then I'll stop with memory testing.

Cath O'Deray · Mar 31, 2024

Oleg_NYC said:
https://ibb.co/JnbFV4K

Cropped and scaled, for readers' convenience:

Oleg_NYC said:
… If you think it's okay to ignore it, then

An error during any run (without regard to whether it's the fourth, or whatever) is significant.

Oleg_NYC said:
I'll stop with memory testing.

IMHO it's telling nothing new, so – politely – it's a waste of time.

Focus, instead, on the hardware-specific advice.

Make good your hardware, then retest.

Oleg_NYC · Mar 31, 2024

Ha! Take a look at this: https://www.tomshardware.com/video-...-to-downclock-their-chips-to-prevent-crashing . This article talks about downclocking 13th- and 14th-generation CPUs to 5 GHz to avoid crashes. In a funny way, it has some relevance to this thread.

astyle · Mar 31, 2024

Oleg_NYC said:
… "Pass: 4" is displayed, the fourth round during which the same tests were performed has concluded and the fifth round is in progress now?

In the case of this program, "Pass" always means "Round of testing/scanning". As soon as an error is detected, it will be reported. That much should be obvious to anyone using the program.

Oleg_NYC said:
Ha! Take a look at this: https://www.tomshardware.com/video-...-to-downclock-their-chips-to-prevent-crashing . This article talks about downclocking 13th- and 14th-generation CPUs to 5 GHz to avoid crashes. In a funny way, it has some relevance to this thread.

That article covers Windows gaming and data bottlenecks. Very different beast from compiling and FreeBSD. People who know something about benchmarking hardware and numerical analysis know how and why Windows gaming is such a different beast from compiling on FreeBSD.

Oleg_NYC · Apr 1, 2024

Okay, so, I opened the computer case and took 3 photos of the cooler:
https://ibb.co/c65rGB0
https://ibb.co/SNnXzv4
https://ibb.co/1TJMB77
Its exact model name wasn't written on any part of it that I could see. But judging from these photos, you can definitely conclude it's not as powerful as a 125 W cooler, right? Would you recommend I buy this cooler: https://www.amazon.com/quiet-BK007-Elegant-Surface-Technology/dp/B087VL2Z21?th=1 ? I want a cooler that can easily be screwed to my motherboard. I don't want to remove the motherboard from the case just so I could attach a cooler to it.

astyle · Apr 1, 2024

Oleg_NYC said:
Okay, so, I opened the computer case and took 3 photos of the cooler:
https://ibb.co/c65rGB0
https://ibb.co/SNnXzv4
https://ibb.co/1TJMB77
Its exact model name wasn't written on any part of it that I could see. But judging from these photos, you can definitely conclude it's not as powerful as a 125 W cooler, right? Would you recommend I buy this cooler: https://www.amazon.com/quiet-BK007-Elegant-Surface-Technology/dp/B087VL2Z21?th=1 ? I want a cooler that can easily be screwed to my motherboard. I don't want to remove the motherboard from the case just so I could attach a cooler to it.

The photos do show a rather old model... and it doesn't look like you unbolted the cooler to look on all sides.

Be Quiet! is a good brand - I have a Pure Loop liquid cooler by that same company myself. Most aftermarket coolers are pretty easy to install, no need to pull the mobo out. So yeah, I'd say go ahead and get that cooler. Do watch out for cable connectors - those can be a little awkward to install.
