- Thread Starter
- #151
I think you can safely ignore those RAM errors - unless you enjoy hunting them down.
Wait, you think memtest86+ didn't find anything that would cause real problems? Why are you so sure of it?
Well, the first pass did not find a problem, period.
… what your objective is now? …
… I solved the problem of not being able to save core.txt files …
… There were no more panics after I disabled Turbo Boost and "Overclock TVB" in the UEFI. …
… My goal is to simply figure out what caused crashes while I was compiling stuff.
… I am still not entirely sure I found out the cause of these crashes. …
Oleg_NYC, may I ask what your objective is now?
I am a computer n00b; that's why it's hard for me to figure out what I want... But can you please explain to me how this cooler is capable of keeping the cores' temperatures below 65 degrees no matter how intensive the process of compiling software becomes? I am talking specifically about the temperatures that were observed when the clock frequency was set to 3.7 GHz. I know that you guys implied this could be due to thermal throttling, but you also implied that even throttle_log can't tell me whether thermal throttling actually took place... I don't want to unscrew screws attached to the cooler just to find out its exact model name. I am just too afraid of damaging something... And why are you saying I will spend months verifying what is stable? It's very likely I already know what the stable solution is: not raising the clock frequency above 3.7 GHz. As I said before, I have never crashed again after disabling Turbo Boost in the UEFI. I wonder why such an evil feature was enabled by default in the UEFI.
Your CPU cooler is too small. Would you consider getting a proper one?
You will spend months tinkering in the BIOS and verifying what's stable. Unless there is a direct power limit setting, but I think that is newer generations only.
… Turbo Boost … evil …
But can you please explain to me how this cooler is capable of keeping the cores' temperatures below 65 degrees no matter how intensive the process of compiling software becomes?
I pointed out earlier in the thread that even an underpowered cooler is better than nothing. And that cooler works in tandem with thermal throttling.
I don't want to unscrew screws attached to the cooler just to find out its exact model name. I am just too afraid of damaging something...
Sounds like you have an OEM cooler that came with the entire computer. Those coolers tend to be underpowered, especially if the OEM machine has a nice processor. If you know the brand/model name of your machine, it's pretty easy to figure out the specs of the cooler used by the OEM.
And why are you saying I will spend months verifying what is stable? It's very likely I already know what the stable solution is: not raising the clock frequency above 3.7 GHz.
We talked about thermal throttling and looking at RAM errors. If you want to take your own measurements, then yeah, it will take months to learn how to do them correctly, and how to interpret the results you get. Or, you can go to the Forums, ask around, and figure out within hours (if not days) that all it takes to avoid crashes is to avoid Turbo Boost on your processor. Somebody on the Forums will point out an obscure technical bit of info that you can play with and see if it helps. That's what the Forums are for... And BTW: that kind of valuable but obscure information is called an Easter egg, because it takes some determined research to find.
As I said before, I have never crashed again after disabling Turbo Boost in the UEFI. I wonder why such an evil feature was enabled by default in the UEFI.
Sounds like a stupid decision by the computer's OEM. Well, once you discover that the processor gets too hot, and start looking for solutions, that is a valuable learning experience.
the OS will know to stay away from it
Nope. How would it?
Read a textbook by AST (Andrew Tanenbaum) sometime. The kernel does have data structures that keep track of available addresses of RAM. And yes, that is in addition to keeping track of addresses of the HDD/SSD. If an address is corrupted, it will return incorrect data. Considering the kernel needs the returned data to be correct, it will try to relaunch the process to a different area of the RAM. That's what I mean by 'OS will know to stay away from bad spots on the RAM'.
And how would it know the data is bad? It does not checksum all memory pages.
So it would malfunction or crash without alerts. With ECC or something similar, it would be able to keep running, raise an alert about the error, and suggest a replacement.
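For readers wondering what "ECC or something alike" buys you, here is a toy sketch of the idea. It is not how any real memory controller is implemented: ECC memory commonly protects 64-bit words with 8 extra check bits, while this example uses a tiny Hamming(7,4) code, and the function names `encode` and `decode` are made up for illustration. The point is only that extra parity bits let the hardware notice a flipped bit and repair it, whereas plain RAM hands back the wrong value silently.

```python
# Toy Hamming(7,4) code: 4 data bits protected by 3 parity bits.
# Illustrative only; the helper names are hypothetical and real ECC
# hardware works on much wider words.

def encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(word):
    """Return (data_bits, error_position); position 0 means no error seen."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome points at the flipped bit
    if position:
        w[position - 1] ^= 1  # correct the single-bit error
    return [w[2], w[4], w[5], w[6]], position

stored = encode([1, 0, 1, 1])
stored[4] ^= 1                   # simulate one bit flipping in "RAM"
data, position = decode(stored)
print(data, position)            # prints [1, 0, 1, 1] 5
```

The decoder recovers the original data and also reports which position was wrong. That report is what allows an ECC system to log an alert while it keeps running.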
Pass is the number of passes (rounds) that have run. One pass includes all the tests selected.
Pass 2, Test 6: one error detected.
Pass 3, Test 6: one error detected.
Pass 4 is still ongoing in the picture you posted, hard to say if it will detect any errors in the future.
If you cannot reproduce a panic after step (1), then again broaden your thinking (and know that memory errors may occur).
- Re-enable Turbo Boost and Overclock TVB
- push the computer as hard as possible until the kernel panics
- interpret the resulting core.txt.⋯ file.
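One possible way to "push the computer as hard as possible" without waiting for a compile to fail is sketched below. This is only an assumption about what a useful load might look like, not something anyone in the thread ran: the functions `burn` and `stress` are hypothetical names, and the script uses nothing beyond the Python standard library. Each worker hashes the same fixed buffer over and over and counts any digest that differs from the first one, so a core that miscalculates under load may show up as a mismatch instead of (or before) a panic.

```python
import hashlib
import multiprocessing as mp
import time

def burn(seconds):
    """Hash a fixed buffer repeatedly; count digests that differ from the first."""
    data = bytes(range(256)) * 4096          # 1 MiB of fixed input
    expected = hashlib.sha256(data).digest()
    rounds = mismatches = 0
    deadline = time.monotonic() + seconds
    while True:
        if hashlib.sha256(data).digest() != expected:
            mismatches += 1                  # same input, different result
        rounds += 1
        if time.monotonic() >= deadline:
            break
    return rounds, mismatches

def stress(seconds, workers=None):
    """Run burn() in parallel, one process per core unless `workers` is given."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Saved to a file and called as `stress(600)` under an `if __name__ == "__main__":` guard, this would load every core for ten minutes and return one `(rounds, mismatches)` pair per worker. A hash loop does not exercise memory or I/O the way a large compile does, so a clean run proves little; a non-zero mismatch count, on the other hand, would be a strong hint.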
Wait, I just noticed a contradiction in your text. You said "have run", as in finished running. But then you said Pass 4 "is still ongoing". If, for example, the first round of tests were still in progress, memtest86+ would print "Pass: ". There would be no number next to it. But the photo that I posted shows "Pass: 4". It means Pass 4 is not still ongoing, but has concluded, and now the fifth round of the same tests is in progress. If "Pass 2" was written in the description of one error and "Pass 3" was written in the description of the other error, that means the first error was found before the second round of tests concluded, and the second error was found before the third round of the same tests concluded. The fourth round of the same tests didn't find any errors at all. The photo seems to suggest round 5 is in progress. But there is no such thing as "Pass 0", right? If an error had been found during the first round of tests, it wouldn't have printed "Pass 0" in the description of the error?
Uhhh... in your photo, "Pass" means "Round of testing". You "Pass" through a round of testing like you'd "Pass" through a street. In your photo, pass 4 (not 5) is still going - meaning the RAM was scanned several times over, and is now being scanned for the 4th time.
The panics are not always the same, I think that was established on page 1 or 2.
grahamperrin said: Re-enable Turbo Boost
cracauer@ said: The thermal throttling is obviously not working as advertised, otherwise you wouldn't have errors.
So, you are saying that if I had a more powerful cooler, I wouldn't experience panics even with Turbo Boost enabled? But someone else said that the thermal throttling mechanism simply gets shut off if the computer is operating at Turbo Boost frequencies. I guess this doesn't matter if a powerful cooler is capable of keeping temperatures low even when Turbo Boost frequencies are utilized.
Let me reiterate: you have a lack of cooling and a surplus of blinky. Get that sorted out. Then we will look at the next problem.
cracauer@ said: As far as the memory errors are concerned, they aren't necessarily caused by bad RAM cells. I think it is more likely the CPU flipping some bits.
… If you think it's okay to ignore it, then I'll stop with memory testing.
… "Pass: 4" is displayed, the fourth round during which the same tests were performed has concluded and the fifth round is in progress now?
In the case of this program, "Pass" always means "Round of testing/scanning". As soon as an error is detected, it will be reported. That much should be obvious to anyone using the program.
Ha! Take a look at this: https://www.tomshardware.com/video-...-to-downclock-their-chips-to-prevent-crashing . This article talks about downclocking 13th-generation and 14th-generation CPUs to 5 GHz to avoid crashes. In a funny way, it has some relevance to this thread.
That article covers Windows gaming and data bottlenecks. Very different beast from compiling and FreeBSD. People who know something about benchmarking hardware and numerical analysis know how and why Windows gaming is such a different beast from compiling on FreeBSD.
Okay, so, I opened the computer case and took 3 photos of the cooler:
https://ibb.co/c65rGB0
https://ibb.co/SNnXzv4
https://ibb.co/1TJMB77
Its exact model name wasn't written on any part of it that I could see. But judging from these photos, you can definitely conclude it's not as powerful as a 125 W cooler, right? Would you recommend I buy this cooler: https://www.amazon.com/quiet-BK007-Elegant-Surface-Technology/dp/B087VL2Z21?th=1 ? I want a cooler that can easily be screwed to my motherboard. I don't want to remove the motherboard from the case just so I could attach a cooler to it.
The photos do show a rather old model... and it doesn't look like you unbolted the cooler to look on all sides.
Be Quiet! is a good brand - I have a Pure Loop liquid cooler by that same company myself. Most aftermarket coolers are pretty easy to install, no need to pull the mobo out. So yeah, I'd say go ahead and get that cooler. Do watch out for cable connectors - those can be a little awkward to install.