Hardware Failure: Memory Issue?

I have an old computer that uses an ECS 741GX-M mainboard. Recently, it's started to spontaneously reboot. Here's what I've done thus far:
  • Ran MemTest86+ for 14+ passes, no errors.
  • Reseated the DIMMs, HD cables, and expansion cards.
  • Currently running a bit fade test in MemTest86+. Pass 0 finished with no errors; pass 1 is running.
  • Tested each DIMM by itself: same problems.
  • Swapped the DIMMs between their slots.
I think the DIMMs are bad even though they pass MemTest86+. Swapping the DIMMs made the problem much worse, to the point that I had to enable failsafe mode in MemTest86+ to keep it from crashing. I don't think it's the CPU because MemTest86+ will run without issue.

Opinions? Suggestions?
 
I know about the bad capacitor issue. I've run into it a few times. But the reboots, when I could catch them, are due to panics. Since I set the sysctl kern.panic_reboot_wait_time to -1, it no longer reboots on a panic, and I can see what the panic is. It's mostly trap 12 (page fault in kernel mode), but there were a few other weird ones, like a trap 9 and a trap 22.
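For anyone trying to make sense of those trap numbers, here is a minimal lookup sketch. It assumes the system is FreeBSD on x86 (the thread never says so outright; the sysctl name suggests it) and that the numbers match that platform's trap header as I recall it. The table and the describe_trap helper are illustrative names, not part of any system tool, so check the values against the header on your own machine.

```python
# Subset of trap numbers, assumed to match FreeBSD's x86 trap header.
# This table is an illustration; verify it against your own system.
TRAP_NAMES = {
    9: "T_PROTFLT (general protection fault)",
    12: "T_PAGEFLT (page fault)",
    22: "T_DNA (device not available fault)",
}


def describe_trap(number):
    """Return a readable name for a trap number, or a fallback string."""
    return TRAP_NAMES.get(number, f"unknown trap {number}")


# The three trap numbers mentioned in the post above.
for n in (12, 9, 22):
    print(f"trap {n}: {describe_trap(n)}")
```

A mix of unrelated trap types like this tends to point at hardware rather than at one specific kernel bug, which fits the suspicion above.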

This machine has been running fairly continuously for the last 15 years and started having problems last week.
 
If you have spare parts (RAM, processor, power supply), try swapping one after another and check if the problem persists. Check that the fans are working (CPU fan, GPU fan, case fan if present). Check that the heat sink is free of dust and other obstacles.

In theory, the culprit could be any piece of hardware, e.g. a PCI card. It could also be a hard disk with a bad cache chip, causing flipped bits in the kernel and leading to panics. It could be anything, basically. Even bad cables can cause problems like that. Therefore, the best way to find out for sure is to swap one piece of hardware after another.
 
...capacitors...

Yes, damn those capacitors. It's always the capacitors. The amount of hardware that gets declared broken/dead but can be fixed with nothing but a soldering iron and a couple of cents' worth of capacitors is practically mind-blowing. It might not be the reason in this case, but that's nothing but the exception that proves the rule ;)

This machine has been running fairly continuously for the last 15 years and started having problems last week.

Have you done any updates around that time? Anything that might have changed something in the kernel to not play nice with your hardware? I figure with stuff that old there always is a chance for some unexpected side effect to go unnoticed during development.
 
Given that it's an ECS board, it's most likely related to capacitors or possibly the PSU, as ECS was at the time considered a "bargain bin" brand, which usually meant that everything was kept at a low cost. That being said, unless you have some ancient PCI card or similar that you really need, do yourself a favour and replace it with something newer like a RockPro64 or an x86-64 machine, which will save you both time and possibly money (electricity) in the end. :)

FWIW, I think my old ECS K75SA motherboard still works, but I haven't touched it in years :/
 
Ladies and gentlemen, thank you for the replies. I have determined that the mainboard itself has failed. The board has about 20 years of service. So I decided to upgrade the system. Here are the specs of the new system:

ASRock FM2A68M-DG3+ Mainboard (MicroATX)
AMD Athlon X4 860K Quad-Core CPU AMD64 at 3700MHz
Corsair Vengeance 16GB (2*8GB) 1600MHz RAM
VisionTek AMD Radeon HD 5450 2GB DDR3 display adapter
Silicon Power SSD 120GB sATA-3 6Gbps
u-Bit PCIe Dual-Port Network Interface (RTL8111G chipset)

The parts that stayed the same:
Case
A 430W PSU
And I'm still using the PCI based Netis WiFi card (Ralink chipset).

This machine flies compared to what it did before. This is the first of several machines that have gone 64-bit. My main server is my next upgrade. That one will use the mainboard and parts out of the workstation (Gigabyte board, 8-core CPU at 4000MHz, 16GB RAM, etc...).

The firewall now runs with no errors on the hardware listed above, so it seems that everything is compatible. The u-Bit network card and the onboard LAN both use the re driver. Now to go through the config and build a custom kernel that's specifically tailored to this machine.
 