MCA error?

Today, I saw this on the console of one of my machines:

Code:
MCA: Bank 1, Status 0x9400000000000151
MCA: Global Cap 0x0000000000000104, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x644, APIC ID 0
MCA: CPU 0 COR ICACHE L1 IRD error
MCA: Address 0xc08062e0

I have never seen this before on any of my machines.

The way that I am reading this is that CPU 0 had some kind of error with the on-chip instruction cache. Not sure what COR and IRD stand for. The system is stable otherwise. Any ideas? This is a pretty old machine. I visually inspected the capacitors around the CPU and everything is fine.
 
I've had them; not exactly this one, but others. They typically indicate a hardware issue. Run the error through sysutils/mcelog, which should be able to translate it into something a bit more meaningful for us humans.
 
I ran mcelog and here's what it said:

Code:
mcelog: Unknown CPU type vendor 2 family 6 model 4
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 1 TSC 44a1caf3b196
ADDR c08062e0
TIME 1505353501 Wed Sep 13 17:45:01 2017
STATUS 9400000000000151 MCGSTATUS 0
MCGCAP 104 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 6 Model 4

I found some information online about sysutils/cpuburn. It's a single CPU single core system, so I ran burnMMX and burnK7 together all night long. Never had another error even though the machine's balls were to the wall. The information that mcelog shows is even less helpful than the original log message. Looking at the line
"CPU 0 COR ICACHE L1 IRD error" I think I can gauge what happened.
  • COR = Corrected (I think)
  • ICACHE = Instruction Cache
  • L1 = L1 Cache (On Chip)
  • IRD = Instruction Read (I think)
  • error is self explanatory.
I have an advanced hardware architecture class tomorrow. I think I will show and tell ;).

I went digging through the AMD docs and found some information about MCA (Machine Check Architecture) & MCE (Machine Check Exception) which I found quite interesting. Nothing specific on my error code though.
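
For what it's worth, the generic part of the status word can be decoded by hand, and it seems to agree with the guesses above. The sketch below assumes the standard x86 MCA layout: flag bits at the top of MCi_STATUS, and a 16-bit error code at the bottom in the "memory hierarchy" form 0000 0001 RRRR TTLL. The function name decode_status and the table names are made up for illustration, and model-specific bits are ignored.

```python
# Sketch only: decodes the architectural fields of an MCi_STATUS value.
# Assumes the generic x86 MCA layout; model-specific bits are not interpreted.

# RRRR: request type in a memory-hierarchy error code
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
# TT: transaction type (instruction, data, generic)
TXN_TYPE = {0: "I", 1: "D", 2: "G"}
# LL: cache level
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_status(status):
    """Return the flag bits and, where applicable, the cache error fields."""
    flags = {
        "VAL":   bool((status >> 63) & 1),  # register contains a valid error
        "OVER":  bool((status >> 62) & 1),  # an earlier error was overwritten
        "UC":    bool((status >> 61) & 1),  # uncorrected (0 means corrected)
        "EN":    bool((status >> 60) & 1),  # reporting was enabled
        "MISCV": bool((status >> 59) & 1),  # MISC register is valid
        "ADDRV": bool((status >> 58) & 1),  # ADDR register is valid
        "PCC":   bool((status >> 57) & 1),  # processor context corrupt
    }
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    # Memory hierarchy errors have the form 0000 0001 RRRR TTLL.
    if code >> 8 == 0x01:
        result["request"] = REQUEST.get((code >> 4) & 0xF, "?")
        result["type"] = TXN_TYPE.get((code >> 2) & 0x3, "?")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # Status value taken from the console message in the first post.
    print(decode_status(0x9400000000000151))
```

For the value in the log, UC comes out clear (hence COR), ADDRV is set (hence the Address line), and the low bits give IRD / I / L1, i.e. an instruction fetch that hit a corrected error in the L1 instruction cache.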
 
Yes, that all sounds plausible. It's certainly the type of message you get to see with MCE/MCA. In this case it looks like it corrected itself. But if this starts happening more often it's probably a good idea to replace the CPU. Depending on the age of the machine this might prove difficult. Socket types seem to change faster than I change underwear.
 
Yes, that all sounds plausible. It's certainly the type of message you get to see with MCE/MCA. In this case it looks like it corrected itself. But if this starts happening more often it's probably a good idea to replace the CPU. Depending on the age of the machine this might prove difficult.

I know. The machine is an AMD Athlon Thunderbird 1400MHz, circa 2001, Socket A. It turns out that the specific error code is model dependent. But I have enough information to know that I am not going to worry about it. I will upgrade the machine one of these days, but it still works pretty well and is reliable.

Socket types seem to change faster than I change underwear.

I could comment on that...but I won't. :D
 