MCA errors

I have messages like these:
Apr 25 16:52:08 server34 kernel: MCA: Bank 17, Status 0xdc2040000000011b
Apr 25 16:52:08 server34 kernel: MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
Apr 25 16:52:08 server34 kernel: MCA: Vendor "AuthenticAMD", ID 0xa20f12, APIC ID 0
Apr 25 16:52:08 server34 kernel: MCA: CPU 0 COR EN OVER GCACHE LG RD error
Apr 25 16:52:08 server34 kernel: MCA: Address 0x400000662b79e40
Apr 25 16:52:08 server34 kernel: MCA: Misc 0xd01a0fb901000000
Does this mean that the errors are corrected (COR)? Also, does it mean that the RAM is broken, or could it be something else?
 
You can use sysutils/mcelog to decode those MCA messages.

It's either a DRAM module that's broken (that's why you have ECC memory) or the cache on the CPU. But the most likely culprit is a memory stick gone bad.

 
I hope I'm wrong, but according to the model-specific registers (MSRs), the error occurred during a read in the generic cache (shared L3 cache) in the processor, not in the NB (northbridge), i.e. not in the bus link or DRAM. So the issue is most likely in the processor, but it doesn't hurt to start by testing the memory modules and seeing whether the error changes. Also check the CPU temperature and the motherboard power to make sure it's not caused by overheating or a poor power supply.

MCA: CPU 0 COR EN OVER GCACHE LG RD error

MCA = Machine Check Architecture
CPU error reporting register bank = 17
CPU = 0
MCi_STATUS_EN = En: error enable. Read-write; Updated-by-hardware. Cold reset: 0. 1=MCA error reporting is enabled for this error, as indicated by MCi_CTL.

COR = Corrected

OVER = Overflow: error overflow. Read-write; set-by-hardware. Cold reset: 0. 1=An error was detected while the valid bit (Val) was set; at least one error was not logged. Overflow is set independently of whether the existing error is overwritten.
The following hierarchy identifies the error logging priorities.
1. Uncorrectable errors
2. Correctable errors
The machine check mechanism handles the contents of MCi_STATUS during overflow as follows:
• Higher priority errors overwrite lower priority errors.
• New errors of equal or lower priority do not overwrite existing errors.
• Uncorrectable errors which are not logged due to overflow result in setting PCC, unless the new uncorrectable error is of the same type and in the same reportable address range as the existing error.

GCACHE = Generic Cache
LG = Generic cache level (possible levels: L0, L1, L2, LG)
RD = Generic Read
error = ...
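
If you want to double-check the flags by hand, here is a minimal Python sketch that pulls the same fields out of the Status value from the log above. It assumes the standard x86 MCi_STATUS bit layout (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57) and the cache hierarchy error code pattern 0000 0001 RRRR TTLL in the low 16 bits. The function name decode_mci_status is made up for this example, and the sketch ignores the model-specific bits in between, so treat it as an illustration rather than a replacement for mcelog.

```python
# Sketch only: decode the common flag bits and the cache error code of an
# MCi_STATUS value. decode_mci_status is a hypothetical helper name.

FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # at least one error was not logged
    "UC": 61,     # uncorrected error (clear means corrected, "COR")
    "EN": 60,     # error reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds valid data
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
}

# Field encodings of the cache hierarchy error code (0000 0001 RRRR TTLL).
CACHE_LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}
CACHE_TYPE = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_mci_status(status):
    """Return the flag bits and, for cache errors, the decoded error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    code = status & 0xFFFF  # low 16 bits: MCA error code
    cache = None
    if (code >> 8) == 0x01:  # assumed cache hierarchy pattern
        cache = {
            "request": REQUEST.get((code >> 4) & 0xF, "unknown"),
            "type": CACHE_TYPE.get((code >> 2) & 0x3, "unknown"),
            "level": CACHE_LEVEL[code & 0x3],
        }
    return {
        "corrected": flags["VAL"] and not flags["UC"],
        "flags": flags,
        "error_code": code,
        "cache": cache,
    }


if __name__ == "__main__":
    info = decode_mci_status(0xDC2040000000011B)
    set_flags = [name for name, on in info["flags"].items() if on]
    print("corrected:", info["corrected"])
    print("flags set:", " ".join(set_flags))
    print("cache:", info["cache"])
```

For the value in the log this reports a corrected error with VAL, OVER, EN, MISCV and ADDRV set and a GCACHE / LG / RD code, which lines up with the "COR EN OVER GCACHE LG RD" line the kernel printed.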

Source:
 