Hardware memory check

I've recently added memory and now I'm seeing this in dmesg:
Code:
MCA: Bank 14, Status 0x8c000040000800c1
MCA: Global Cap 0x000000000f000c14, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x50657, APIC ID 0
MCA: CPU 0 COR (1) MS channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
MCA: Misc 0x918c00000000086
MCA: Bank 7, Status 0x9c00004001010091
MCA: Global Cap 0x000000000f000c14, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x50657, APIC ID 0
MCA: CPU 0 COR (1) EN RD channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
MCA: Misc 0x200401c089c01086
MCA: Bank 7, Status 0x9c00004001010091
MCA: Global Cap 0x000000000f000c14, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x50657, APIC ID 0
MCA: CPU 0 COR (1) EN RD channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
MCA: Misc 0x200400c008001086
MCA: Bank 14, Status 0x8c000040000800c1
MCA: Global Cap 0x000000000f000c14, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x50657, APIC ID 0
MCA: CPU 0 COR (1) MS channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
MCA: Misc 0x918c00000000086

In IPMI I see this:

[Attached screenshot: 1697965814358.png]


I suppose one memory module may be bad, although all of the memory is detected. How would I check what is going on? With memtester, I guess? I'm not sure what exactly the "locking stepping" means, but it seems to me that it was logged before the memory was installed.
 
The total memory size is usually detected from the information in the SPD chip on each memory module.
But the SPD chip and the memory chips are different things.
So it is possible to have a failed memory module with a correct SPD. The mainboard will detect that module, but it will not work properly because of memory errors or some other fault.

Try running the built-in BIOS/UEFI memory test and memtest86.
If either reports memory errors, remove the newly installed memory modules and run the tests again.
Repeat these steps until the failed memory module is identified.
 
I've recently added memory
That is kinda vague.
Did you yank all the old RAM and put in new?
If you are mixing different memory modules, you might need to shuffle them between slots.

At the very minimum, reseat the modules. If the environment is dusty, blow the slots out too.
 
I suppose maybe one memory module is bad however all memory is detected.
Code:
MCA: CPU 0 COR (1) MS channel 1 memory error
ECC is doing what it's supposed to do: CORrect memory errors. It's still a bad DIMM that needs replacing. The address is the same every time, so I suspect only one module is bad.
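
If you want to check the "same address every time" part without eyeballing dmesg, here is a rough Python sketch that tallies the reported addresses and decodes the Status values. The bit positions follow my reading of the machine-check chapter of the Intel SDM (VAL, OVER, UC, EN, MISCV, ADDRV in bits 63..58, corrected error count in bits 52..38, memory controller error code of the form 0000 0000 1MMM CCCC in the low 16 bits). The names decode_status, describe_memory_code and summarize are made up for this example, and the regular expressions only assume the "MCA:" line format pasted in this thread.

```python
import re
import sys
from collections import Counter

# A few lines copied from the dmesg output posted in this thread.
SAMPLE = """\
MCA: Bank 14, Status 0x8c000040000800c1
MCA: CPU 0 COR (1) MS channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
MCA: Bank 7, Status 0x9c00004001010091
MCA: CPU 0 COR (1) EN RD channel 1 memory error
MCA: Address 0x320311fa00 (Mode: Physical Address, LSB: 6)
"""

STATUS_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")

# MMM sub-field of a memory controller error code (values 5-7 are reserved).
MMM_NAMES = {0: "generic", 1: "memory read", 2: "memory write",
             3: "address/command", 4: "memory scrubbing"}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    return {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": status & 0xFFFF,
    }


def describe_memory_code(mca_code):
    """Return (request type, channel) for a memory controller error, else None."""
    if (mca_code & 0xFF80) != 0x0080:
        return None  # not of the form 0000 0000 1MMM CCCC
    request = MMM_NAMES.get((mca_code >> 4) & 0x7, "reserved")
    channel = mca_code & 0xF  # 15 means "channel not specified"
    return request, channel


def summarize(log_text):
    """Collect decoded (bank, fields) events and count each reported address."""
    events, addresses = [], Counter()
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            fields = decode_status(int(match.group(2), 16))
            events.append((int(match.group(1)), fields))
            continue
        match = ADDR_RE.search(line)
        if match:
            addresses[int(match.group(1), 16)] += 1
    return events, addresses


if __name__ == "__main__":
    # Use piped input if there is any, otherwise fall back to the sample.
    text = SAMPLE if sys.stdin.isatty() else sys.stdin.read()
    events, addresses = summarize(text)
    for bank, fields in events:
        kind = "uncorrected" if fields["uncorrected"] else "corrected"
        print(f"bank {bank}: {kind}, count {fields['corrected_count']}, "
              f"{describe_memory_code(fields['mca_code'])}")
    for address, hits in addresses.most_common():
        print(f"address {address:#x}: reported {hits} time(s)")
```

Saved as, say, mca_summary.py (the file name is arbitrary), it can be fed with `dmesg | python3 mca_summary.py`. A single address that accounts for all the hits points at one bad location, but it does not tell you which DIMM it is: with interleaving, the physical address does not map onto a slot in any simple way, so you still need the board's event log or swap testing for that.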
 