MCA: CPU 0 COR GCACHE LG RD error

Hi folks.

I got the following lines when checking my server with "dmesg -a". Any idea what they refer to? MCA: CPU 0 COR GCACHE LG RD error

OS: FreeBSD 13-RELEASE

Code:
Starting mysql.
Starting background file system checks in 60 seconds.
...
Sat Jan 29 14:59:15 +03 2022
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000

Thanks in advance.
 
Note the bank and address of this one. Then move your memory modules around, for example by shifting each one over by one slot. This also re-seats everything (in case it's a bad connection). If you get a similar error again but with a different bank and address, then you know one of the memory modules is faulty (the error moved with the module). If the bank and address stay the same, the issue is with the mainboard or CPU.
 
Thanks for your replies. Well, so we aren't clear yet here. I really hope it's not a hardware fault. That'd lead to a brand new (and bad) adventure here *locally*.

Here is my dmidecode output: https://bsd.to/5VpC/raw

And mcelog output:
Code:
root@mybox:~ # mcelog --no-dmi --ascii --file /var/log/dmesg.today
mcelog: Unknown CPU type vendor 2 family 23 model 1
mcelog: Unknown CPU type vendor 2 family 23 model 1
Hardware event. This is not a software error.
CPU 0 BANK 18
MISC d01b0fff01000000 ADDR 40000051068f800
STATUS 9c2040000000011b MCGSTATUS 0
MCGCAP 11c APICID 0 SOCKETID 0
CPUID Vendor AMD Family 23 Model 1 Step 0


I would be very grateful for any ideas or suggestions.
 
Well, it seems so;

"Base Board Information
Manufacturer: ASRockRack
Product Name: B450D4U-V1L"

P.S.: This is my dedicated server rented at Hetzner.
 
Hetzner responded and, to their credit, offered several solutions quickly and professionally:

"We can offer you the following options for the server:
1. Exchange the server, but keep the drives:
To rule out a majority of the sources of hardware error, it is possible for us to exchange your server but keep all of your drives. The server would need to be shut down for approximately 20-30 minutes."

And the second option was to exchange the server and exchange the drives, and the third one was to run a complete hardware check (10 hours of diagnostics duration).

I'm going to request the first option; exchanging the server and MOVING the drives.

Now, I wanted to know: would there be any OS-level trouble (FreeBSD kernel, boot, ZFS structure) when moving the current 2x NVMe disks (ZFS stripe) into a new server of the exact same model?
 
You need the same BIOS settings on the new motherboard, such as SATA mode (AHCI), UEFI/Legacy (CSM), NVMe RAID on/off, Secure Boot off, and so on. My guess is that you are using UEFI in order to boot from NVMe.
It depends on how your current boot is set up. Check your current boot method using sysctl machdep.bootmethod. If it's set to UEFI, then verify how the EFI variables are set in the BIOS using efibootmgr -v. This will show you whether you are booting directly from an EFI file or searching the first disk for the ESP and booting the default bootx64.efi. You may need to create a new boot entry on the new motherboard using the EFI shell, or using the graphical interface if the UEFI BIOS has one.
Anyway, write down the output of efibootmgr -v, which will show you the current UEFI entries recorded in the BIOS.

Edit:
Did you have any other MCA errors since the last one, or was it a single event?
 
GCACHE LG RD error
That can be either a CPU problem somewhere in the L3 cache or an ECC memory error.
Depending on how the CPU is configured (there are too many options) DRAM ECC errors may not get detected until corrupt data is accessed in the L3 cache.
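
For anyone who wants to see where the "COR GCACHE LG RD" wording comes from, the Status value can be decoded by hand. Below is a minimal Python sketch, assuming the generic x86 machine-check layout (bit 63 valid, 62 overflow, 61 uncorrected, 58 address valid, and a low 16-bit error code of the form 0000 0001 RRRR TTLL for cache-hierarchy errors). The function name decode_status and the dictionary keys are made up for illustration, and the summary string only imitates the kernel's message for the cases seen in this thread; vendor-specific bits are ignored.

```python
# Lookup tables for the cache-hierarchy error code (0000 0001 RRRR TTLL).
TTYPE = {0: "I", 1: "D", 2: "G"}            # instruction / data / generic
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}  # LG = generic cache level
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}


def decode_status(status: int) -> dict:
    """Decode the architectural bits of a machine-check status value.

    Hypothetical helper: vendor-specific fields are not interpreted.
    """
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    code = status & 0xFFFF  # MCA error code lives in the low 16 bits
    info = {
        "valid": bit(63),
        "overflow": bit(62),      # another error arrived before this one was read
        "uncorrected": bit(61),   # clear means the hardware corrected the error
        "enabled": bit(60),
        "misc_valid": bit(59),
        "addr_valid": bit(58),
        "context_corrupt": bit(57),
        "error_code": code,
    }

    parts = ["UNCOR" if info["uncorrected"] else "COR"]
    if info["overflow"]:
        parts.append("OVER")

    # Cache-hierarchy pattern; bit 12 is masked out before comparing.
    if (code & 0xEF00) == 0x0100:
        tt = TTYPE.get((code & 0x000C) >> 2, "?")
        ll = LEVEL[code & 0x0003]
        rrrr = REQUEST.get((code & 0x00F0) >> 4, "?")
        parts.append(f"{tt}CACHE {ll} {rrrr} error")
    else:
        parts.append(f"error code 0x{code:04x}")

    info["summary"] = " ".join(parts)
    return info


if __name__ == "__main__":
    for value in (0x9C2040000000011B, 0xDC2040000000011B):
        print(hex(value), "->", decode_status(value)["summary"])
```

With the two Status values from this thread, 0x9c2040000000011b decodes to "COR GCACHE LG RD error" and 0xdc2040000000011b to "COR OVER GCACHE LG RD error". The only difference is bit 62, the overflow flag, which means corrected errors were arriving faster than they were being read out.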
 
VladiBG The boot method was BIOS, according to the output of sysctl machdep.bootmethod. I decided to set up a new server with new disks anyway (I already had my backups).
And yes, "dmesg -a" started to fill up with errors like the ones below:

Code:
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01a0ffc01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0x9c2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01b0fff01000000
MCA: Bank 18, Status 0xdc2040000000011b
MCA: Global Cap 0x000000000000011c, Status 0x0000000000000000

MCA: Vendor "AuthenticAMD", ID 0x870f10, APIC ID 0
MCA: CPU 0 COR OVER GCACHE LG RD error
MCA: Address 0x40000051068f800
MCA: Misc 0xd01a0ffe01000000
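
Every record above reports the same Address. To check that quickly on a longer log, the records can be grouped by bank and address. This is a rough Python sketch, assuming the "MCA: Bank N, Status 0x..." and "MCA: Address 0x..." line formats shown above and that each record starts with its Bank line; the function name summarize is made up for illustration.

```python
import re
from collections import Counter

BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")
ADDR_RE = re.compile(r"MCA: Address (0x[0-9a-fA-F]+)")


def summarize(log_text: str) -> Counter:
    """Count MCA records per (bank, address) pair.

    A record whose Bank line was cut off (for example at the start of a
    paste) is counted with bank set to None.
    """
    counts = Counter()
    bank = None
    for line in log_text.splitlines():
        match = BANK_RE.search(line)
        if match:
            bank = int(match.group(1))
            continue
        match = ADDR_RE.search(line)
        if match:
            address = int(match.group(1), 16)
            counts[(bank, address)] += 1
            bank = None  # the next record must bring its own Bank line
    return counts


if __name__ == "__main__":
    import sys

    # Example: dmesg -a | python3 this_script.py
    for (bank, address), n in summarize(sys.stdin.read()).items():
        print(f"bank={bank} address={address:#x} count={n}")
```

If the same (bank, address) pair keeps its count growing after the modules have been moved, that points away from a single DIMM, in line with the advice earlier in the thread.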

Andriy, good point as well.

Thank you guys.
 