Problem with CPU/memory?

Hello,

Every night I get these error messages on one of my systems.
Any ideas?

Thanks!

My system:
Code:
xxxxxx FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC

# sysctl hw.ncpu
hw.ncpu: 24

# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           X5675  @ 3.07GHz
hw.ncpu: 24
hw.machine_arch: amd64
Code:
# dmesg | grep memory
real memory  = 154618822656 (147456 MB)
avail memory = 149119664128 (142211 MB)
MCA: CPU 16 COR (1) MS channel ?? memory error
MCA: CPU 17 COR (1) MS channel ?? memory error
MCA: CPU 16 COR (240) OVER MS channel ?? memory error
MCA: CPU 17 COR (240) OVER MS channel ?? memory error
MCA: CPU 17 COR (1) MS channel ?? memory error
MCA: CPU 16 COR (1) MS channel ?? memory error
MCA: CPU 17 COR (240) OVER MS channel ?? memory error
MCA: CPU 16 COR (240) OVER MS channel ?? memory error
MCA: CPU 17 COR (1) MS channel ?? memory error
MCA: CPU 16 COR (1) MS channel ?? memory error
MCA: CPU 17 COR (238) OVER MS channel ?? memory error
MCA: CPU 16 COR (238) OVER MS channel ?? memory error
MCA: CPU 16 COR (1) MS channel ?? memory error
MCA: CPU 17 COR (1) MS channel ?? memory error
MCA: CPU 17 COR (2) OVER MS channel ?? memory error
MCA: CPU 16 COR (2) OVER MS channel ?? memory error
MCA: CPU 12 COR (238) OVER MS channel ?? memory error
Code:
Jan 26 00:40:26 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 26 00:40:26 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 26 00:40:26 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 26 00:40:26 xxxxxxx kernel: MCA: CPU 16 COR (1) MS channel ?? memory error
Jan 26 00:40:26 xxxxxxx kernel: MCA: Misc 0x51b075a000086140
Jan 26 00:40:26 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 26 00:40:27 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 26 00:40:27 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 26 00:40:27 xxxxxxx kernel: MCA: CPU 17 COR (1) MS channel ?? memory error
Jan 26 00:40:27 xxxxxxx kernel: MCA: Misc 0x51b075a000086140
Jan 26 00:40:27 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003c00000200cf
Jan 26 00:40:27 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 26 00:40:27 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 26 00:40:27 xxxxxxx kernel: MCA: CPU 16 COR (240) OVER MS channel ?? memory error
Jan 26 00:40:27 xxxxxxx kernel: MCA: Misc 0x3d80d40f00080340
Jan 26 00:40:27 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003c00000200cf
Jan 26 00:40:27 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 26 00:40:27 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 26 00:40:27 xxxxxxx kernel: MCA: CPU 17 COR (240) OVER MS channel ?? memory error
Jan 26 00:40:27 xxxxxxx kernel: MCA: Misc 0x3d80d40f00080340
Jan 27 00:40:24 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 27 00:40:24 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 27 00:40:24 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 27 00:40:24 xxxxxxx kernel: MCA: CPU 17 COR (1) MS channel ?? memory error
Jan 27 00:40:24 xxxxxxx kernel: MCA: Misc 0x51b075a000086240
Jan 27 00:40:25 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 27 00:40:25 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 27 00:40:25 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 27 00:40:25 xxxxxxx kernel: MCA: CPU 16 COR (1) MS channel ?? memory error
Jan 27 00:40:25 xxxxxxx kernel: MCA: Misc 0x51b075a000086240
Jan 27 00:40:25 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003c00000200cf
Jan 27 00:40:25 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 27 00:40:25 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 27 00:40:25 xxxxxxx kernel: MCA: CPU 17 COR (240) OVER MS channel ?? memory error
Jan 27 00:40:25 xxxxxxx kernel: MCA: Misc 0x3d80d40f00081440
Jan 27 00:40:25 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003c00000200cf
Jan 27 00:40:25 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 27 00:40:25 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 27 00:40:25 xxxxxxx kernel: MCA: CPU 16 COR (240) OVER MS channel ?? memory error
Jan 27 00:40:25 xxxxxxx kernel: MCA: Misc 0x3d80d40f00081440
Jan 28 00:40:22 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 28 00:40:22 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 28 00:40:22 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 28 00:40:22 xxxxxxx kernel: MCA: CPU 17 COR (1) MS channel ?? memory error
Jan 28 00:40:23 xxxxxxx kernel: MCA: Misc 0x51b075a000083140
Jan 28 00:40:23 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 28 00:40:23 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 28 00:40:23 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 28 00:40:23 xxxxxxx kernel: MCA: CPU 16 COR (1) MS channel ?? memory error
Jan 28 00:40:23 xxxxxxx kernel: MCA: Misc 0x51b075a000083140
Jan 28 00:40:23 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003b80000200cf
Jan 28 00:40:23 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 28 00:40:23 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 28 00:40:23 xxxxxxx kernel: MCA: CPU 17 COR (238) OVER MS channel ?? memory error
Jan 28 00:40:23 xxxxxxx kernel: MCA: Misc 0x3d80d40f00086140
Jan 28 00:40:23 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003b80000200cf
Jan 28 00:40:23 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 28 00:40:23 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 28 00:40:23 xxxxxxx kernel: MCA: CPU 16 COR (238) OVER MS channel ?? memory error
Jan 28 00:40:23 xxxxxxx kernel: MCA: Misc 0x3d80d40f00086140
Jan 29 00:40:21 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 29 00:40:21 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 29 00:40:21 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 29 00:40:21 xxxxxxx kernel: MCA: CPU 16 COR (1) MS channel ?? memory error
Jan 29 00:40:21 xxxxxxx kernel: MCA: Misc 0x51b075a000081040
Jan 29 00:40:21 xxxxxxx kernel: MCA: Bank 8, Status 0x88000040000200cf
Jan 29 00:40:21 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 29 00:40:21 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 29 00:40:21 xxxxxxx kernel: MCA: CPU 17 COR (1) MS channel ?? memory error
Jan 29 00:40:21 xxxxxxx kernel: MCA: Misc 0x51b075a000081040
Jan 29 00:40:21 xxxxxxx kernel: MCA: Bank 8, Status 0xc8000080000200cf
Jan 29 00:40:21 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 29 00:40:21 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 37
Jan 29 00:40:21 xxxxxxx kernel: MCA: CPU 17 COR (2) OVER MS channel ?? memory error
Jan 29 00:40:21 xxxxxxx kernel: MCA: Misc 0xc49bd86000080040
Jan 29 00:40:21 xxxxxxx kernel: MCA: Bank 8, Status 0xc8000080000200cf
Jan 29 00:40:21 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 29 00:40:21 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 36
Jan 29 00:40:21 xxxxxxx kernel: MCA: CPU 16 COR (2) OVER MS channel ?? memory error
Jan 29 00:40:21 xxxxxxx kernel: MCA: Misc 0xc49bd86000080040
Jan 29 00:57:46 xxxxxxx kernel: MCA: Bank 8, Status 0xc8003b80000200cf
Jan 29 00:57:46 xxxxxxx kernel: MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
Jan 29 00:57:46 xxxxxxx kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 32
Jan 29 00:57:46 xxxxxxx kernel: MCA: CPU 12 COR (238) OVER MS channel ?? memory error
Jan 29 00:57:46 xxxxxxx kernel: MCA: Misc 0x3d80d40f00081040
 
It looks like you have ECC memory and the system is reporting correctable errors in MCA bank 8 (wherever that maps to physically). I would run MemTest86+ to confirm, if possible. You need to figure out whether you have some bad memory (most likely) or a problem with the motherboard, and then replace whatever isn't working right.
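
If you want to check what the kernel is summarizing, the "Status" values in the log can be decoded by hand. Below is a minimal Python sketch, assuming the architectural IA32_MCi_STATUS bit layout described in the Intel SDM (VAL in bit 63, OVER in bit 62, corrected error count in bits 52:38, MCA error code in bits 15:0). The function name `decode_mca_status`, the `MEM_OPS` table, and the dictionary keys are made up for this example and are not part of FreeBSD.

```python
# Sketch: decode a value from a "MCA: Bank N, Status 0x..." log line.
# Assumes the architectural IA32_MCi_STATUS layout; names are illustrative.

# MMM field of a memory-controller error code (000F 0000 1MMM CCCC).
MEM_OPS = {
    0: "generic",
    1: "read (RD)",
    2: "write (WR)",
    3: "address/command (AC)",
    4: "scrub (MS)",
}


def decode_mca_status(status):
    code = status & 0xFFFF  # bits 15:0, MCA error code
    info = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "misc_valid": bool((status >> 59) & 1),   # MISCV
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "memory_op": None,
        "channel": None,
    }
    # Memory-controller errors match 000F 0000 1MMM CCCC (F = filter bit).
    if (code & 0xEF80) == 0x0080:
        info["memory_op"] = MEM_OPS.get((code >> 4) & 0x7, "reserved")
        chan = code & 0xF
        info["channel"] = None if chan == 0xF else chan  # 0xF = not specified
    return info


# Two Status values copied from the log above.
for value in (0x88000040000200CF, 0xC8003C00000200CF):
    d = decode_mca_status(value)
    print(hex(value), d["corrected_count"], d["overflow"],
          d["memory_op"], d["channel"])
```

For the two values shown, this gives a corrected count of 1 without overflow and a count of 240 with overflow, which agrees with the "COR (1)" and "COR (240) OVER" lines. Both decode as a scrub operation with no channel specified, matching "MS channel ??". The UC bit is clear in both, so these are corrected errors.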
 
Hello,

MemTest didn't find any errors, so I swapped the ECC memory in bank 8 with the one in bank 1. Let's see tomorrow whether I get the errors again.
 