Hi,
I am not very familiar with FreeBSD. I have a backup server with ZFS configured, and recently I found the errors shown below in dmesg(1).
From what I have read, the confusing mcelog message shown below might be a bug in mcelog 1.0pre, so I am trying to upgrade mcelog. I have downloaded http://pkg.freebsd.org/freebsd:9:x86:64 ... 1.0.p3.txz, but I don't know whether I can use it to upgrade the existing installation, or how to install this package at all.
Code:
MCA: Bank 8, Status 0xcc0000800001009f
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x106e5, APIC ID 0
MCA: CPU 0 COR (2) OVER RD channel ?? memory error
MCA: Address 0x38c98a040
MCA: Misc 0x1374400400014000
MCA: Bank 8, Status 0xcc0000c00001009f
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x106e5, APIC ID 0
MCA: CPU 0 COR (3) OVER RD channel ?? memory error
MCA: Address 0x38c98a040
MCA: Misc 0x1374400400014000
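As a rough cross-check, the Status values above can be decoded by hand. The sketch below assumes the architectural IA32_MCi_STATUS bit layout described in the Intel SDM (VAL, OVER, UC in the top bits, a corrected-error count in bits 52:38 when MCG_CAP bit 10 is set, and a memory controller error code of the form 1MMM CCCC in the low 16 bits). The function name decode_mci_status and the REQUESTS table are made up for illustration; they are not part of FreeBSD or mcelog.

```python
# Requests encoded in the MMM field of a memory controller error code.
REQUESTS = {0: "generic", 1: "memory read", 2: "memory write",
            3: "address/command", 4: "memory scrubbing"}


def decode_mci_status(status, mcg_cap=0x1c09):
    """Decode the architectural fields of one IA32_MCi_STATUS value.

    mcg_cap defaults to the "Global Cap" value from the dmesg lines above.
    """
    code = status & 0xFFFF                      # MCA error code, bits 15:0
    out = {
        "valid": bool((status >> 63) & 1),       # VAL
        "overflow": bool((status >> 62) & 1),    # OVER
        "uncorrected": bool((status >> 61) & 1), # UC
        "misc_valid": bool((status >> 59) & 1),  # MISCV
        "addr_valid": bool((status >> 58) & 1),  # ADDRV
        "mca_code": code,
        "corrected_count": None,
        "request": None,
        "channel": None,
    }
    # Assumption: bits 52:38 hold a corrected-error count only when
    # MCG_CAP bit 10 (CMCI_P) is set.
    if (mcg_cap >> 10) & 1:
        out["corrected_count"] = (status >> 38) & 0x7FFF
    # Memory controller errors: 000F 0000 1MMM CCCC (bit 12 is ignored here).
    if (code & 0xEF80) == 0x0080:
        out["request"] = REQUESTS.get((code >> 4) & 0x7, "reserved")
        chan = code & 0xF
        out["channel"] = None if chan == 0xF else chan  # 0xF: not specified
    return out


if __name__ == "__main__":
    for value in (0xcc0000800001009f, 0xcc0000c00001009f):
        print(hex(value), decode_mci_status(value))
```

For the two values above this yields corrected counts of 2 and 3 with UC clear and no channel specified, which lines up with the "COR (2) OVER RD channel ??" and "COR (3) OVER RD channel ??" lines. It does not identify a DIMM slot; that mapping depends on the board and memory controller.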
Code:
# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU X3440 @ 2.53GHz
hw.ncpu: 8
hw.machine_arch: amd64
#
# uname -a
FreeBSD xyz.com 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
I read on the forum that these are ECC errors and that they point to a memory hardware problem, so to find the exact DIMM slot I installed mcelog with a single command:
cd /usr/ports/sysutils/mcelog/ && make install clean. The installed version is mcelog 1.0pre. When I run mcelog I get this output:
Code:
# mcelog
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 BANK 8 TSC b052ea70bdf2ee [at 2533 Mhz 226 days 17:50:41 uptime (unreliable)]
MISC 1374400400014303 ADDR 38c98a040
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 30
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 0 BANK 8 TSC b1540c2d8c2025 [at 2533 Mhz 228 days 0:50:39 uptime (unreliable)]
MISC 1374400400011285 ADDR 38c98a040
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS cc0000800001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 30
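Since every record above reports the same ADDR, it may help to tally the records by CPU, bank, and address to see whether the errors come from one location or several. This is only a sketch: it assumes the plain-text mcelog layout shown above (a "CPU n BANK n" line followed by a "MISC ... ADDR ..." line), and count_by_address is a hypothetical helper, not an mcelog feature.

```python
import re
from collections import Counter

CPU_BANK = re.compile(r"^CPU (\d+) BANK (\d+)")
ADDR = re.compile(r"\bADDR ([0-9a-fA-F]+)")


def count_by_address(text):
    """Count mcelog records per (cpu, bank, physical address)."""
    counts = Counter()
    cpu_bank = None
    for line in text.splitlines():
        line = line.strip()
        m = CPU_BANK.match(line)
        if m:
            # Start of a new record: remember which CPU and bank it names.
            cpu_bank = (int(m.group(1)), int(m.group(2)))
            continue
        m = ADDR.search(line)
        if m and cpu_bank is not None:
            counts[cpu_bank + (int(m.group(1), 16),)] += 1
            cpu_bank = None
    return counts


if __name__ == "__main__":
    # Two shortened records copied from the output above.
    sample = """\
CPU 0 BANK 8 TSC b052ea70bdf2ee [at 2533 Mhz 226 days 17:50:41 uptime (unreliable)]
MISC 1374400400014303 ADDR 38c98a040
MCi_ADDR register valid
CPU 0 BANK 8 TSC b1540c2d8c2025 [at 2533 Mhz 228 days 0:50:39 uptime (unreliable)]
MISC 1374400400011285 ADDR 38c98a040
"""
    for (cpu, bank, addr), n in count_by_address(sample).items():
        print(f"cpu {cpu} bank {bank} addr {addr:#x}: {n} record(s)")
```

A single address dominating the tally would be consistent with one failing location, but the physical address alone still does not name the DIMM slot.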
The mcelog output above was repeated 8 times; it was long, so I removed about half of it.
I could not tell from the output which slot has the problem. This message also confused me:
Code:
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
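Part of the confusion may be that the same CPU model is printed as "Model 1e" in the warning and as "Model 30" in the CPUID line; these are the same number in hex and decimal. The sketch below decodes the ID 0x106e5 from the dmesg output, assuming the standard CPUID leaf 1 signature layout (stepping, model, family, extended model, extended family). The function name decode_cpuid_signature is invented for this example.

```python
def decode_cpuid_signature(sig):
    """Return (family, model, stepping) from a CPUID leaf 1 EAX signature."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_cpuid_signature(0x106e5)
    print(f"family {family}, model {model} (0x{model:x}), stepping {stepping}")
```

So "Unsupported new Family 6 Model 1e CPU" appears to mean only that this mcelog build has no model-specific decoder for this CPU and falls back to the architectural fields; the reported memory errors themselves are still real.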
I need help ASAP.