Installing mcelog 1.0.p3.txz

Hi,
I am not very familiar with FreeBSD. I have a backup server configured with ZFS, and I recently found the following errors in the dmesg(1) output:

Code:
MCA: Bank 8, Status 0xcc0000800001009f
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x106e5, APIC ID 0
MCA: CPU 0 COR (2) OVER RD channel ?? memory error
MCA: Address 0x38c98a040
MCA: Misc 0x1374400400014000
MCA: Bank 8, Status 0xcc0000c00001009f
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x106e5, APIC ID 0
MCA: CPU 0 COR (3) OVER RD channel ?? memory error
MCA: Address 0x38c98a040
MCA: Misc 0x1374400400014000
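The Status value in these lines is the raw IA32_MCi_STATUS register, and its architectural fields can be decoded by hand. The Python sketch below assumes the bit layout from the Intel SDM machine-check chapter (flag bits 57 to 63, corrected-error count in bits 38 to 52, MCA error code in bits 0 to 15); decode_status is an illustrative name, not part of mcelog or FreeBSD. For 0xcc0000800001009f it yields a corrected count of 2 and error code 0x9f (memory read, channel unspecified), which is what dmesg prints as "COR (2) OVER RD channel ?? memory error". None of these fields names a DIMM slot, so mapping the address to a physical module needs board-specific information.

```python
# Bit positions in IA32_MCi_STATUS (Intel SDM, machine-check architecture).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # another error occurred before this one was read
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupted
}

# MMM field of a memory-controller error code (000F 0000 1MMM CCCC).
MEMORY_TRANSACTIONS = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_status(status):
    """Decode the architectural fields of a raw MCi_STATUS value."""
    info = {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        # Corrected-error count; assumes the CPU supports CMCI (MCG_CAP bit 10).
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": status & 0xFFFF,
    }
    code = info["mca_code"]
    if (code & 0xEF80) == 0x0080:  # memory-controller error format
        info["transaction"] = MEMORY_TRANSACTIONS.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        info["channel"] = "unspecified" if channel == 0xF else channel
    return info


# Status values copied from the dmesg output above.
for raw in (0xCC0000800001009F, 0xCC0000C00001009F):
    print(hex(raw), decode_status(raw))
```

Only the UC, EN and PCC bits being clear is what makes these corrected (COR) errors; the OVER flag just means more than one error was logged before the register was read.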

Code:
# sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
hw.machine: amd64
hw.model: Intel(R) Xeon(R) CPU           X3440  @ 2.53GHz
hw.ncpu: 8
hw.machine_arch: amd64
#

# uname -a
FreeBSD xyz.com 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825 root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

I read on the forum that these are corrected ECC errors pointing to a memory hardware problem. To find the exact DIMM slot, I installed mcelog from ports with a single command: cd /usr/ports/sysutils/mcelog/ && make install clean. The installed version is mcelog 1.0pre. Running mcelog gives this output:

Code:
# mcelog
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 BANK 8 TSC b052ea70bdf2ee [at 2533 Mhz 226 days 17:50:41 uptime (unreliable)]
MISC 1374400400014303 ADDR 38c98a040 
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 30
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 0 BANK 8 TSC b1540c2d8c2025 [at 2533 Mhz 228 days 0:50:39 uptime (unreliable)]
MISC 1374400400011285 ADDR 38c98a040 
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS cc0000800001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 30

The same message was repeated 8 times; the output was long, so I removed half of it.
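The two model numbers in this output refer to the same CPU: the warning prints the model in hex ("Model 1e") and the CPUID line prints it in decimal ("Model 30"). The warning itself says this mcelog version has no model-specific decoder for that CPU and falls back to architectural decoding. A minimal Python sketch of the standard CPUID leaf 1 signature decoding, applied to the ID 0x106e5 reported by dmesg, shows the relationship; decode_cpuid_signature is a hypothetical helper, not part of mcelog.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # Extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # Extended model is prepended for base families 0x6 and 0xF.
    model = (ext_model << 4) + base_model if base_family in (0x6, 0xF) else base_model
    return family, model, stepping


family, model, stepping = decode_cpuid_signature(0x106E5)
print(f"Family {family} Model {model:x} (hex) = {model} (decimal), stepping {stepping}")
```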

The output did not tell me which slot has the problem. This message also confused me:
Code:
mcelog: Unsupported new Family 6 Model 1e CPU: only decoding architectural errors
HARDWARE ERROR. This is *NOT* a software problem!
I read that this might be a bug in mcelog 1.0pre, so I am trying to upgrade mcelog. I have downloaded http://pkg.freebsd.org/freebsd:9:x86:64 ... 1.0.p3.txz, but I don't know whether I can use it to upgrade the existing installation, or how to install this package at all.

I need help ASAP.
 
In my experience, mcelog works reliably only on Intel processors under Red Hat; I have never been able to run the mcelog daemon on AMD machines under Red Hat. Consider using ipmitool. I would be really curious whether mcelog works on FreeBSD at all. What type of processor do you have? If you can afford a reboot, a BIOS memory test (memtest) may help you locate the faulty memory module.
 
sysutils/dmidecode may also be useful, although I'm not sure whether it would indicate a bad memory stick. It will definitely show a lot of system information.
 