HP DL 585 / ACPI ID / ECC Memory / Panic

Hi,

I recently added a ZFS disk array to my old HP DL585 G1 server.
Kernel panics started immediately, and I have spent quite a bit of time
figuring out what was really wrong.

The system has 4 CPU cards carrying dual-core Opteron processors. Each
card holds four 2 GB DIMMs, so 4 x 2 x 4 = 32 GB of total system memory.
The memory is DDR400 ECC.

The panic was very easy to reproduce: I just had to issue enough reads
until the faulty memory was accessed.
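One way to generate that kind of read load is simply to read every file on the pool. Below is a minimal sketch. It assumes the pool is mounted at a hypothetical /tank; pass the real mount point as the first argument. One plausible reason a heavy read load eventually hits the bad memory is that ZFS caches much of what it reads in RAM (the ARC). That spreads the data over more of physical memory.

```python
import os
import sys


def read_everything(root, chunk_size=1 << 20):
    """Read every regular file under root in chunk_size pieces.

    Returns the total number of bytes read. Symlinks are skipped and
    unreadable files are ignored, so the walk never aborts halfway.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(chunk_size)
                        if not block:
                            break
                        total += len(block)
            except OSError:
                continue  # permission errors, files removed mid-walk, etc.
    return total


if __name__ == "__main__":
    # /tank is a hypothetical mount point, not taken from the post.
    root = sys.argv[1] if len(sys.argv) > 1 else "/tank"
    print(f"read {read_everything(root)} bytes")
```

Running it in a loop, or several copies in parallel, increases the pressure on memory.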

Strangely, memtest86+ with the DDR setting enabled finds no errors
whatsoever.

Adding the line

hint.lapic.2.disabled=1

to /boot/loader.conf immediately works around the error on FreeBSD. So
here is my conclusion:

If the system can be made stable by disabling one core on one CPU card:

1) The other cards and their memory must be OK.
2) The mainboard must be OK, since the other core on that CPU is still
running without panicking.
3) The disabled core (APIC ID 2) is probably also OK, since it sits on
the same chip as a core that is still enabled.
4) The cause is most likely a bad DIMM.
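Point 3 depends on knowing which APIC IDs share a chip. Below is a minimal sketch. It assumes the BIOS numbers the local APIC IDs of these dual-core Opterons as `socket * 2 + core`. That layout is common on dual-core K8 systems but is not guaranteed, and the APIC IDs FreeBSD prints at boot should be checked to confirm it. Under that assumption, APIC ID 2 is core 0 of the second socket, and its sibling is APIC ID 3. Which physical card corresponds to which socket is a further assumption that the HP documentation would have to confirm.

```python
# Assumption: dual-core chips, APIC ID = socket * CORES_PER_SOCKET + core.
CORES_PER_SOCKET = 2


def apic_to_socket_core(apic_id, cores_per_socket=CORES_PER_SOCKET):
    """Split a local APIC ID into (socket, core) under the numbering assumption."""
    if apic_id < 0:
        raise ValueError("APIC IDs are non-negative")
    return divmod(apic_id, cores_per_socket)


def sibling_apic_ids(apic_id, cores_per_socket=CORES_PER_SOCKET):
    """Return the other APIC IDs on the same chip under the same assumption."""
    socket, _core = apic_to_socket_core(apic_id, cores_per_socket)
    first = socket * cores_per_socket
    return [i for i in range(first, first + cores_per_socket) if i != apic_id]


if __name__ == "__main__":
    for apic in range(8):  # 4 sockets x 2 cores
        socket, core = apic_to_socket_core(apic)
        print(f"APIC ID {apic}: socket {socket}, core {core}, "
              f"siblings {sibling_apic_ids(apic)}")
```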

Rather than blindly swapping DIMMs to find the culprit, I would really
like to identify the CPU, card and memory module from the OS.
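Suppose a machine-check report or the panic message includes a physical address. Narrowing that address down to a card could then look roughly like the sketch below. It is only a sketch under strong assumptions:

* node interleaving is disabled in the BIOS;
* each card's 8 GB is mapped contiguously in socket order;
* the PCI/MMIO hole below 4 GB and any memory hoisting are ignored.

On real hardware the DRAM base/limit registers of each Opteron node give the actual map. The example address is hypothetical.

```python
GIB = 1 << 30
PER_CARD = 4 * 2 * GIB  # four 2 GB DIMMs per card, from the post
CARDS = 4


def card_for_address(phys_addr, per_card=PER_CARD, cards=CARDS):
    """Return the card index owning phys_addr.

    Assumes contiguous, non-interleaved per-node memory with no PCI hole,
    which a real DL585 map will not match exactly.
    """
    if not 0 <= phys_addr < per_card * cards:
        raise ValueError(f"address {phys_addr:#x} is outside the assumed "
                         f"{per_card * cards // GIB} GiB map")
    return phys_addr // per_card


if __name__ == "__main__":
    addr = 0x2_4000_0000  # hypothetical example: 9 GiB
    print(f"{addr:#x} -> card {card_for_address(addr)}")
```

Going from the card to an individual DIMM would additionally depend on how the memory controller spreads data across its DIMMs, which this sketch does not model.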

Info here:

http://pastebin.com/jqufNKck

Thank you for your time and help.

--


Med venlig hilsen / with regards

Nikolaj Hansen
 