Determining Cougar Point PCH stepping?

I recently purchased a Q67 board for use as a ZFS NAS, and so far things have been going very well. Then I remembered the SATA problems with the B2 stepping on Cougar Point boards. The board was one of the later Q67 boards to become available, but how can I tell for sure if it's the fixed (B3) stepping, or the older, problematic one?

My first thought was pciconf -lbv, but that gives me revision values in hex. After some quick searching, I found that revision 4 is the problematic (B2) stepping, and revision 5 denotes the fixed stepping. Now my only question is, which of the listed items do I need to look at? Perhaps unsurprisingly, I have items of both revisions.

rev=0x04 items
Code:
none0@pci0:0:22:0:      class=0x078000 card=0x1c3a8086 chip=0x1c3a8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family MEI Controller'
    class      = simple comms
    bar   [10] = type Memory, range 64, base 0xfe526000, size 16, enabled
none1@pci0:0:22:2:      class=0x010185 card=0x1c3c8086 chip=0x1c3c8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family IDE-r Controller'
    class      = mass storage
    subclass   = ATA
    bar   [10] = type I/O Port, range 32, base 0xf130, size  8, enabled
    bar   [14] = type I/O Port, range 32, base 0xf120, size  4, enabled
    bar   [18] = type I/O Port, range 32, base 0xf110, size  8, enabled
    bar   [1c] = type I/O Port, range 32, base 0xf100, size  4, enabled
    bar   [20] = type I/O Port, range 32, base 0xf0f0, size 16, enabled
uart2@pci0:0:22:3:      class=0x070002 card=0x1c3d8086 chip=0x1c3d8086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family KT Controller'
    class      = simple comms
    subclass   = UART
    bar   [10] = type I/O Port, range 32, base 0xf0e0, size  8, enabled
    bar   [14] = type Memory, range 32, base 0xfe525000, size 4096, enabled

rev=0x05 items
Code:
em0@pci0:0:25:0:        class=0x020000 card=0x00008086 chip=0x15028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82579LM Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfe500000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xfe524000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0xf080, size 32, enabled
ehci0@pci0:0:26:0:      class=0x0c0320 card=0x1c2d8086 chip=0x1c2d8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xfe523000, size 1024, enabled
ehci1@pci0:0:29:0:      class=0x0c0320 card=0x1c268086 chip=0x1c268086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xfe522000, size 1024, enabled
isab0@pci0:0:31:0:      class=0x060100 card=0x1c4e8086 chip=0x1c4e8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Q67 Express Chipset Family LPC Controller'
    class      = bridge
    subclass   = PCI-ISA
ahci0@pci0:0:31:2:      class=0x010601 card=0x1c028086 chip=0x1c028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
    bar   [10] = type I/O Port, range 32, base 0xf0d0, size  8, enabled
    bar   [14] = type I/O Port, range 32, base 0xf0c0, size  4, enabled
    bar   [18] = type I/O Port, range 32, base 0xf0b0, size  8, enabled
    bar   [1c] = type I/O Port, range 32, base 0xf0a0, size  4, enabled
    bar   [20] = type I/O Port, range 32, base 0xf060, size 32, enabled
    bar   [24] = type Memory, range 32, base 0xfe521000, size 2048, enabled
none2@pci0:0:31:3:      class=0x0c0500 card=0x1c228086 chip=0x1c228086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family SMBus Controller'
    class      = serial bus
    subclass   = SMBus
    bar   [10] = type Memory, range 64, base 0xfe520000, size 256, enabled
    bar   [20] = type I/O Port, range 32, base 0xf040, size 32, enabled

Most of these seem irrelevant (e.g. the ethernet controller), and the most pertinent one, ahci0@pci0:0:31:2:, reports rev=0x05. Am I correct in assuming this is the newer stepping that resolved the issues?
 
Finally found the answer. Intel has a PDF here listing various identifiers and their corresponding revision (and stepping) values.

So, for instance, "Desktop: AHCI (Ports 0-5)" has the ID of "1C02h", and will have a revision of "05h" for B3 stepping. Thus, the relevant portions of this entry are the 1c02 in the card= and chip= fields and the 05 in the rev= field:
Code:
ahci0@pci0:0:31:2:      class=0x010601 card=0x1c028086 chip=0x1c028086 rev=0x05 hdr=0x00
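
If you want to check this without eyeballing the hex, here is a minimal Python sketch that pulls the device ID and revision out of a pciconf -l line. The function names (parse_pciconf_line, sata_stepping) are made up for illustration, and the revision-to-stepping table only contains the two values mentioned in this thread (04h = B2, 05h = B3) for device ID 1C02h; anything else is reported as "unknown" rather than guessed.

```python
import re

# Revision -> stepping, using only the two values discussed in this thread.
# Assumption: other revisions are not covered here and map to "unknown".
STEPPINGS = {0x04: "B2", 0x05: "B3"}

# Matches the first line of a `pciconf -l` entry, e.g.
# ahci0@pci0:0:31:2:  class=0x010601 card=0x1c028086 chip=0x1c028086 rev=0x05 hdr=0x00
LINE_RE = re.compile(
    r"^(?P<name>\S+)@pci\S+\s+.*?"
    r"chip=0x(?P<chip>[0-9a-f]{8})\s+rev=0x(?P<rev>[0-9a-f]{2})",
    re.IGNORECASE,
)


def parse_pciconf_line(line):
    """Return name, device ID, vendor ID and revision, or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    chip = int(m.group("chip"), 16)
    return {
        "name": m.group("name"),
        "device_id": chip >> 16,     # upper 16 bits of chip=
        "vendor_id": chip & 0xFFFF,  # lower 16 bits of chip= (0x8086 is Intel)
        "rev": int(m.group("rev"), 16),
    }


def sata_stepping(line):
    """Return "B2", "B3" or "unknown" for the 1C02h AHCI controller, else None."""
    info = parse_pciconf_line(line)
    if info is None:
        return None
    if info["vendor_id"] != 0x8086 or info["device_id"] != 0x1C02:
        return None
    return STEPPINGS.get(info["rev"], "unknown")


if __name__ == "__main__":
    sample = ("ahci0@pci0:0:31:2:      class=0x010601 card=0x1c028086 "
              "chip=0x1c028086 rev=0x05 hdr=0x00")
    print(sata_stepping(sample))
```

Note that it keys on chip= rather than card=, since card= is the subsystem ID and an OEM board may report its own vendor there.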
 
So, I've been plagued with all sorts of weird errors: stuff crashing with signal 10 and signal 11, vm_fault: pager read error, and all sorts of ZFS data corruption, some real and some imaginary.

To the point where I ended up with a root zpool that would panic on import. I tried everything I could to recover it, but ended up importing it read-only, grabbing what I thought I needed, then starting over and restoring from backup.

Turns out I missed a lot of stuff, and restoring was a mess because we use NetBackup, so the last full was over 2 months ago. I probably shouldn't have tried to restore /usr/local and /var/db/pkg in the hopes of avoiding the need to rebuild all my ports.

But, almost immediately, there were more weird signal 10s and vm_faults.

In the past, I had replaced drives a few times, replaced cables, ran tests on new drives, tested memory for a day. But, in a different investigation I was looking at chipset versions, and wondered if my chipset might have issues here.

Guess I have stepping B2

Code:
ahci0@pci0:0:31:2:      class=0x010601 card=0x047e1028 chip=0x1c028086 rev=0x04 hdr=0x00

Kind of unexpected for a new Optiplex 990.

Is there any way to get FreeBSD 9 to cope?

The Dreamer
 
Well, I haven't had any luck in getting Dell support to give me a new motherboard, so a workaround, which other manufacturers had apparently resorted to, is to put in a PCIe SATA controller and disable the onboard one.

I first tried two different LSI ones in the PCIe x4 slot (a 4 SATA-II one and a 2 SFF SAS/SATA-III one; I forget the exact models), but the system wouldn't detect them. I wondered if the slot was bad, so I stuck an Intel dual NIC into the slot and the system detected that fine.

In fact, it's the second LSI controller and the Intel dual NIC that we put into our DL380s in a datacenter, since the SmartArray 410a controller won't just present the disks unconfigured and the boss had trouble getting the Broadcom NICs to do everything he wants (though I thought I had been making good progress in provisioning these servers with FreeBSD 9.0 from pxeboot).

Reading the Dell support forums, I found that there was a thread about Sil3132 cards not being detected by the BIOS, and that Dell had released a BIOS update that it said specifically addressed Sil3132 cards.

Couldn't find anybody saying anything about Sil3124 cards, so I settled on getting a Sil3132 card and now I'm working on getting the system back up. I currently use both a Sil3124 and a Sil3132 card in my home FreeBSD system, though I'm thinking about getting a pair of ASM1061s to run things (I have 15 drives in my home server).

NetBackup has made a mess of how it does restores, and apparently the recent backups had been far from complete. For this recent restore I worked from a full done in January plus the differentials since. The previous time, it was from a full done in early November plus differentials up to just before Christmas. For some reason a December full hadn't taken place. Might end up rebuilding all my ports in the end.

Though boss has offered to help me throw it off the building (and make sure that I don't go with it.) He already has an iMac on order for me.

But, I was just starting to get to where FreeBSD would be my main workstation (replacing Solaris 10 on Ultra 20.)

Hopefully there aren't any other hardware problems lingering in the system.

The Dreamer.
 
TheDreamer said:
Hopefully there aren't any other hardware problems lingering in the system.

As luck would have it there was another issue: a bad DIMM. So, I removed one pair to see if that resolved it, and then switched to the other pair. I didn't have an extra 4 GB DIMM, so I put in a pair of 2 GB DIMMs. And, things were finally stable, but down to 12 GB from 16 GB of RAM.

I added a note about this discovery to the ticket with Dell, and they finally said they would send me a new motherboard and new memory. Except they kept wanting to send a tech to do the install, even though our onsite computer repair is Dell-certified. And, I work in a secured area and I was out that week (at LISA '12.)

It was interesting to note that on the old motherboard, the bad chip had a heatsink attached to it. And, on the replacement motherboard, the good chip didn't.

And, then things were finally stable for a little over three months, until one of the drives started disappearing. After a few iterations to remap bad blocks it has been OK (knock on wood).

Meanwhile, back in December when things were looking bleak, boss twisted my arm into getting a 27" iMac (i7 with 1TB fusion drive), which finally showed up right as I was having the drive problems. But, I'm hanging on to my FreeBSD machine as my main work computer. The iMac mainly gets used for IRC and e-mail. Oh, and a few things that FreeBSD isn't supported for, such as web conferencing using Google Hangouts or Webex.

The Dreamer
 