I have been getting the errors shown in the kernel log below on my Marvell 88SE9230 card.
Of course, that put my ZFS pool into a degraded state. At first I noticed only one disk dropping out, so I thought the HDD might be faulty,
but then I noticed that it had kicked out ALL 4 of the disks that were connected to one specific PCIe card.
So I replaced that card, and I hope the problem will not come back while I am waiting for a new one (I found a spare old ASMedia card, which I am using for now).
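At first I suspected the Marvell driver, but I have two Marvell cards, and the other one is the 88SE9215 listed below as ahci0.
The suspect card, which I have now replaced, was the 88SE9230 (ahci1 in the older boot messages below).
In its place I now have the ASMedia ASM1062 (the current ahci1),
and I have seen no errors so far.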
My first theory was a bug in the FreeBSD 12.0/12.1 driver. I cannot rule that out yet, but look at this:
This looks strange, and I have a feeling that we have a torn chip here. Has ANYONE seen ANYTHING remotely like this? I have not.
Look at the back:
As far as I remember, there was NOTHING on the chip: no glue, nothing. And all of a sudden there is SOMETHING on it.
Does a hardware failure show itself like this?
Thanks!
Code:
Oct 18 05:14:54 constance kernel: (ada5:ahcich7:0:0:0): Error 5, Periph was invalidated
Oct 18 05:14:54 constance kernel: (ada5:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 c8 3e ae 40 07 01 00 00 00 00
Oct 18 05:14:54 constance kernel: (ada5:ahcich7:0:0:0): CAM status: Command timeout
Oct 18 05:14:54 constance kernel: (ada5:ahcich7:0:0:0): Error 5, Periph was invalidated
Oct 18 05:15:41 constance kernel: ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Oct 18 05:15:41 constance kernel: ahcich7: Poll timeout on slot 7 port 0
Oct 18 05:15:41 constance kernel: ahcich7: is 00000000 cs f00000ff ss 00000040 rs 00000080 tfd 80 serr 00000000 cmd 10001b17
Oct 18 05:15:41 constance kernel: (aprobe0:ahcich7:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Oct 18 05:15:41 constance kernel: (aprobe0:ahcich7:0:0:0): CAM status: Command timeout
Oct 18 05:15:41 constance kernel: (aprobe0:ahcich7:0:0:0): Error 5, Retries exhausted
Oct 18 05:16:11 constance kernel: ahcich7: Timeout on slot 8 port 0
Oct 18 05:16:11 constance kernel: ahcich7: is 00000000 cs f00003ff ss 00000340 rs 00000300 tfd 80 serr 00000000 cmd 10001b17
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 70 6c b3 40 07 01 00 00 00 00
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): CAM status: Command timeout
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): Error 5, Periph was invalidated
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 d8 d4 de 40 0b 01 00 00 00 00
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): CAM status: Unconditionally Re-queue Request
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): Error 5, Periph was invalidated
Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): Periph destroyed
Oct 18 05:16:57 constance kernel: ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)
Oct 18 05:16:57 constance kernel: ahcich7: Poll timeout on slot 10 port 0
Oct 18 05:16:57 constance kernel: ahcich7: is 00000000 cs f00007ff ss 00000340 rs 00000400 tfd 80 serr 00000000 cmd 10001b17
Oct 18 05:16:57 constance kernel: (aprobe0:ahcich7:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
Oct 18 05:16:57 constance kernel: (aprobe0:ahcich7:0:0:0): CAM status: Command timeout
Oct 18 05:16:57 constance kernel: (aprobe0:ahcich7:0:0:0): Error 5, Retries exhausted
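To check whether the timeouts really cluster on the channels of one card, log lines like the ones above can be tallied per AHCI channel. This is only a minimal Python sketch that assumes the log format shown above; the function name, the list of error markers, and the sample lines are illustrative choices, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches an AHCI channel name such as "ahcich7" anywhere in a log line.
CHANNEL_RE = re.compile(r"\b(ahcich\d+)\b")

# Substrings (lowercase) that mark a problem in the log excerpt above.
# This list is an assumption; extend it to match your own logs.
ERROR_MARKERS = ("command timeout", "device not ready", "timeout on slot")


def count_errors_by_channel(lines):
    """Return a Counter mapping channel name -> number of error lines."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        if not any(marker in lowered for marker in ERROR_MARKERS):
            continue  # not one of the error lines we are looking for
        match = CHANNEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Oct 18 05:14:54 constance kernel: (ada5:ahcich7:0:0:0): CAM status: Command timeout",
    "Oct 18 05:15:41 constance kernel: ahcich7: AHCI reset: device not ready after 31000ms (tfd = 00000080)",
    "Oct 18 05:15:41 constance kernel: ahcich7: Poll timeout on slot 7 port 0",
    "Oct 18 05:16:11 constance kernel: (ada5:ahcich7:0:0:0): Periph destroyed",
]

for channel, n in sorted(count_errors_by_channel(SAMPLE).items()):
    print(channel, n)
```

To run it over a real log, pass the open file instead of SAMPLE, for example `count_errors_by_channel(open("/var/log/messages"))`. The channel names still have to be matched to their controller (ahci0, ahci1, ...) by hand from the boot messages.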
The other Marvell card (88SE9215):
Code:
ahci0: <Marvell 88SE9215 AHCI SATA controller> port 0xe050-0xe057,0xe040-0xe043,0xe030-0xe037,0xe020-0xe023,0xe000-0xe01f mem 0x91510000-0x915107ff irq 16 at device 0.0 on pci1
ahci0: AHCI v1.00 with 4 6Gbps ports, Port Multiplier supported with FBS
The suspect card that I have now replaced (88SE9230):
Code:
/var/log/messages.4.bz2:Mar 27 00:14:01 constance kernel: ahci1: <Marvell 88SE9230 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0x91310000-0x913107ff irq 17 at device 0.0 on pci2
/var/log/messages.4.bz2:Mar 27 00:14:01 constance kernel: ahci1: AHCI v1.20 with 8 6Gbps ports, Port Multiplier not supported
/var/log/messages.4.bz2:Mar 27 00:14:01 constance kernel: ahci1: quirks=0x900<NOBSYRES,ALTSIG>
What I have there now (ASMedia):
Code:
ahci1: <ASMedia ASM1062 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0x91410000-0x914101ff irq 17 at device 0.0 on pci2
ahci1: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
ahci1: quirks=0xc00000<NOCCS,NOAUX>