This is very odd; I see a lot of messages about this kind of issue. My server is affected too, but it has an Intel chipset. I suspect the problem appeared after I moved from FreeBSD 9 to FreeBSD 10 (RELEASE versions in both cases).
Luckily, I log almost everything into Splunk, which allowed me to trace the first "ahcich*: Timeout on slot *" message to 28 minutes after upgrading from FreeBSD 9.3-RELEASE to FreeBSD 10.1-RELEASE.
Code:
ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf000-0xf01f mem 0xdfa22000-0xdfa227ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 3Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0
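Without Splunk, roughly the same question (when did the first "ahcich*: Timeout on slot *" message appear, and on which channel?) can be asked of a plain syslog file. The following is a minimal Python sketch, assuming the default FreeBSD /var/log/messages line format, where the first three whitespace-separated fields are the timestamp. The function name `first_timeouts` is hypothetical and not part of any existing tool.

```python
import re

# Matches e.g. "ahcich5: Timeout on slot 0 port 0" anywhere in a log line.
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+) port (\d+)")


def first_timeouts(lines):
    """Scan syslog-style lines for ahci(4) timeout messages.

    Returns (first, counts):
      first  -- {channel: (timestamp, slot)} for the first timeout per channel
      counts -- {channel: number of timeout messages seen}
    Assumption: the timestamp is the first three fields ("Mon DD HH:MM:SS").
    """
    first = {}
    counts = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m is None:
            continue
        chan, slot = m.group(1), int(m.group(2))
        counts[chan] = counts.get(chan, 0) + 1
        if chan not in first:
            stamp = " ".join(line.split()[:3])
            first[chan] = (stamp, slot)
    return first, counts
```

Usage would be something like `first, counts = first_timeouts(open("/var/log/messages", errors="replace"))`. If older logs have been rotated and bzip2-compressed, they could be read with `bz2.open(path, "rt")` instead and scanned oldest first.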
At boot time, I get:
Code:
ahcich5: Timeout on slot 0 port 0
ahcich5: is 00000000 cs 00000000 ss 00000000 rs 00000001 tfd 150 serr 00000000 cmd 0004c017
(aprobe5:ahcich5:0:0:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 46 00
(aprobe5:ahcich5:0:0:0): CAM status: Command timeout
(aprobe5:ahcich5:0:0:0): Retrying command
ahcich5: Timeout on slot 0 port 0
ahcich5: is 00000000 cs 00000000 ss 00000000 rs 00000001 tfd 150 serr 00000000 cmd 0004c017
(aprobe5:ahcich5:0:0:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 46 00
(aprobe5:ahcich5:0:0:0): CAM status: Command timeout
(aprobe5:ahcich5:0:0:0): Error 5, Retries exhausted
ahcich5: Timeout on slot 0 port 0
I think this corresponds to an SSD attached to that port as a rescue disk, which is not in use.
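The 12-byte "ACB" in these CAM messages is the ATA command block that timed out. Below is a minimal Python sketch for decoding it, assuming the bytes are printed in the order FreeBSD's CAM ATA code appears to use (command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp). The helper names `parse_acb`, `lba48` and `describe` are hypothetical, and only the two command types seen in this thread are handled.

```python
# Assumed byte order of the ACB string printed by CAM.
FIELDS = ("command", "features", "lba_low", "lba_mid", "lba_high", "device",
          "lba_low_exp", "lba_mid_exp", "lba_high_exp", "features_exp",
          "sector_count", "sector_count_exp")


def parse_acb(text):
    """Turn 'ef 03 00 ...' into a dict of named register bytes."""
    raw = [int(b, 16) for b in text.split()]
    if len(raw) != len(FIELDS):
        raise ValueError("expected 12 ACB bytes")
    return dict(zip(FIELDS, raw))


def lba48(acb):
    """Reassemble the 48-bit LBA from the six LBA bytes."""
    return ((acb["lba_high_exp"] << 40) | (acb["lba_mid_exp"] << 32)
            | (acb["lba_low_exp"] << 24) | (acb["lba_high"] << 16)
            | (acb["lba_mid"] << 8) | acb["lba_low"])


def describe(acb):
    cmd = acb["command"]
    if cmd in (0x60, 0x61):
        # READ/WRITE FPDMA QUEUED (NCQ): the sector count is carried
        # in the features / features_exp bytes, not in sector_count.
        count = (acb["features_exp"] << 8) | acb["features"]
        name = "WRITE_FPDMA_QUEUED" if cmd == 0x61 else "READ_FPDMA_QUEUED"
        return f"{name} lba={lba48(acb):#x} sectors={count}"
    if cmd == 0xEF and acb["features"] == 0x03:
        # SET FEATURES / SET TRANSFER MODE: the mode is in sector_count;
        # 0x40..0x47 encode Ultra DMA modes 0..7.
        mode = acb["sector_count"]
        if (mode & 0xF8) == 0x40:
            return f"SET TRANSFER MODE UDMA{mode & 0x07}"
        return f"SET TRANSFER MODE {mode:#04x}"
    return f"command {cmd:#04x}"
```

For the boot-time command above, `describe(parse_acb("ef 03 00 00 00 40 00 00 00 00 46 00"))` gives `SET TRANSFER MODE UDMA6`, so the probe of the rescue SSD timed out while the driver was setting its transfer mode. The same function also handles the WRITE_FPDMA_QUEUED blocks in the next log.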
This morning, without warning, the server hung:
Code:
ahcich1: Timeout on slot 25 port 0
ahcich1: is 00000000 cs 04000000 ss 06000000 rs 06000000 tfd 40 serr 00000000 cmd 0004d917
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 28 ee 3c 40 2e 00 00 01 00 00
(ada1:ahcich1:0:0:0): CAM status: Command timeout
(ada1:ahcich1:0:0:0): Retrying command
ahcich1: stopping AHCI engine failed
ahcich0: Timeout on slot 23 port 0
ahcich0: is 00000000 cs 01000000 ss 01800000 rs 01800000 tfd 40 serr 00000000 cmd 0004d717
(ada0:ahcich0:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 28 ee 3c 40 2e 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: stopping AHCI engine failed
ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich1: executing CLO failed
ahcich0: AHCI reset: device not ready after 31000ms (tfd = 00000080)
ahcich0: executing CLO failed
ahcich1: Timeout on slot 26 port 0
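The "is ... cs ... ss ... rs ... tfd ... serr ... cmd ..." line printed after each timeout is easier to read once the hex fields are split into bits. Below is a small Python sketch, assuming the fields are AHCI port registers (is = PxIS, cs = PxCI, ss = PxSACT, tfd = PxTFD, serr = PxSERR, cmd = PxCMD) and that `rs` is the driver's own bitmask of running command slots. These mappings are an assumption based on the AHCI specification, not something the log states; the bit positions used (PxTFD status in bits 7:0 and error in 15:8, PxCMD current command slot in bits 12:8) follow that spec. Function names are hypothetical.

```python
import re

LINE_RE = re.compile(
    r"(\w+): is (\w+) cs (\w+) ss (\w+) rs (\w+) tfd (\w+) serr (\w+) cmd (\w+)")


def set_bits(mask):
    """Indices of the bits set in a 32-bit mask (i.e. command slot numbers)."""
    return [i for i in range(32) if (mask >> i) & 1]


def decode_tfd(tfd):
    """Split PxTFD into the ATA status and error bytes and name key status bits."""
    status = tfd & 0xFF
    error = (tfd >> 8) & 0xFF
    names = ((0x80, "BSY"), (0x40, "DRDY"), (0x08, "DRQ"), (0x01, "ERR"))
    return {"status": status, "error": error,
            "flags": [name for bit, name in names if status & bit]}


def decode_cmd(cmd):
    """Pick out a few PxCMD fields, including CCS (current command slot)."""
    return {"ST": bool(cmd & 0x1), "FRE": bool(cmd & 0x10),
            "CCS": (cmd >> 8) & 0x1F,
            "FR": bool(cmd & 0x4000), "CR": bool(cmd & 0x8000)}


def decode_line(line):
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an ahci register dump")
    is_, cs, ss, rs, tfd, serr, cmd = (int(g, 16) for g in m.groups()[1:])
    return {"channel": m.group(1), "is": is_,
            "issued_slots": set_bits(cs), "active_slots": set_bits(ss),
            "running_slots": set_bits(rs), "tfd": decode_tfd(tfd),
            "serr": serr, "cmd": decode_cmd(cmd)}
```

On the hang log above, the CCS field of `cmd` works out to 25 for ahcich1 (0004d917) and 23 for ahcich0 (0004d717), the same slots named in the "Timeout on slot" lines, and `serr` is zero on both channels. Under the same assumptions, the `tfd = 00000080` in the reset messages decodes to BSY alone, i.e. both drives were still reporting busy 31 seconds after the reset attempt.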
I haven't tried playing with hint.ahci.X.msi yet, but it would be great if that solved the problem with my HDDs (so I don't have to RMA them only to discover that the culprit is a driver...).