CAM status: Command timeout

I am having an interesting issue. Two weeks ago, my ZFS drive started producing errors like the following once the system has been powered on and running for a while:

Code:
ahcich1: Timeout on slot 31 port 0
ahcich1: is 00000000 cs 00000000 ss 80000001 rs 80000001 tfd 40 serr 00000000 cmd 0000c017
(ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 20 e5 4e c1 40 88 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Command timeout
(ada1:ahcich1:0:0:0): Retrying command, 3 more tries remain

I first replaced the SATA cable, but the problem remained. I then powered the disk off and waited for the spare to arrive.

Now I have plugged in the spare drive and I am trying to copy the data (or at least as much of it as I can) to the new HDD. The problem is that each failed read hangs in the kernel for so long that the copy will take forever.

Is there any way to lower the timeout to a second or two, so that I can copy the data from the broken disk in a reasonable time?
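One possible approach, offered as a sketch rather than a confirmed fix for this thread: the ada(4) driver documents two sysctls, kern.cam.ada.default_timeout (in seconds, default 30) and kern.cam.ada.retry_count (default 4), which appear to match the timeout and the "3 more tries remain" message in the log above. The values below (2 seconds, 1 retry) are assumptions chosen only to match the "second or two" asked for. The variable names TIMEOUT_SECS and RETRIES are hypothetical.

```shell
#!/bin/sh
# Sketch: shorten the ada(4) command timeout and retry count while
# salvaging data from a failing disk. Values are illustrative assumptions.
TIMEOUT_SECS=2   # ada(4) default is 30 seconds
RETRIES=1        # ada(4) default is 4

# Only touch the sysctls on FreeBSD and when running as root.
if [ "$(uname -s)" = "FreeBSD" ] && [ "$(id -u)" -eq 0 ]; then
    # Show the current values first so they can be restored afterwards.
    sysctl kern.cam.ada.default_timeout kern.cam.ada.retry_count
    sysctl kern.cam.ada.default_timeout="$TIMEOUT_SECS"
    sysctl kern.cam.ada.retry_count="$RETRIES"
fi
```

These settings are global to all ada devices, so it is probably wise to restore the defaults once the copy is finished. With fewer retries, ZFS may also see more read errors on the failing disk than it would otherwise.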
 
In some cases I have seen similar errors on perfectly good HDDs.
Those HDDs had clean SMART data and passed a complete surface test.
In some cases the issue was caused by the SATA cables,
but I have also had cases where disks from one particular vendor (WD) did not work correctly in one particular computer/server running FreeBSD.
I have also seen the same errors appear after upgrading FreeBSD, even though the same hardware (disks, mainboard, SATA cables) worked well before the upgrade.

So you need to know the exact condition of your disk. That lets you choose the right recovery strategy.

First of all, try to check SMART on the affected disk.
You can do this on FreeBSD with sysutils/smartmontools.
Alternatively, you can shut down your FreeBSD machine, temporarily detach the disk, and check SMART on another PC.
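As a rough illustration of the SMART check, the commands below use smartctl from sysutils/smartmontools. The device path /dev/ada1 is an assumption taken from the ada1 entries in the log above, and the DISK variable is a hypothetical name; adjust both to your system.

```shell
#!/bin/sh
# Sketch: query SMART data on the suspect disk with smartctl.
# Install first if needed: pkg install smartmontools
DISK=/dev/ada1   # assumed from the "ada1" entries in the kernel log

# Run only if smartctl is installed and the device node exists.
if command -v smartctl >/dev/null 2>&1 && [ -c "$DISK" ]; then
    smartctl -H "$DISK"        # overall health self-assessment
    smartctl -a "$DISK"        # full SMART attributes and error log
    smartctl -t short "$DISK"  # start a short self-test in the background
fi
```

The attributes usually worth a look in the full output are the reallocated and pending sector counts and the UDMA CRC error count; a rising CRC count tends to point at the cable or port rather than the disk itself.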

Just in case:
 
The problem is that every attempt to query SMART on the disk fails.

I have tested the cable and port with another disk and both are OK. I buy disks in pairs for exactly this reason, so I will mount the controller PCB from the second disk on the failing one and back up the data.
 