Solved WD 8TB CAM errors

Hey everyone,

Has anyone had problems with the new Western Digital Red 8TB NAS drives (5400 RPM, not 7200) under generic FreeBSD 11.1-RELEASE? I'm running two with GELI in a ZFS mirror on an HP Gen8 Microserver, but after three months I've started seeing strings of CAM errors.

Code:
(ada0:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 f8 f0 00 40 68 01 01 00 00 00 00
(ada0:ahcich3:0:0:0): CAM status: Command timeout
(ada0:ahcich3:0:0:0): Retrying command
ahcich3: Timeout on slot 16 port 0
(ada0:ahcich3:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich3:0:0:0): CAM status: Command timeout
(ada0:ahcich3:0:0:0): Retrying command

IIRC, I saw these errors back in the 8.0-RELEASE days with a Samsung spinning disk, but not loading ahci(4) mitigated it. That was a long time ago though, so I'm almost certain this is just a dying drive.

Curiously, smartd output is consistent with the other working drive. I'm also running two of these drives with btrfs on Ubuntu in a lab at work (*nervous twitch*) and haven't seen issues.

Before I send it back to WD, has anyone seen something similar with these specific drives?
 
Timeouts like these can easily be caused by communication problems (SATA cable, port) rather than by the drive itself. Try reseating the cables, swapping the two drives between ports, and similar debugging techniques.

A drive internally failing without any trace of a problem visible in SMART data is possible, but very unlikely.
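
If it helps with the swap test, here is a minimal Python sketch that tallies "Command timeout" lines per device and AHCI channel, so you can compare the counts before and after moving the drives. It assumes the kernel messages look exactly like the ones quoted above; the function name count_timeouts is made up for illustration, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in this thread, e.g.
# (ada0:ahcich3:0:0:0): CAM status: Command timeout
# Assumption: the message format is the same as in the posted log.
CAM_STATUS = re.compile(
    r"\((?P<dev>\w+):(?P<chan>\w+):\d+:\d+:\d+\): CAM status: (?P<status>.+)$"
)


def count_timeouts(lines):
    """Count 'Command timeout' statuses per (device, channel) pair."""
    counts = Counter()
    for line in lines:
        match = CAM_STATUS.search(line.strip())
        if match and match.group("status") == "Command timeout":
            counts[(match.group("dev"), match.group("chan"))] += 1
    return counts
```

Feed it the lines from your saved dmesg output or /var/log/messages, and note which physical disk sits on which channel before and after the swap. If the timeouts stay on the same channel (ahcich3 here) regardless of which disk is attached, that points at the cable, port, or controller rather than the drive.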
 
Hey Ralph, cheers. Your suspicions were correct. I moved the drives to another microserver, booted, and they worked. So it's either a problem with the connectors on the suspect machine, or maybe the RAID card. Either way, I know where to troubleshoot now.
 