ahci and siis errors, analyzing disk errors

Hello.

I recently bought some new SATA drives for my file server. After partitioning them, adding them to a new raidz2 pool, and writing a large file of zeros to the new pool (and, for comparison, to an old pool) with
# time dd if=/dev/zero of=/data/file.out bs=1M count=10k
I got the following errors in dmesg:

Code:
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 a7 d9 07 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 7d da 07 40 00 00 00 01 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 7d db 07 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 53 dc 07 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 29 dd 07 40 00 00 00 01 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 29 de 07 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 ff de 07 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 f0 2a 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 c6 2b 2f 40 00 00 00 01 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 c6 2c 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 9c 2d 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 72 2e 2f 40 00 00 00 01 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 72 2f 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 48 30 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 1e 31 2f 40 00 00 00 01 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada13:ahcich7:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 d6 1e 32 2f 40 00 00 00 00 00 00
(ada13:ahcich7:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada13:ahcich7:0:0:0): Retrying command
(ada1:siisch1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 82 b5 5b 40 39 00 00 01 00 00
(ada1:siisch1:0:0:0): CAM status: ATA Status Error
(ada1:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
(ada1:siisch1:0:0:0): RES: 41 84 5a 66 5b 00 39 00 00 00 01
(ada1:siisch1:0:0:0): Retrying command
I am guessing this means I have to send ada13 back, and that ada1 (a member of the old pool) is starting to fail. It might also be a bad SATA port on the motherboard.

It might also be useful to know that ada1 is a ST3500320NS drive and ada13 is a ST2000DM001-9YN164 drive.
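
The ACB bytes in the log can be decoded to see which sectors the failing writes covered. What follows is only a minimal sketch. It assumes the ACB is printed in the order FreeBSD's CAM layer appears to use (command, features, LBA low/mid/high, device, three extended LBA bytes, extended features, sector count, extended sector count) and that, as for NCQ commands in general, the transfer length is carried in the two features bytes. The name `decode_ncq_acb` is made up for this illustration.

```python
def decode_ncq_acb(acb):
    """Return (lba, sector_count) for a WRITE_FPDMA_QUEUED ACB hex string.

    Assumed byte order: cmd, features, lba_low, lba_mid, lba_high, device,
    lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, count, count_exp.
    """
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    if b[0] != 0x61:
        raise ValueError("not a WRITE_FPDMA_QUEUED command (0x61)")
    # 48-bit LBA: low three bytes first, then the three extended bytes.
    lba = b[2] | b[3] << 8 | b[4] << 16 | b[6] << 24 | b[7] << 32 | b[8] << 40
    # NCQ carries the sector count in features (low) and features_exp (high).
    count = b[1] | b[9] << 8
    return lba, count


# First two ada13 entries from the log above.
first = decode_ncq_acb("61 d6 a7 d9 07 40 00 00 00 00 00 00")
second = decode_ncq_acb("61 00 7d da 07 40 00 00 00 01 00 00")
for lba, count in (first, second):
    print("lba=%#x sectors=%d" % (lba, count))
print("contiguous:", first[0] + first[1] == second[0])
```

Under those assumptions, the first write ends exactly at the LBA where the second one begins, which is what one long sequential write would produce.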
 
Start by checking or replacing the cables. These messages look more like interface errors than media errors.
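
One way to see the interface-versus-media distinction is to decode the ATA error register that the ada1 entry reports (`error: 84`). The sketch below uses the standard ATA error bits ICRC (0x80), UNC (0x40), IDNF (0x10) and ABRT (0x04). The helper names and the simple "ICRC set, UNC clear" rule are assumptions made for illustration, not something the driver itself does.

```python
import re

# Selected ATA error register bits (subset, highest bit first when decoded).
ATA_ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error on the link
    0x40: "UNC",   # uncorrectable data error on the media
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}


def decode_ata_error(err):
    """Return the names of the known bits set in the error register."""
    return [name for bit, name in sorted(ATA_ERROR_BITS.items(), reverse=True)
            if err & bit]


def looks_like_link_error(err):
    """Heuristic: a CRC error without UNC points at the cable or port."""
    flags = decode_ata_error(err)
    return "ICRC" in flags and "UNC" not in flags


# Pull the error byte out of the ada1 log line quoted in the first post.
line = "(ada1:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )"
err = int(re.search(r"error: ([0-9a-f]{2})", line).group(1), 16)
print(decode_ata_error(err), looks_like_link_error(err))
```

A media problem would normally show UNC instead. Here only ICRC and ABRT are set, which is consistent with a transfer that was corrupted on the way to the drive.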
 
Hey @mav, I saw you wrote the ahci(4) driver, so you probably know the errors best :)

Even though I'll be offline for a few days while I move to a new place, I'll try your advice: move some SATA cables around and see whether the errors change. The disks are hard to swap or remove, so I'll just have to guess which is which.

All the disks in the new pool are connected directly to the bridges on the motherboard (ICH10R + Marvell chip), and I am not happy at the thought that one of them might have gone bad. zpool status reports no errors on the disks, though, while scrubbing produces some more dmesg errors. If they are of interest, I'll post them.
 