Hi everyone,
On one server I am getting some disk-related errors. There are not many (the dmesg(1) output shown covers about 5 months), but they are worrying anyway. I have had no data loss so far, thanks to mirrored ZFS. Do these messages point to a real hard disk controller failure? Or only to a badly configured kernel module? Are there any kernel parameters to tweak, such as bus timing settings?
Any suggestions?
dmesg
Code:
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 0b 00 40 00 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 9f 50 40 5d 01 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 a1 50 40 5d 01 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada1:ahcich1:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 f0 2c 9c 40 60 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
ahcich0: Timeout on slot 18 port 0
ahcich0: is 00000000 cs 003c0000 ss 003c0000 rs 003c0000 tfd c0 serr 00000000 cmd 0000d217
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 08 88 d9 2d 40 5c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 113720, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1063093, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1058432, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 606048, size: 8192
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 30 35 fc 40 9d 00 00 01 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 18 36 fc 00 9d 00 00 00 01
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 f8 31 1f 40 9d 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 00 32 1f 00 9d 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 70 55 f9 00 9c 00 00 10 00
(ada0:ahcich0:0:0:0): Retrying command
ahcich0: Timeout on slot 3 port 0
ahcich0: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd c0 serr 00000000 cmd 0000c317
(ada0:ahcich0:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: Command timeout
(ada0:ahcich0:0:0:0): Retrying command
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 290833, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 637539, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 767227, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 586772, size: 12288
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 290833, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1057171, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201066, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1055856, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 854055, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 637539, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 767227, size: 16384
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 586772, size: 12288
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1057171, size: 24576
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 174964, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1051025, size: 36864
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1028930, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 201066, size: 4096
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1055856, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 854055, size: 32768
swap_pager: indefinite wait buffer: bufobj: 0, blkno: 1082327, size: 32768
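To make it easier to see which disk reports which kind of error, and whether the same sectors fail repeatedly, the dmesg output above could be summarized with a small script. This is only a rough sketch: the function names are made up for illustration, the embedded sample is a short excerpt of the log above, and the LBA decoding assumes that the 12 ACB bytes are printed in the order command, features, LBA bits 0-23, device, LBA bits 24-47, followed by feature/count bytes. That byte order is an assumption about how the CAM layer prints the command block, not something confirmed in this post.

```python
import re
from collections import Counter

# "(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error"
CAM_STATUS = re.compile(
    r"^\((?P<dev>[a-z]+\d+):[^)]*\): CAM status: (?P<status>.+)$")
# "(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 ..."
ACB_LINE = re.compile(
    r"^\((?P<dev>[a-z]+\d+):[^)]*\): (?P<cmd>\w+)\. ACB: (?P<acb>[0-9a-f ]+)$")


def tally_cam_status(lines):
    """Count (device, CAM status) pairs found in dmesg lines."""
    counts = Counter()
    for line in lines:
        m = CAM_STATUS.match(line.strip())
        if m:
            counts[(m.group("dev"), m.group("status").strip())] += 1
    return counts


def acb_lba(acb_hex):
    """Decode a 48-bit LBA from an ACB dump.

    ASSUMED layout: bytes 2..4 hold LBA bits 0..23 and
    bytes 6..8 hold LBA bits 24..47.
    """
    b = [int(x, 16) for x in acb_hex.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes, got %d" % len(b))
    return (b[2] | b[3] << 8 | b[4] << 16 |
            b[6] << 24 | b[7] << 32 | b[8] << 40)


def tally_lbas(lines):
    """Count how often each (device, command, LBA) triple appears."""
    counts = Counter()
    for line in lines:
        m = ACB_LINE.match(line.strip())
        if m:
            key = (m.group("dev"), m.group("cmd"), acb_lba(m.group("acb")))
            counts[key] += 1
    return counts


# Short excerpt of the dmesg output above, used only as sample input.
SAMPLE = """\
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 0b 00 40 00 00 00 00 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): Retrying command
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 10 70 55 f9 40 9c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
"""

if __name__ == "__main__":
    lines = SAMPLE.splitlines()
    for (dev, status), n in sorted(tally_cam_status(lines).items()):
        print("%s: %s: %d" % (dev, status, n))
    for (dev, cmd, lba), n in sorted(tally_lbas(lines).items()):
        print("%s: %s at LBA 0x%x: %d" % (dev, cmd, lba, n))
```

If the byte order assumed here is correct, a read that keeps failing at the same LBA would suggest a bad sector on that disk, whereas CRC errors spread over unrelated LBAs would point more towards the cable or the port.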
smartctl -a /dev/ada0
shows:
Code:
Error 1027 occurred at disk power-on lifetime: 23571 hours (982 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 58 ff ff ff 4f 00 36d+18:37:39.417 WRITE FPDMA QUEUED
61 00 20 ff ff ff 4f 00 36d+18:37:39.417 WRITE FPDMA QUEUED
61 00 30 ff ff ff 4f 00 36d+18:37:39.417 WRITE FPDMA QUEUED
61 00 20 ff ff ff 4f 00 36d+18:37:39.416 WRITE FPDMA QUEUED
61 00 30 ff ff ff 4f 00 36d+18:37:39.416 WRITE FPDMA QUEUED
smartctl -a /dev/ada1
shows:
Code:
Error 392 occurred at disk power-on lifetime: 22009 hours (917 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 40 ff ff ff 4f 00 21d+09:42:49.392 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 21d+09:42:49.361 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 21d+09:42:49.355 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 21d+09:42:49.345 READ FPDMA QUEUED
60 00 10 ff ff ff 4f 00 21d+09:42:49.336 READ FPDMA QUEUED
uname -imor
Code:
FreeBSD 9.2-RELEASE amd64 GENERIC
pciconf -lv
Code:
ahci0@pci0:0:31:2: class=0x010601 card=0x844d1043 chip=0x1c028086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
device = '6 Series/C200 Series Chipset Family 6 port SATA AHCI Controller'
class = mass storage
subclass = SATA