While scrubbing my 3-disk raidz1 pool, I noticed that the data shared from it was no longer accessible through my Samba shares. I immediately logged in to the server and saw that zpool reported a checksum error on the third disk. I checked the logs and found the messages quoted in the code block below.
I tried to stop the scrub without success, so I rebooted the server. When it came back up I started another scrub. The errors kept appearing and the pool reported errors again. I removed the drive and installed it in my desktop to examine it. smartmontools reported no errors. I then zeroed the drive, and again no errors showed up during the process. So I decided to plug it back into the server, after removing and cleaning the SATA cables.
So far the resilver is going fine with no problems. I also plan to scrub the pool again.
Could it really be a bad SATA cable that caused all the trouble?
The system has been running for more than a year with that configuration, currently at 8.2-RELEASE.
Thanks for your input
Code:
Feb 21 20:49:19 hp root: ZFS: checksum mismatch, zpool=tank path=/dev/label/zdisk3 offset=104439013376 size=65536
Feb 21 20:49:19 hp root: ZFS: checksum mismatch, zpool=tank path=/dev/label/zdisk3 offset=104439865344 size=65536
Feb 21 20:49:21 hp kernel: ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=203997424
Feb 21 20:49:32 hp kernel: ata2: SIGNATURE: 00000101
Feb 21 20:50:12 hp kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 21 20:50:45 hp su: gkontos to root on /dev/pts/1
Feb 21 20:50:53 hp kernel: ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
Feb 21 20:51:33 hp kernel: ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
Feb 21 20:52:14 hp kernel: ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
Feb 21 20:52:14 hp kernel: ad4: TIMEOUT - READ_DMA retrying (0 retries left) LBA=203997424
Feb 21 20:52:16 hp kernel: ad4: FAILURE - READ_DMA
status=ff<BUSY,READY,DMA_READY,DSC,DRQ,CORRECTABLE,INDEX,ERROR>
error=ff<ICRC,UNCORRECTABLE,MEDIA_CHANGED,NID_NOT_FOUND,MEDIA_CHANGE_REQEST,ABORTED,NO_MEDIA,ILLEGAL_LENGTH> LBA=203997424
Feb 21 20:52:26 hp kernel: ata2: SIGNATURE: 00000101
Feb 21 20:52:26 hp kernel: ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=203997680
Feb 21 20:52:29 hp kernel: ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=203999088
Feb 21 20:52:39 hp kernel: ata2: SIGNATURE: 00000101
Feb 21 20:53:19 hp kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 21 20:54:00 hp kernel: ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
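To get a quick overview of a log like the one above, the message types can be tallied with a short script. This is only a minimal sketch: the sample lines are copied from the excerpt, while the category names and the `tally` helper are made up for illustration and are not part of any FreeBSD or ZFS tool. ICRC is the ATA interface CRC error bit, meaning the CRC check failed on the transfer between drive and controller, so counting those lines separately from the ZFS checksum mismatches may help when weighing the cable theory.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the log excerpt above.
LOG = """\
Feb 21 20:49:19 hp root: ZFS: checksum mismatch, zpool=tank path=/dev/label/zdisk3 offset=104439013376 size=65536
Feb 21 20:49:21 hp kernel: ad4: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=203997424
Feb 21 20:50:12 hp kernel: ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
Feb 21 20:52:14 hp kernel: ad4: TIMEOUT - READ_DMA retrying (0 retries left) LBA=203997424
Feb 21 20:52:16 hp kernel: ad4: FAILURE - READ_DMA
"""

# Hypothetical category names (my own labels) mapped to text seen in the excerpt.
PATTERNS = {
    "zfs_checksum": re.compile(r"ZFS: checksum mismatch"),
    "icrc": re.compile(r"UDMA ICRC error"),
    "taskqueue_timeout": re.compile(r"taskqueue timeout"),
    "read_timeout": re.compile(r"TIMEOUT - READ_DMA"),
    "read_failure": re.compile(r"FAILURE - READ_DMA"),
}


def tally(text):
    """Count how many log lines fall into each category."""
    counts = Counter()
    for line in text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Print one "category: count" line per category found.
    for name, n in tally(LOG).most_common():
        print(f"{name}: {n}")
```

To run it against a real log instead of the sample, the file contents can be passed to `tally`, for example `tally(open("/var/log/messages").read())`, assuming the messages are logged there.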