Disk Read Error GEOM_ELI, ATA status 51, error: 40

Hello,

I am not clear on where the error is coming from, therefore I'm uncertain of where to post this.

It started about two weeks ago, when a daily ZFS snapshot got stuck around midnight. I believe it was taking a snapshot of a filesystem on ada5, which is part of a mirror with ada4, encrypted with geli(8). The next day the computer froze and I had to power off the system. I had to reboot three times before the boot drive was found. Once it started booting, I received the following error message repeatedly:

Code:
(ada5:ata5:0:0:0): Retrying command
(ada5:ata5:0:0:0): READ_DMA48. ACB: 25 00 60 6d 76 40 42 00 00 00 08 00
(ada5:ata5:0:0:0): CAM status: ATA Status Error
(ada5:ata5:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada5:ata5:0:0:0): RES: 51 40 67 6d 76 42 42 00 00 00 00
(ada5:ata5:0:0:0): Retrying command
(ada5:ata5:0:0:0): READ_DMA48. ACB: 25 00 60 6d 76 40 42 00 00 00 08 00
(ada5:ata5:0:0:0): CAM status: ATA Status Error
(ada5:ata5:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada5:ata5:0:0:0): RES: 51 40 66 6d 76 42 42 00 00 00 00
(ada5:ata5:0:0:0): Error 5, Retries exhausted
GEOM_ELI: g_eli_read_done() failed ada5p2.eli[READ(offset=527958720512, length=4096)]

A few days ago I had another stuck ZFS snapshot, again of a directory on ada5, but I was able to shut the computer down before there were problems. Upon reboot I received the same message as above.

smartd reports no errors with any of the drives. zpool status -x also reports no errors, and df -h shows no full filesystems.

Yesterday, while I was editing a file in Emacs, the desktop session (KDE) froze with a screeching noise from the speaker. I could not do anything at the desktop and, since I had no other device to SSH in from, I did a hard power cycle again. Upon reboot I saw the same errors as above, and the file I had been working on in Emacs had been overwritten and was 0 kB in size. I am unclear whether these things are related, as I had similar issues in the distant past with Emacs that seemed to go away with an update of the port.

Essentially, I am looking for help on trying to figure out what is wrong with the disk, geli or ZFS here. This is my workstation and I have a grant due in the next week. I cannot afford to upgrade anything as sometimes upgrades don't work.

All the filesystems, including the boot drive, are ZFS. The output of camcontrol devlist:
Code:
<Samsung SSD 840 PRO Series DXM06B0Q>  at scbus1 target 0 lun 0 (ada0,pass0)
<Samsung SSD 840 EVO 1TB EXT0BB6Q>  at scbus2 target 0 lun 0 (ada1,pass1)
<Samsung SSD 840 EVO 1TB EXT0BB6Q>  at scbus3 target 0 lun 0 (ada2,pass2)
<LaCie 2Big Quadra USB3 0301>  at scbus7 target 0 lun 0 (pass3,da0)
<WDC WD3000HLFS-01G6U4 04.04V06>  at scbus8 target 0 lun 0 (ada3,pass4)
<WDC WD1001FALS-00E8B0 05.00K05>  at scbus8 target 1 lun 0 (ada4,pass5)
<WDC WD1001FALS-00J7B0 05.00K05>  at scbus9 target 0 lun 0 (ada5,pass6)
<Optiarc DVD RW AD-7241S 1.03>  at scbus9 target 1 lun 0 (cd0,pass7)

And my system:

Code:
FreeBSD freeenv 10.1-RELEASE-p19 FreeBSD 10.1-RELEASE-p19 #0: Sat Aug 22 03:55:09 UTC 2015  root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Thanks very much for any suggestions.

Aric
 
That is a hardware error. UNC means uncorrectable data error. Either you have a cable going bad or that WD1001FALS-00J7B0 drive (ada5) is failing.
As to why smartd and ZFS both think it's fine, maybe it's just intermittent at this stage, but I'd look at trying a new cable or replacing that drive.
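
If you want to see where "DRDY SERV ERR" and "UNC" come from, the bytes in the log can be decoded by hand. The sketch below is only an illustration: the bit names and the byte order of the ACB and RES lines are written from memory of how FreeBSD's CAM layer prints them, so treat both as assumptions, and the helper names are made up for this post.

```python
# Bit names as the kernel appears to print them (assumption, listed from memory).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "SERV",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "ILI"}


def parse_hex(line):
    """Turn a space-separated hex dump into a list of integers."""
    return [int(token, 16) for token in line.split()]


def decode_bits(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [name for mask, name in names.items() if value & mask]


def lba48(regs):
    """Assemble a 48-bit LBA from six bytes ordered
    lba_low, lba_mid, lba_high, lba_low_exp, lba_mid_exp, lba_high_exp."""
    lba = 0
    for position, byte in enumerate(regs):
        lba |= byte << (8 * position)
    return lba


# Values copied from the first ACB/RES pair in the log above.
acb = parse_hex("25 00 60 6d 76 40 42 00 00 00 08 00")
res = parse_hex("51 40 67 6d 76 42 42 00 00 00 00")

# Assumed layout: bytes 2-4 are the low LBA bytes, byte 5 is the device
# register, bytes 6-8 are the high ("exp") LBA bytes.
status_names = decode_bits(res[0], STATUS_BITS)   # ['DRDY', 'SERV', 'ERR']
error_names = decode_bits(res[1], ERROR_BITS)     # ['UNC']
requested_lba = lba48(acb[2:5] + acb[6:9])        # first sector requested
failed_lba = lba48(res[2:5] + res[6:9])           # sector the drive gave up on
sector_count = acb[10] | (acb[11] << 8)           # sectors in the request

print(status_names, error_names)
print(hex(requested_lba), hex(failed_lba), sector_count)
```

Under those assumptions the request is for 8 sectors starting at LBA 0x42766d60, and the drive reports the failure at 0x42766d67, inside that range. With 512-byte sectors, 8 sectors is 4096 bytes, which matches the length in the GEOM_ELI line. In other words, the geli message looks like the same failed disk read being passed up the stack rather than a separate geli problem.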
 
SMART is not a completely reliable indicator of whether a disk is faulty. If it reports no errors, that is not evidence that the disk is healthy. The converse is different: if SMART does show errors on the disk, take them seriously and replace it as soon as possible.
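
If you want to look at the raw counters yourself rather than rely on smartd's summary, something like the sketch below pulls a few attributes out of `smartctl -A` output. It assumes smartmontools is installed and that the drive uses the usual ten-column attribute table. The three attribute IDs are ones people commonly watch for media problems, not an official list, and the function names are hypothetical.

```python
import subprocess

# Attribute IDs commonly watched for failing media (assumption: the drive
# reports them): 5 reallocated, 197 pending, 198 offline uncorrectable.
WATCHED_IDS = {5, 197, 198}


def watched_raw_values(smartctl_output):
    """Extract raw values of the watched attributes from `smartctl -A` text."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # column 2 is the attribute name and column 10 is the raw value.
        if len(fields) >= 10 and fields[0].isdigit():
            if int(fields[0]) in WATCHED_IDS and fields[9].isdigit():
                found[fields[1]] = int(fields[9])
    return found


def read_attributes(device):
    """Run smartctl against a device such as /dev/ada5 (usually needs root)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True, check=False)
    return watched_raw_values(result.stdout)
```

Calling `read_attributes("/dev/ada5")` would return a dictionary of whichever of those attributes the drive exposes. Non-zero values are worth acting on; all zeros still prove nothing, for the reason given above. A long self-test (`smartctl -t long /dev/ada5`) reads the whole surface and may be more telling than the counters alone.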