Solved WD Green read failure?

sunbird · Nov 27, 2018

Hello everybody,

Just a short question; please advise. I have 4 practically unused WD Green drives that I want to use in a home server (although they are 7 years old, they sat idle in an unused server apart from a few days of use). I just ran several smartctl (short) tests, and all drives are fine except one:

```
# smartctl -A /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   174   148   021    Pre-fail  Always       -       8258
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       107
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1176
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       882
194 Temperature_Celsius     0x0022   119   112   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       134
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       116
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       175
```

Likewise:

```
# smartctl -l selftest /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      1176         292201402
# 2  Short offline       Completed: read failure       90%      1176         292201402
# 3  Short offline       Completed: read failure       90%      1176         292201405
# 4  Short offline       Completed: read failure       90%      1176         292201406
# 5  Short offline       Completed: read failure       90%      1176         292201400
# 6  Short offline       Completed: read failure       90%      1155         292201406
# 7  Short offline       Completed: read failure       90%      1155         292201400
# 8  Extended offline    Completed: read failure       90%      1154         292201402
# 9  Short offline       Completed: read failure       90%      1154         292201406
#10  Short offline       Completed: read failure       90%      1154         292201407
#11  Short offline       Completed: read failure       90%      1154         292201400
```

What can I do, or rather, what should I do?
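For a quick triage of output like the attribute table above, a small script can pick out the counters that usually point at failing media. This is a minimal Python sketch, not part of smartmontools. The names `flag_attributes` and `WATCHED_IDS` are hypothetical, and the choice of IDs is an assumption: 197, 198 and 200 are the ones highlighted in the post, and 5 (Reallocated_Sector_Ct) is added here. The sketch expects the plain-text table printed by `smartctl -A` and treats the tenth whitespace-separated column as RAW_VALUE.

```python
# Hypothetical helper for triaging `smartctl -A` output; not part of smartmontools.

# 197, 198 and 200 are the attributes highlighted in the post; also watching
# 5 (Reallocated_Sector_Ct) is an assumption of this sketch.
WATCHED_IDS = frozenset({5, 197, 198, 200})


def flag_attributes(smartctl_output: str, watched=WATCHED_IDS):
    """Return (id, name, raw_value) for watched attributes with a non-zero raw value."""
    flagged = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID;
        # banners, the "ID# ..." header and blank lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        # RAW_VALUE is the 10th column; some drives append extra text there,
        # so only a plain integer is compared.
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            flagged.append((attr_id, name, int(raw)))
    return flagged


# A few rows copied from the output above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       134
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       116
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       175
"""

for attr_id, name, raw in flag_attributes(sample):
    print(attr_id, name, raw)
# 197 Current_Pending_Sector 134
# 198 Offline_Uncorrectable 116
# 200 Multi_Zone_Error_Rate 175
```

An empty result does not prove a drive is healthy. It only means none of the watched counters is non-zero.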

gkontos · Nov 27, 2018

Well, I think it is obvious that you cannot trust that drive. I would also reconsider using Green drives in a RAID.

sunbird · Nov 27, 2018

Yes, yes, I know. For example, I have already applied the parking-cycle hack to them. On top of that, this read failure pushes me further towards buying HGSTs or WD Reds, or something similar.

But can I "cure" this problem somehow? Or should I say "circumvent" it?

I'm thinking of a command such as:

```
# dd if=/dev/zero of=/dev/ada3 conv=sync bs=4096 count=1 seek=36525175
```

Taking LBA 292201402 as the base. Hm?
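The seek value follows from the sector sizes. With `bs=4096` and 512-byte logical sectors, each dd block covers eight LBAs, so the block index is the LBA divided by 8, rounded down. The Python sketch below reproduces `seek=36525175`. The helper name `lba_to_dd_seek` is hypothetical, and the 512-byte logical sector size is an assumption worth checking against the drive (for example with `diskinfo -v /dev/ada3` on FreeBSD). The sketch also shows that every LBA reported in the first post's self-test log lands in that same 4 KiB block.

```python
# Hypothetical helper: map a 512-byte-sector LBA to the dd block that contains it.
LOGICAL_SECTOR = 512   # assumed logical sector size in bytes
DD_BLOCK = 4096        # the bs= value used in the dd command


def lba_to_dd_seek(lba: int, sector: int = LOGICAL_SECTOR, bs: int = DD_BLOCK):
    """Return (seek, first_lba, last_lba) for the bs-sized block containing lba."""
    sectors_per_block = bs // sector          # 4096 // 512 == 8
    seek = lba // sectors_per_block           # round down to the enclosing block
    first = seek * sectors_per_block
    return seek, first, first + sectors_per_block - 1


seek, first, last = lba_to_dd_seek(292201402)
print(seek, first, last)   # 36525175 292201400 292201407

# LBA_of_first_error values from the self-test log in the first post.
logged = [292201402, 292201405, 292201406, 292201400, 292201407]
print({lba_to_dd_seek(lba)[0] for lba in logged})   # {36525175}
```

Rewriting this one block can only clear these eight sectors. With Current_Pending_Sector at 134, more sectors than these are pending.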

k.jacker · Nov 27, 2018

Would you consider fixing your car's brake discs with a software update? I guess not.
The problem is more in your head, no offense.
If I were in your position, I'd simply take a hammer and destroy that hard drive.
Then there's no way back and your problem is gone.

sunbird · Nov 27, 2018

None taken. But would I? It depends. In my car? No way. In my mother-in-law's? Sure thing!

And I like your down-to-earth approach. A bit prehistoric, but efficient nevertheless.

k.jacker · Nov 27, 2018

I hoped you'd like it.

And yes, I happen to use prehistoric methods to settle such "should I, or not..." situations.

ralphbsz · Nov 27, 2018

Most likely, that drive has serious problems. Given its age, I would suspect a hardware problem involving the platters and heads. If you had the right kind of equipment, you could put the heads under an electron microscope to look for lubricant and oxide contamination, and put the platters on a testing machine (a disk drive with specialized heads) to map out any slight scratches. Clearly, you will not do that at home, since the equipment costs millions. Nor will WD do it for you, since (a) the drive is long out of warranty, and (b) you are not a large customer who gives them many millions of $ of business per year.

By the way, I've seen WD do this for defective drives, but (a) the drives were brand new, and (b) I was working for a customer who does give them many millions of $ per year.

My only real suggestion: overwrite the whole disk once with zeroes using dd (`dd if=/dev/zero of=/dev/adaXX bs=1048576`), and let it run until the end of the disk (this will take a few hours). If you get lucky, the problems were caused by only a small number of defective areas on the platter, and the drive will be able to re-vector these areas (move the data elsewhere) while writing. An even better solution would be a low-level format, but I don't know offhand how to do that on SATA disks. On SCSI disks, you use the sg_format command from the sg3_utils package, and you spend an hour reading the SCSI standards documents to decide which parameters to use.

In practice, I fear this drive will be in the trash can soon. Sadly, it belongs there.

sunbird · Nov 28, 2018

Wow. Just wow. Rest in peace, little buddy. ~~Ftp~~... ssh, it's okay. You won't feel anything...

Or, as k.jacker would put it, rest in piece(s)!

Thank you for the advice, ralphbsz. I'll give dd a try, but of course I won't trust it with any of my data.

sunbird · Nov 29, 2018

So I did it:

```
# dd if=/dev/zero of=/dev/ada3 bs=1048576
dd: /dev/ada3: short write on character device
dd: /dev/ada3: end of device
2861589+0 records in
2861588+1 records out
3000592982016 bytes transferred in 31536.005993 secs (95148161 bytes/sec)
```
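The dd summary is internally consistent, and the numbers explain both warnings. The disk's size is not a multiple of `bs=1048576`, so the final write was a partial record (the "+1"), and dd then hit the end of the device. The following Python check is plain arithmetic on the figures dd printed, with nothing drive-specific assumed.

```python
# Figures copied from the dd summary above.
BS = 1048576                    # bs=1048576 (1 MiB)
TOTAL_BYTES = 3000592982016     # "bytes transferred"
SECONDS = 31536.005993          # "secs"

# Whole 1 MiB records plus the leftover that became the "+1" partial record.
full_records, partial_bytes = divmod(TOTAL_BYTES, BS)
print(full_records, partial_bytes)        # 2861588 483328  (483328 bytes = 944 sectors)
print(TOTAL_BYTES // 512)                 # 5860533168 512-byte sectors in total
print(int(TOTAL_BYTES / SECONDS))         # 95148161 bytes/sec, as dd reported
print(round(SECONDS / 3600, 2))           # 8.76 hours for the full pass
```

At an average of roughly 95 MB/s, "a few hours" works out to almost nine for this 3 TB disk.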

And then the short test:

```
# smartctl -t short /dev/ada3
```

...and lo and behold:

```
# smartctl -l selftest /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1220         -
# 2  Extended offline    Completed: read failure       90%      1181         610017072
# 3  Short offline       Completed: read failure       90%      1181         610017065
# 4  Short offline       Completed: read failure       90%      1181         610017056
# 5  Short offline       Completed: read failure       90%      1181         610017048
# 6  Short offline       Completed: read failure       90%      1181         610017048
# 7  Short offline       Completed: read failure       90%      1176         292201402
# 8  Short offline       Completed: read failure       90%      1176         292201402
# 9  Short offline       Completed: read failure       90%      1176         292201405
#10  Short offline       Completed: read failure       90%      1176         292201406
#11  Short offline       Completed: read failure       90%      1176         292201400
#12  Short offline       Completed: read failure       90%      1155         292201406
#13  Short offline       Completed: read failure       90%      1155         292201400
#14  Extended offline    Completed: read failure       90%      1154         292201402
#15  Short offline       Completed: read failure       90%      1154         292201406
#16  Short offline       Completed: read failure       90%      1154         292201407
#17  Short offline       Completed: read failure       90%      1154         292201400
```

Needless to say, this doesn't make the disk the very foundation of our National Archive's backup, but hey, I guess one shouldn't get greedy... Thanks again, ralphbsz!
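When repeating this kind of check, a script can read the self-test log instead of checking it by eye. Below is a minimal Python sketch. The function `parse_selftest_log` is hypothetical, and its regular expression assumes the column layout shown in the logs above. It returns each entry's number, test type, status, remaining percentage, lifetime hours and first-error LBA; entry # 1 is the most recent test. A short test checks only part of the surface, so an extended test (`smartctl -t long /dev/ada3`) gives a broader picture.

```python
import re

# One self-test log row, e.g.
# "# 1  Short offline       Completed without error       00%      1220         -"
# Assumption: columns are separated by runs of spaces, as in the logs above.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text: str):
    """Return one dict per self-test entry, in log order (newest first)."""
    entries = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # banner, header or blank line
        lba = m.group("lba")
        entries.append({
            "num": int(m.group("num")),
            "test": m.group("test"),
            "status": m.group("status"),
            "remaining": int(m.group("remaining")),
            "hours": int(m.group("hours")),
            "lba": None if lba == "-" else int(lba),  # "-" means no error LBA
        })
    return entries


log = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1220         -
# 2  Extended offline    Completed: read failure       90%      1181         610017072
#17  Short offline       Completed: read failure       90%      1154         292201400
"""

latest = parse_selftest_log(log)[0]
print(latest["num"], latest["status"], latest["lba"])   # 1 Completed without error None
```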