Solved WD Green read failure?

Hello everybody,

just a short question, please advise. I've got 4 practically unused WD Green drives I want to use in a home server (although they are 7 years old, they sat idle in an unused server apart from a few days of use). I just ran several smartctl short tests, and all drives are fine but one:

# smartctl -A /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 174 148 021 Pre-fail Always - 8258
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 107
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1176
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 93
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 75
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 882
194 Temperature_Celsius 0x0022 119 112 000 Old_age Always - 33
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 134
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 116
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 175


Likewise:

# smartctl -l selftest /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 1176 292201402
# 2 Short offline Completed: read failure 90% 1176 292201402
# 3 Short offline Completed: read failure 90% 1176 292201405
# 4 Short offline Completed: read failure 90% 1176 292201406
# 5 Short offline Completed: read failure 90% 1176 292201400
# 6 Short offline Completed: read failure 90% 1155 292201406
# 7 Short offline Completed: read failure 90% 1155 292201400
# 8 Extended offline Completed: read failure 90% 1154 292201402
# 9 Short offline Completed: read failure 90% 1154 292201406
#10 Short offline Completed: read failure 90% 1154 292201407
#11 Short offline Completed: read failure 90% 1154 292201400


What can I do or rather, what should I do?
 
Well, I think it is obvious that you cannot trust that drive. I would also reconsider using Green drives in a RAID.
 
Yes, yes, I know about it. E.g. I already set the parking cycle hack on them. In addition, this read failure thing pushes me further towards buying HGSTs or WD REDs. Or suchlike. :)

But can I "cure" this problem somehow? Or should I say "circumvent"... :) I'm referring to a command such as:

# dd if=/dev/zero of=/dev/ada3 conv=sync bs=4096 count=1 seek=36525175

Taking the LBA 292201402 as base. Hm?
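
For reference, the seek value above appears to come from dividing the LBA by 8, which assumes the self-test log reports LBAs in 512-byte logical sectors while dd is writing 4096-byte blocks. That sector size is an assumption, not something confirmed in this thread (diskinfo -v on FreeBSD would show it). A minimal sketch of the arithmetic, with a made-up helper name:

```sh
#!/bin/sh
# Hypothetical helper: map a 512-byte LBA to the index of the 4096-byte
# block that contains it (the value to pass to dd's seek= with bs=4096).
# Assumes 512-byte logical sectors; integer division rounds down.
lba_to_4k_block() {
    echo $(( $1 / 8 ))
}

# LBAs taken from the self-test log above.
for lba in 292201400 292201402 292201405 292201406 292201407; do
    echo "LBA $lba -> 4K block $(lba_to_4k_block "$lba")"
done
```

Under that assumption, every LBA in the log lands in the same 4096-byte block, so a single 4 KiB write at that offset would cover all of them.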
 
Would you consider fixing your car's brake discs with a software update? I guess not.
The problem is more in your head, no offense.
If I were in your position, I'd simply take a hammer and destroy said hard drive.
Then there's no way back and your problem is gone.
 
None taken - but would I? Depends. In my car? No way. In my mother-in-law's one? Sure thing! ;)

And I like your down-to-earth approach. A bit prehistoric but efficient nevertheless. :p
 
Most likely, that drive has serious problems. Given its age, I would suspect that it is a hardware problem involving platters and heads. If you had the right kind of equipment, you should put the head under an electron microscope and look for lubricant and oxide contamination on it, and you should put the platter on a testing machine (a disk drive with specialized heads) and map out the slight scratches on it. Clearly, you will not do that at home, since the equipment costs millions. Clearly, WD is not going to do that for you, since (a) the drive is long out of warranty, and (b) you are not a large customer who gives them many millions of $ of business per year.

By the way, I've seen WD do this for defective drives, but (a) the drives were brand new, (b) I was working for a customer who does give them many M$ per year.

My only real suggestion: overwrite the whole disk once with zeroes using dd: dd if=/dev/zero of=/dev/adaXX bs=1048576, and let it run until the end of the disk (it will take a few hours). If you get lucky, the problems were caused by only a small number of defective areas on the platter, and the drive will be able to re-vector these areas (remap them to spare sectors) while writing. An even better solution would be to perform a low-level format, but I don't know offhand how to do that on SATA disks; on SCSI disks, you use the sg_format command from the sg3_utils package, and you spend an hour reading the SCSI standards document to decide what parameters to use.

In practice, I fear this drive will be in the trash can soon. Sadly, it belongs there.
 
Wow. Just wow. Rest in peace, little buddy. Ftp... ssh, it's okay. You won't feel anything... :sssh:

Or, as k.jacker would put it, rest in piece(s)! :cool:

Thank you for the advice, ralphbsz. I'll give dd a try, but of course I won't trust it with any of my data.
 
So I did it:

# dd if=/dev/zero of=/dev/ada3 bs=1048576
dd: /dev/ada3: short write on character device
dd: /dev/ada3: end of device
2861589+0 records in
2861588+1 records out
3000592982016 bytes transferred in 31536.005993 secs (95148161 bytes/sec)
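
The dd summary can be sanity-checked with a little arithmetic: the "+1" in "records out" is the final partial block that hit the end of the device, which is also why dd reports a short write. A rough sketch using only the figures printed above; the function names are made up and the 512-byte sector size is an assumption:

```sh
#!/bin/sh
# Figures copied from the dd summary above.
bytes=3000592982016
bs=1048576
full_records=2861588
seconds=31536          # elapsed time, rounded down to whole seconds

# Size of the final, partial record written at the end of the device.
partial_bytes() { echo $(( bytes - full_records * bs )); }

# Device size in 512-byte sectors (assumed logical sector size).
total_sectors() { echo $(( bytes / 512 )); }

# Approximate throughput in MB/s (10^6 bytes), integer arithmetic only.
throughput_mb() { echo $(( bytes / seconds / 1000000 )); }

echo "last record: $(partial_bytes) bytes"
echo "512-byte sectors written: $(total_sectors)"
echo "about $(throughput_mb) MB/s over $(( seconds / 3600 )) full hours"
```

The byte count divides evenly by 512, which suggests dd reached the real end of the disk rather than stopping early on an error.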


And then the short test:

# smartctl -t short /dev/ada3

...and lo and behold:

# smartctl -l selftest /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1220 -
# 2 Extended offline Completed: read failure 90% 1181 610017072
# 3 Short offline Completed: read failure 90% 1181 610017065
# 4 Short offline Completed: read failure 90% 1181 610017056
# 5 Short offline Completed: read failure 90% 1181 610017048
# 6 Short offline Completed: read failure 90% 1181 610017048
# 7 Short offline Completed: read failure 90% 1176 292201402
# 8 Short offline Completed: read failure 90% 1176 292201402
# 9 Short offline Completed: read failure 90% 1176 292201405
#10 Short offline Completed: read failure 90% 1176 292201406
#11 Short offline Completed: read failure 90% 1176 292201400
#12 Short offline Completed: read failure 90% 1155 292201406
#13 Short offline Completed: read failure 90% 1155 292201400
#14 Extended offline Completed: read failure 90% 1154 292201402
#15 Short offline Completed: read failure 90% 1154 292201406
#16 Short offline Completed: read failure 90% 1154 292201407
#17 Short offline Completed: read failure 90% 1154 292201400


Needless to say, this doesn't make this disk the very foundation of the backup of our National Archive but hey, don't get greedy, I guess... thanks again, ralphbsz! :D
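
A clean short test is only part of the picture, so it may be worth rereading attributes 5, 197 and 198 after the overwrite to see whether the pending sectors were remapped or cleared. The thread does not show those values after the dd run, so the sketch below only demonstrates the parsing on rows copied from the first smartctl -A listing; the helper name is made up:

```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# values of the attributes most relevant to bad sectors (IDs 5, 197, 198).
# The raw value is the 10th whitespace-separated field of each row.
bad_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}

# Demo on rows copied from the first listing in this thread (before dd).
sample='5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 134
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 116'

printf '%s\n' "$sample" | bad_sector_counts
```

On a live system the same function could be fed directly, e.g. smartctl -A /dev/ada3 | bad_sector_counts, and the numbers compared with the ones posted at the top of the thread.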
 