Western Digital Self Destruction?

I found a frightening issue with the IntelliPower firmware on Western Digital hard disks. The gist is this: WD's IntelliPower parks the disk heads after eight seconds of inactivity, which leads to an exorbitant number of load/unload cycles and significantly reduces the life of the HDD (the original poster says down to 1.5 years). Moreover, these disks have major problems under Linux, which checks whether the heads are parked every ten seconds.
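
You can check whether your own drive is affected by watching the Load_Cycle_Count SMART attribute. A quick sketch, assuming sysutils/smartmontools is installed and the drive shows up as /dev/ada0:

Code:
# If this counter climbs by hundreds per hour on an idle machine,
# the drive is parking its heads aggressively.
smartctl -A /dev/ada0 | grep -i load_cycle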

The disks are very well priced, though. Does anybody have experience with IntelliPower under FreeBSD?

For those understanding German, [link=http://www.et-view-support.com/Forum/showthread.php?1568-Western-Digital-Selbstzerst%F6rung-mit-IntelliPower!!!]here is the original topic[/link].
 
I've never had a WD HDD last more than a month beyond its warranty, except for the ones I don't use. So this doesn't seem surprising to me.
 
Re: Western Digital Self Destruction !?

sysutils/ataidle has a command that can tell the drive not to do that ridiculous head-parking dance. But some drives ignore the command.
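
For reference, a minimal sketch of how that might look, assuming the drive is /dev/ada0 and that ataidle's -P option (the APM level) is honoured by the drive; 254 requests maximum performance, i.e. no aggressive idle parking:

Code:
# Set the APM level to 254 (maximum performance, no aggressive head
# parking). Some drives simply ignore this.
ataidle -P 254 /dev/ada0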
 
You can disable or tune this "function". There is the wdidle3.exe tool from WD, there are open-source tools like http://idle3-tools.sourceforge.net/, and smartctl can also be used to set or disable the timeout temporarily, until the next restart.
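
For instance, with idle3-tools (a sketch only, and if I read its docs right; idle3ctl is Linux-oriented, so I'd run it from a Linux live system with the WD drive showing up as, say, /dev/sdb):

Code:
# Show the current idle3 timer, then disable it. The drive needs a full
# power cycle (not just a reboot) before it reports the new setting.
idle3ctl -g /dev/sdb
idle3ctl -d /dev/sdb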

However, based on both my own and my customers' experience (thousands of units), the cheap WD drives die the most. A funny but telling detail: the first two brand-new, sealed WD disks unpacked from a box of warranty replacements were dead on arrival, and several others still await that discovery :) Seagate usually sends disks back repaired/refurbished, but always in working condition.
 
I've had limited personal experience, but the WD Green drives, which I believe are usually the least expensive, have been pretty bad for me. Of the perhaps four I bought, all four went bad within six months of the end of their warranty. Conversations with co-workers indicated they'd had the same experience.
 
We've always bought Seagate disks, which is why I have zero experience with WD. Backblaze, however, [link=http://blog.backblaze.com/2013/02/20/180tb-of-good-vibrations-storage-pod-3-0/]swear by Hitachi[/link] based on the failure rates of their 25,000 disks.
 
Re: Western Digital Self Destruction !?

wblock@ said:
sysutils/ataidle has a command that can tell the drive not to do that ridiculous head-parking dance. But some drives ignore the command.

I recently installed a 1 TB WD Red in my desktop machine. It's used for storage and not the main drive, but is automounted. Would it be worth my while to use the above utility? Does FreeBSD not access the drive unless I request a file on it?
 
I've never found any particular problem with WD drives (not in the last few years, at least). Unfortunately, people seem to pretend there's no difference between cheap and expensive disks. People will buy 2 TB Green disks for their NAS thinking they've got a bargain because they're cheaper than 1 TB NAS disks.

The Green disks are good only for a cheap workstation with low (non-24x7) use, which is why they tend to find their way into cheap, off-the-shelf desktops. If you're a heavier user and/or plan to leave the computer on permanently, you'll be much better off with the Red or Black series, which I've always had good results with. You may alleviate some issues by disabling the idle-park feature, but really you should buy a disk aimed at the purpose you intend to use it for (whether it's WD or not).
 
I also thought this way until I read [link=http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/]this blog post here[/link]. These are real life stats!

Anyway, it would be nice to have a tutorial or a HOWTO focusing on FreeBSD and HDDs with all the best practices and tools available - starting from raw disk analysis and performance tuning.
 
As mentioned above, it appears Backblaze use Hitachi disks (although their enterprise-vs-consumer blog post doesn't actually mention which disks were used). There's probably still quite a difference between the disks Backblaze chose for the pods and the cheapo Green disks. In fact, if you look at the first Backblaze link above, they list Hitachi and WD Red as their disks of choice, along with Seagate, which they concede has a higher failure rate. They probably didn't even bother trying the Green disks.

Obviously disks fail, and apparently enterprise disks fail about as much as good consumer disks. Of course, reliability isn't the only reason people buy enterprise disks: there's also the longer warranty, so failed disks get replaced (as mentioned by Backblaze), lower latency (higher RPM), better handling of vibration so there's less fluctuation in access times, full-duplex SAS, dual-port SAS, and so on. Backblaze don't really care about most of that, as their systems aren't used for 'live' data.

My simple view on this: if you want the cheapest possible disk, buy a Green, and if you really only need a cheap disk it'll probably work fine for you. If you're a more serious user, get something more suitable. And if you're running an enterprise system hosting critical real-time data, get enterprise disks. Buying the cheapest disk you can and then complaining when it doesn't work well for 24x7 NAS (or similar) use is a bit like buying a cheap little car and then finding it doesn't work well on the track.
 
Sure, you are right. The bottom line I draw from their post, however, is: trust your RAID-Z(2,3), not your enterprise drive.
 
Well yeah, you can't trust any disk. No enterprise user in their right mind would put data on anything other than RAID, regardless of whether the disks are enterprise-grade or not. That's why it always amuses me when people run a RAID 0 array for extra performance (which they probably don't need), or a single disk, and then end up on here asking for help when it fails.

Backblaze are in a nice position where they only store backup data sent over the Internet, using their own software, so it's fairly easy for them to store the data on multiple independent systems and let their software deal with pods being unavailable. They don't need serious performance or any other enterprise features, just boxes full of big disks. It's the Google-style solution of running software over lots of cheap boxes, designed such that failures have no impact on the service and can be recovered from easily.

You can't fully trust RAID-Z either. There's enough examples just on here of pools that are suddenly faulted or have 'corrupt meta-data'. Always have an independent backup.
 
vanessa said:
I also thought this way until I read [link=http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/]this blog post here[/link]. These are real life stats!
They show a 0.4% difference in failure rate between consumer and enterprise drives. That's seven days in five years. How significant is that?

Anyway, it would be nice to have a tutorial or a HOWTO focusing on FreeBSD and HDDs with all the best practices and tools available - starting from raw disk analysis and performance tuning.
I'll second that. There's lots of information on partitioning and the like, but reliability and longevity are not generally mentioned.
 
They show a 0.4% difference in failure rate between consumer and enterprise drives. That's 7 days in 5 years. How significant is that?

I think that was directed at me, and it's very significant if your aim is to show there's very little difference in failure rate between consumer and enterprise disks.

My response was that failure rate is not a primary reason for using enterprise disks, and that the report was using 'good' consumer disks rather than those specifically designed and targeted at the cheap, low use market.
 
And here's a thread I heavily appreciate; thanks @vanessa for starting it. Thing is, I'm on my laptop right now (which doesn't type nearly as well as my regular keyboard) due to a broken SATA controller on my Asus board combined with a dead Western Digital SATA hard disk. Fortunately it was in a RAID 1 setup, so no data was lost.

Well, I came to the same conclusion: WD hard disks do indeed seem a lot less reliable than other brands. Alas, thanks again.
 
vanessa said:
I also thought this way until I read [link=http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/]this blog post here[/link]. These are real life stats!

Well, they are stats. Are they "real life", or some toy imitation? A few caveats:
  • Backblaze's large system has 25K drives. That's the number of drives that is installed in a single compute cluster in large installations (weather forecasting agencies, supercomputer centers, corporate data centers, and non-existing agencies). There are thousands of such large compute clusters. Backblaze is nice enough to publish their statistics, and they deserve kudos for that. But in terms of overall drive count, they are a tiny backwater.
  • The really large data centers in the world are run by the likes of Google and Facebook. Google has actually published research papers on disk drive reliability, with a lot of detail (please look at proceedings from FAST and MSS conferences). All Backblaze gives us is the overall failure rate, without breakdown on who/what/when/how/...
  • Backblaze's statements about enterprise-grade drives are particularly laughable. They are based on HUNDREDS of drives. That's what ships in a single EMC/Hitachi/IBM/HP/Dell/... storage server, and large customers have dozens or hundreds of these servers. Statistics based on such a small number of drives are not helpful.
  • They compare apples and oranges. Why? Failure rates of drives depend crucially on vibration, temperature, workload, and power supply stability (pretty much in that order). We know that Backblaze's own storage enclosure (the 45-disk SATA enclosure they have engineered) is pretty good about vibration control, but far from perfect. I personally don't know about how well they control temperature, and how good their power supplies are (but other people may know). It's silly to compare drives in one environment (the Backblaze enclosure) with drives in a totally different environment (Dell or EMC enclosures), and think that the conclusions apply to the drives themselves.
  • When is a disk "dead"? That depends crucially on definition. We would all agree that a disk is dead if you can't read from it at all (either it refuses to spin, or you can not electronically communicate with it). But how about a disk that is readable, but has lots of read errors? How about one that has exhausted its spare space for revectoring writes, and is readable but de-facto not writeable? How about one that is readable, but is getting so many internally corrected errors that its performance suffers greatly? How about a disk that has reported impending failure via SMART or a similar mechanism, but is still functioning (at least for the moment)? Before you compare failure rates of disks, you have to specify what you mean by "failure". Using different definitions, the answers can easily vary by a factor of 2 or 5.

While I don't disagree with Backblaze's statistics, and their conclusion seems plausible to me and probably contains a large grain of truth, it is not the be-all and end-all of disk reliability analysis.

Anyway, it would be nice to have a tutorial or a HOWTO focusing on FreeBSD and HDDs with all the best practices and tools available - starting from raw disk analysis and performance tuning.

Here are a few hints. Start by buying a disk. While there are lots of horror stories on the web (including some in this thread), please note that disks from all major vendors are used by all major system builders. For example, several posts above crucified WD disks. If they were right, then why would the likes of IBM/EMC/Dell/... sell WD disks in their storage servers? If WD disks really were as bad as people make them out to be, why hasn't WD been killed by warranty return costs? My assertion is that while there are differences in quality and reliability between drive vendors, they are minor, and hard to detect without looking at large aggregate numbers of disks.

Remember that there are significant hardware differences between high-performance enterprise disks (10K and up RPM), near-line enterprise disks (slow-spinning), and consumer SATA disks. The cost difference between these drives is to some extent caused by the different engineering that goes into these drives, and the different bill-of-materials. There is a very good (but dated) paper by Riedel and Anderson on that topic. Anyone who thinks that consumer drives and enterprise drives will have the same performance and reliability is fooling themselves. Having said that, I only use consumer-grade drives at home, but then I don't store mission-critical data (baby pictures and scanned electrical bills don't qualify), and I'm incredibly good about backups.

Having bought a drive, the best thing you can do to make it run for a long time is to treat it well. Make sure it is NOT exposed to vibration. Put the fans in your computer case on vibration mounts, and the disk on vibration mounts. If you have more than one disk in close proximity, make sure seeks on one drive don't shake its neighbor. Some cases (the more expensive ones) are designed to have rubber grommets the disks are mounted in. Make sure the sheet metal of the case doesn't turn into a musical instrument. Strips of foam rubber or cork, glued in the right places, can do wonders for suppressing vibration.

Next, please cool your drives. Disk drives like medium temperatures; current wisdom seems to be that around 30 to 35 degrees is good. If your computer runs really hot, please add a case fan (I have one enclosure where hundreds of disks run in the 50s, and they're dying like flies). On the other hand, many drives don't like very cold temperatures either; there are horror stories of farms with tens of thousands of drives barely functioning because they were cooled to about 10 degrees (C, not F), and the drives having to recalibrate all the time. You can use SMART to read the drive temperature.
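
For example (a sketch, assuming sysutils/smartmontools and a drive at /dev/ada0):

Code:
# Attribute 194 (Temperature_Celsius) on most drives.
smartctl -A /dev/ada0 | grep -i temperature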

If you have a workload that is incredibly seek-intensive (many random accesses), consider moving part of that to an SSD, or combating the seeks by giving the system more cache RAM (but beware, SSDs are not a cure-all, and have their own reliability issues). While seeks in and of themselves are not harmful, every write after a long seek increases the risk of off-track writes and actuator miscalibration. And the thing with disks is to avoid risk.
 
vanessa said:
I also thought this way until I read [link=http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/]this blog post here[/link]. These are real life stats!

Anyway, it would be nice to have a tutorial or a HOWTO focusing on FreeBSD and HDDs with all the best practices and tools available - starting from raw disk analysis and performance tuning.

For a new drive, I run the SMART short test with smartctl -t short /dev/ada1. If that succeeds, then I run the long test. If that succeeds, I may just fill the drive with dd(1), which is not greatly different but repeats testing the whole disk: dd if=/dev/zero of=/dev/ada1 bs=128k.
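
Spelled out as a sequence (a sketch; the new drive is assumed to be /dev/ada1, and the dd step destroys anything on it):

Code:
smartctl -t short /dev/ada1           # quick self-test, a couple of minutes
smartctl -l selftest /dev/ada1        # check the result when it's done
smartctl -t long /dev/ada1            # full surface scan, takes hours
smartctl -l selftest /dev/ada1        # check again
dd if=/dev/zero of=/dev/ada1 bs=128k  # write the whole disk; destroys all data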

For performance, there is not a lot to do. Bigger drives have 4K blocks, and partitions should be aligned to them. That is built into the commands used in this article: Disk Setup On FreeBSD.
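
As a rough sketch of what that alignment looks like with gpart (assuming the drive is ada1 and a single UFS data partition is wanted; aligning to 1M also satisfies 4K):

Code:
gpart create -s gpt ada1                     # new GPT partition table
gpart add -t freebsd-ufs -a 1m -l data ada1  # partition aligned to 1 MiB
newfs -U /dev/gpt/data                       # UFS with soft updates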
 
usdmatt said:
They show a 0.4% difference in failure rate between consumer and enterprise drives. That's 7 days in 5 years. How significant is that?

I think that was directed at me, and it's very significant if your aim is to show there's very little difference in failure rate between consumer and enterprise disks.

It was directed at @vanessa's comment which I took to mean that there could be a big difference between consumer and enterprise drives. Perhaps that was not the intention. I agree with you that 0.4% is indeed significant in showing that there is little difference.

My response was that failure rate is not a primary reason for using enterprise disks, and that the report was using 'good' consumer disks rather than those specifically designed and targeted at the cheap, low use market.

Agreed. :)
 
wblock@ said:
For performance, there is not a lot to do. Bigger drives have 4K blocks, and partitions should be aligned to them. That is built into the commands used in this article: Disk Setup On FreeBSD.

When you say "performance", I assume that you are talking about longevity. Access and transfer times, etc, are a different issue than what we're talking about in this thread. Obviously excessive parking, as may be the case with the WD technology, is an issue and it may be a good idea to use sysutils/ataidle on those drives. I'm still wondering though, does FreeBSD do frequently access the drive even if a file is not requested by the operator? If it doesn't, then what purpose would sysutils/ataidle serve?
 
No, I was talking about speed. If you want a drive to live longer, don't buy the cheapest model, and don't subject it to vibration. Otherwise, there's not much you can do beyond using some form of RAID to protect against individual drive failure and backups to protect against data loss.
 
wblock@ said:
No, I was talking about speed. If you want a drive to live longer, don't buy the cheapest model, and don't subject it to vibration. Otherwise, there's not much you can do beyond using some form of RAID to protect against individual drive failure and backups to protect against data loss.
Thanks, but I'm still confused. The OP (@vanessa) mentions IntelliPower like this:
WD's IntelliPower parks the disk heads after eight seconds of inactivity, which leads to an exorbitant number of load/unload cycles and significantly reduces the life of the HDD (the original poster says down to 1.5 years).

So, regarding "cheapness", I note that Western Digital Black and Red drives are advertised as having the IntelliPower "feature". I am then surmising that one could prolong the life of a WD Black by using the aforementioned utility. However, I've found some more information.

It seems that the WD Red 1 TB drives (one of which I just installed) don't change speed. One tester used audio profiles to confirm that the drive runs at 5400 RPM at all times. Another tester mentions that the "infamous Load Cycle Count (LCC) issue", presumably what the OP was talking about, does not exist in these drives - or at least not in the one he tested. In fact, there seems to be much confusion about the IntelliPower drives. WD says "IntelliPower - A fine-tuned balance of spin speed, transfer rate and caching algorithms designed to deliver both significant power savings and solid performance." I'm not really sure (is anybody?) what that means, though. In any case, if that tester is to be believed and my drive is from a similar batch, then I don't have to worry about the so-called LCC issue - despite what the OP's reference article says about IntelliPower.

Here is the analysis of the WD10EFRX to which I refer above. He also had a message for us here:
I did see better overall performance using FreeBSD (sequential throughput was about 12MBytes/sec higher), so your mileage may vary.
 
It's not the same on all drives. I have not seen excessive head parking on WD Red drives, or Black desktop drives, but have seen and heard it on Scorpio Black notebook drives.
 
Sorry for being away for a couple of days. IntelliPower is really just one example. Next year another manufacturer will tune its HDD firmware and give it a new buzzword. It would be good to have a workflow or best practice for taking an unknown disk and analysing its behaviour.

Here is another tip: there is a vast number of test tools (mainly from the manufacturers themselves) on the [link=http://www.ultimatebootcd.com]Ultimate Boot CD[/link]. Very handy.
 
I can at least confirm that a WD Scorpio Blue (WD2500BEVT), but also an older Seagate Momentus 5400.3 (ST9120822AS), were "tuned" for low power but quick self-destruction. Both came out of laptops; the WD drive was swapped for an SSD on day one in that particular notebook and thus had a very low Load_Cycle_Count. However, even after using wdidle3.exe (which did confirm the changed values persisted across reboots and power cycling!), the WD drive kept spinning down consistently after 8-10 seconds, just like the Seagate. Both drives reached a Load_Cycle_Count > 10k within 1-2 weeks, which was also annoying since the little box was lagging due to the disk spinning up on almost every access.

Without trying sysutils/ataidle, I came across a camcontrol command (found by googling through the FreeBSD lists) that disables power management but doesn't seem to persist across reboots - with the advantage that it doesn't mess with the drive firmware. Since the two drives were going to be replaced with 2.5" WD Reds (WD10JFCX) anyway, I gave it a try on both with /sbin/camcontrol cmd ada0 -a "EF 85 00 00 00 00 00 00 00 00 00 00"
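
If I read the ATA spec right, that string is simply a raw SET FEATURES command (opcode EFh) with subcommand 85h, "disable Advanced Power Management", padded out with zeros:

Code:
# EF = SET FEATURES, 85 = disable APM; the remaining bytes are unused here.
/sbin/camcontrol cmd ada0 -a "EF 85 00 00 00 00 00 00 00 00 00 00"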

I put that line inside my rc.local so it gets executed on every power-up, and made a little cron job to query the Load_Cycle_Count every 5 minutes:

Code:
/usr/local/sbin/smartctl -a /dev/ada0 | grep Load_Cycle
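
The cron entry for that looks roughly like this (a sketch; the log path is just an example):

Code:
# root's crontab: record the load cycle count every 5 minutes
*/5 * * * * /usr/local/sbin/smartctl -a /dev/ada0 | grep Load_Cycle >> /var/log/load_cycle.log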

After that, the Load_Cycle_Count stayed stable except for regular power cycling. :) I put a WD Red in that box to compare with, and I can confirm that this particular model doesn't show that (sorry to say) stupid behaviour the way the more "laptop-optimized" disks did.
 