Help to choose HD

It seems to be a personal thing, and if I were to take a scientific approach . . .

There is a datacenter operator that keeps track of various HDD models and has published results for the last several years.

Backblaze HDD Stats. This is about as close as you are going to get to "science". WDC and HGST look good, but a single drive failure in a group of relatively new 4TB Red drives may have skewed the results. Without that one failure, WDC/HGST would be an obvious winner.

I use WD Black drives at home and feel the up-charge, compared to Blues, is worth it for me.
 
Backblaze HDD Stats. This is about as close as you are going to get to "science". ...
The published Backblaze data is the best *publicly available* information on drive reliability. There are several academic studies published at the FAST conferences, but they remove the identity of the disk drive models.

There is much better data available, but only within large companies that use disk drives (HP, EMC, IBM, Dell, Oracle, ...), and within the drive manufacturers themselves. The companies that use millions of drives per year do keep track of the reliability statistics rather carefully. They also track how drive reliability correlates with temperature, vibration, workload, and so on. But that information is never released to the public.

My personal statistics: Of the Seagate drives I have bought, all have failed within 5 years of use (every single one, none survived). Of the Hitachi / HGST drives, none have ever failed, and several are still in use after 8 or 10 years. With WD it's a mixed bag: some work well, some die. I have only 1 or 2 Toshiba disks in old laptops, and those were thrown away before the disks failed. Somewhere in the basement are also 30-year-old 600MB and 1GB disk drives, which still function (they are only powered up once every few years); I think they were made by CDC and Fujitsu.
 
I still have the IBM 80GB HDD that came with my '98 Gateway tower; it's what I used in my pfSense box for a couple of years before retiring it.

It's an electricity hog, which is why I retired it, but I'm pretty sure I could pull it out, fire it up, and the HDD would still be working.
 
I got a pile of 80GB WD drives for cheap some years ago. I still have one machine that uses them. Never had a failure with those.

I've used Western Digital drives in DVRs over the years, and those run 24/7. The first drive I used for that purpose was the lower-service-life model, not sure on the label color. It started getting sketchy after about 3 years. I've since gone to the highest-service-life WD drive for the DVR. I have one that's been running for about five years now.

On my main desktop computers I'm using SanDisk SSDs; SanDisk is, I believe, a division of Western Digital. Though I also like PNY a lot. Haven't had any trouble with the SanDisk drives. The main reason I went with SanDisk is their affiliation with Western Digital. I trust that maker.
 
Seagate's consumer-grade drives have always been quite bad on average.

However, as I already wrote, I use cheap used 15k SAS drives from eBay, which are normally 5-8 years old and have run 24/7 for many years.
Of a dozen of these (enterprise-grade) drives that were from Seagate, I had only one failure in the last 4 years. (Thanks to ZFS it was just a matter of swapping in another one and resilvering.)
As a private user, especially when it's a single drive without redundancy, a dead drive costs much more than just the $15 of work time for swapping out the broken drive.
So I think there is a good reason not to save a few bucks on the drive.

My laptop had a built-in consumer-class Seagate drive, and when that one started to exhibit minutes-long delays on access shortly after the warranty expired, I quickly got a new WD Black replacement and copied the data to it (dd'ing took an eternity due to the delays mentioned). That one still works, though it's only 2.5 years old.
 
I've been using Seagate hybrid hard disks in my Mac Minis (all 6 of them) for the last few years (counted back... OMG... nine years) and have only had one failure: the first drive I bought started suffering from a couple of bad blocks last year, and so was replaced in May of last year before the inevitable disaster struck. The one that failed had been running FreeBSD 24x7 for 8.5 years. As always, your mileage may vary.
 
The published Backblaze data is the best *publicly available* information on drive reliability.

Also shepper

I think you either missed my earlier posted link, or perhaps you disagree with it and aren't mentioning that. It turns out the Backblaze data is not interpreted correctly statistically and is not useful or correct. See here: https://www.theregister.co.uk/2014/02/17/backblaze_how_not_to_evaluate_disk_reliability/ It seems like sales tricks are more popular and successful in IT than statistics, but it's nice to see that someone decided to speak up. The original myth lives on, though.
 
Statistics can be sales tricks, but it all starts with data. In the link I provided, the 4TB WD had an 8.87% failure rate, but the number of drives tested (45) and the 4113 drive-days indicate that they had been used for less than 100 days each. Note that the HGST 8TB drive had a similar number of drives tested and drive-days, but no failures.
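
For context, here is a minimal sketch, in Python, of the arithmetic behind those two numbers. The formula follows Backblaze's published description of an annualized failure rate (failures per drive-day, scaled to a year), so treat it as an illustration rather than their exact code:
Code:
# Sketch: annualized failure rate (AFR) from failures and drive-days.
# Figures are the 4TB WD numbers quoted above: 45 drives, 4113 drive-days, 1 failure.
drives = 45
drive_days = 4113
failures = 1

avg_days_per_drive = drive_days / drives         # ~91 days, i.e. under 100 days each
afr_percent = failures / drive_days * 365 * 100  # annualized failure rate

print(f"average service per drive: {avg_days_per_drive:.0f} days")
print(f"annualized failure rate:   {afr_percent:.2f} %")  # ~8.87 %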

Probably the most egregious use of statistics was by the statisticians hired by the tobacco industry.

Dell used to "Burn In" a newly order computer but I believe the term was misleading. Manufacturing defects tend to fail early, I view Dells process as less burning in and more weeding out manufacturing defects. Was the early WD failure due to poor engineering and materials or a manufacturing defect covered under warranty?
 
Was the early WD failure due to poor engineering and materials, or a manufacturing defect covered under warranty?
That's a reasonable point, but when the difference is only one and a half drives, it seems to me that there is not enough statistical basis to be meaningful.
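
To put a rough number on that: with only a single observed failure, even a textbook exact Poisson confidence interval is enormously wide. A minimal sketch (the standard chi-squared interval, nothing Backblaze-specific; the drive-day figure is the one quoted above):
Code:
# Sketch: exact 95% Poisson confidence interval for k observed failures,
# scaled to an annualized failure rate over the quoted 4113 drive-days.
from scipy.stats import chi2

k = 1                 # observed failures (the single 4TB WD failure)
drive_days = 4113
alpha = 0.05

lower = chi2.ppf(alpha / 2, 2 * k) / 2 if k > 0 else 0.0
upper = chi2.ppf(1 - alpha / 2, 2 * (k + 1)) / 2

def to_afr(expected):
    return expected / drive_days * 365 * 100

print(f"point estimate: {to_afr(k):.1f} %")                               # ~8.9 %
print(f"95% interval:   {to_afr(lower):.1f} % .. {to_afr(upper):.1f} %")  # ~0.2 % .. ~49 %
# The single failure tells us almost nothing about the true failure rate.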
 
I think you either missed my earlier posted link, or perhaps you disagree with it and aren't mentioning that. It turns out the Backblaze data is not interpreted correctly statistically and is not useful or correct. See here: https://www.theregister.co.uk/2014/02/17/backblaze_how_not_to_evaluate_disk_reliability/
It's complicated.

On one hand: Backblaze is (to my knowledge) still the only source of disk reliability statistics that's publicly available without the vendor/model information having been removed. Backblaze's raw data seems trustworthy, since it would make no sense for them to forge it. But in their blog posts, Backblaze people may reach conclusions that over-interpret the raw data by going outside the limits of good taste in statistics. I have no opinion on whether they do that or not; I look at the raw data only, and I'm capable of doing my own statistics.

On the other hand, Henry Newman's rebuttal of Backblaze's data is mostly just incorrect.

To begin with, he complains that the bulk of Seagate failures in the old Backblaze data was caused by a small number of disk models, which even Seagate admits have a hardware problem, and that they should therefore be ignored. But that doesn't change the (undisputed) fact that customers bought those disks, paid for them, and didn't get their money or their data back after Seagate admitted the hardware problem; and if you calculate the average reliability of all Seagate drives, you need to include *all* Seagate drives, not exclude some that Seagate *after the fact* declared to be faulty.

Then Henry Newman complains that some of these drives are over 5 years old, and he claims that "disk drives last about 5 years" (direct quote from his writing). Sorry, but that statement is nonsense; the disk manufacturers specify AFRs or MTBFs of ~1 million hours, which works out to about 114 years. If, as Henry is implying, all disks fail within 5 years, or perhaps at exactly 5 years of age, they would violate that spec by a huge margin (their MTBF would be about 45K hours, not 1M hours). But Henry's ludicrous statement contains a grain of truth: given the progress of disk performance/capacity, the economic lifetime of many disk drives is about 5 years; after 5 years, it becomes economically advantageous to take large disk subsystems out of production and move the data to newer (higher-capacity, lower energy/space consumption) subsystems.

Then Henry talks about the bit error rate of the drive, and claims that if you use a disk long enough you will get an uncorrectable error; here he fails to distinguish between a drive failing and a drive having a single uncorrectable error.

Finally, Henry didn't read the Backblaze statistics carefully enough, and his complaint about 120% of drives failing is pointless, since Backblaze explicitly tells us how their numbers are collected and calculated.
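
For the record, the arithmetic behind that objection, as a quick sketch assuming the exponential failure model that datasheet MTBF/AFR figures imply:
Code:
# Sketch: relating datasheet MTBF to an annualized failure rate (AFR),
# assuming the exponential failure model behind MTBF specifications.
import math

HOURS_PER_YEAR = 8766  # average year, including leap years

def afr_from_mtbf(mtbf_hours):
    """AFR in percent for a drive with the given MTBF."""
    return (1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)) * 100

print(f"{afr_from_mtbf(1_000_000):.2f} % per year")   # ~0.87 %: the 1M-hour spec
print(f"{afr_from_mtbf(45_000):.1f} % per year")      # ~17.7 %: what "dead in ~5 years" implies
print(f"{1_000_000 / HOURS_PER_YEAR:.0f} years")      # ~114 years for a 1M-hour MTBF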

Backblaze is not in the business of selling disks; and in their blog they have even explained that they mostly ignore their reliability statistics themselves when making purchasing decisions. If anyone else tries to use the Backblaze data to make purchase decisions, they have to understand the data first.

Statistics can be sales tricks, but it all starts with data. In the link I provided, the 4TB WD had an 8.87% failure rate, but the number of drives tested (45) and the 4113 drive-days indicate that they had been used for less than 100 days each. Note that the HGST 8TB drive had a similar number of drives tested and drive-days, but no failures.
That doesn't surprise me at all. Things like this do happen.

Anecdote from my former professional life: I was involved with shipping a product that contained several thousand disk drives, all of the same manufacturer and model (I will not disclose which manufacturer and which model, nor what the product or the customer were). Within the first few weeks of operation, we had a failure rate of roughly 10% (which for a system with that many disks is a lot of dead disks). This was for good-quality enterprise disk drives from a reputable manufacturer, which had been burned in by the disk manufacturer, and then "burned in" again by the system integrator (where burn-in means: a quick multi-hour test before shipping the system to the customer). We ended up replacing all the disks with product from a competing disk manufacturer.

Why am I telling this story? To demonstrate that sometimes real-world problems occur that are specific to one disk model, or to a specific production batch of disks. In that sense, it does not surprise me that Backblaze observed an 8.87% failure rate in one specific batch of disks within 100 days (if it had been statistically significant); been there, done that, got the T-shirt, in a statistically significant unintentional experiment.

Dell used to "Burn In" a newly order computer but I believe the term was misleading. Manufacturing defects tend to fail early, I view Dells process as less burning in and more weeding out manufacturing defects. Was the early WD failure due to poor engineering and materials or a manufacturing defect covered under warranty?
Burn-in for disk drives is more complicated. Today's disk drives are supposed to be limited to ~550TByte of total IO in a year. At "full speed" (about 250 MByte/s for fully sequential), it takes only ~4 weeks to reach the annual limit. On the other hand, we also know that early failures of disk drives can often take several weeks to show up, if the failure is caused by problems with contamination, the spindle bearing, the seals of the enclosure, or the lubrication layer on the platters. So a complete burn-in that is likely to catch the bulk of early failures is no longer possible without exceeding the annual workload of the disk. From this viewpoint, a systems integrator (such as Dell) no longer has the capability of performing burn-in of disk drives, and simply has to trust the disk manufacturer. And as the examples above show, things can go wrong with that trust relationship.
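
The back-of-the-envelope numbers, as a sketch (the ~550TByte/year workload rating and ~250MByte/s sequential rate are the figures from above; exact values vary per model):
Code:
# Sketch: how long a full-speed burn-in takes to exhaust a drive's rated annual workload.
annual_workload_tb = 550    # typical "workload rating" in TB/year (assumed from the post above)
sequential_rate_mb_s = 250  # rough sustained sequential throughput

seconds = annual_workload_tb * 1e12 / (sequential_rate_mb_s * 1e6)
days = seconds / 86400
print(f"{days:.0f} days of continuous full-speed I/O")  # ~25 days, i.e. the "~4 weeks" above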
 
I recently replaced two disks (one at a time) in my working raidz2 pool.
The new disks I installed were:
  • Seagate BarraCuda 4TB, 2 platters, 5400 RPM, 256MB, ST4000DM004
  • WD Blue 4TB, 4 platters, 5400 RPM, 64MB, WD40EZRZ
The Seagate is a Drive Managed SMR disk with the larger cache to compensate for the shingling. But the resilvering took over 3 times longer than with the WD. Reading speed is still fine.
Code:
# Seagate
# resilvered 2.35T in 57h20m with 0 errors on Thu Feb 22 03:15:08 2018 (11.9 MiB/s)
dd if=/dev/ada3 of=/dev/null bs=1M
4000787030016 bytes transferred in 27562.923048 secs (145151043 bytes/sec) 138.4 MiB/s

# WD
# resilvered 2.35T in 16h55m with 0 errors on Fri Feb 23 06:54:27 2018 (40.5 MiB/s)
dd if=/dev/ada4 of=/dev/null bs=1M
4000787030016 bytes transferred in 28698.639272 secs (139406855 bytes/sec) 132.9 MiB/s
 
swegen, this is an interesting comparison. The raidz parity concept shows increasing disadvantages over mirroring as the drives become bigger and slower.

And this becomes even worse, depending on system load. When resilvering can take weeks, the risk of another disk dying while resilvering becomes substantial.
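
As a rough illustration of that risk, here's a sketch assuming independent, exponentially distributed failures; the AFR, pool width, and resilver length below are made-up numbers for illustration, not measurements:
Code:
# Sketch: chance that at least one surviving drive fails while a resilver runs,
# assuming independent exponential failures. All inputs are hypothetical.
import math

afr = 0.05              # assumed 5% annualized failure rate per drive
surviving_drives = 7    # e.g. an 8-disk raidz2 with one disk being replaced
resilver_days = 14      # a "weeks-long" resilver scenario

rate_per_day = -math.log(1 - afr) / 365
p_any = 1 - math.exp(-rate_per_day * resilver_days * surviving_drives)
print(f"{p_any * 100:.1f}% chance of another failure during the resilver")  # ~1.4% here
# The risk grows with pool width, per-drive AFR, and resilver duration.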

I recently saw a chart from 2009 comparing raidz2 resilver times on arrays with different kinds of drives. The raidz2 resilver time on 600GB 7200rpm consumer SATA drives was many times longer than on 15k SAS drives: the former took up to 8 hours, the latter about 1h 15min, with enterprise SATA drives in between. That is still a bit more than the 50-60 minutes I see when I take out a 600GB 15k drive to stash away as a backup and resilver the mirror using another drive.
 
I once came across some data from Sun about where the sweet spot is when it comes to disk size and resilver time. When disaster strikes, it does so mostly during resilvering (stress for the remaining drives). For raidz2 this sweet spot was about 500GB. So I built my storage server using 8 disks from two manufacturers, bought in two shops, so that I got different production batches. Now, if one batch or model fails together, all is still well. Speed is not that important, as the limit is the connection anyway.
 