Hard drive reliability report

That has been discussed before, even on this forum. Backblaze uses a small number of consumer-grade disks (tens of thousands), and a tiny number of enterprise-grade disks (hundreds). Furthermore, their comparison isn't apples-to-apples, because the different disks were used in very different settings. The literature contains much more detailed studies that are based on hundreds of thousands of disks. Google is your friend; look through past proceedings of the FAST conference as a starting point. There is a paper by a gentleman from Google, and another one by a lady from U of Toronto, with much better data (I'm really bad with names).

I am not claiming that Backblaze's conclusions are demonstrably wrong, only that they are not founded on solid statistical observations.
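A minimal sketch of the sample-size point, not taken from Backblaze's data: the drive and failure counts below are hypothetical, chosen only to match the "hundreds" versus "tens of thousands" scale mentioned above, and the Wilson score interval is one common choice for a binomial proportion, assumed here for illustration. The same observed failure rate is far less certain when it comes from a few hundred drives.

```python
import math


def wilson_interval(failures, drives, z=1.96):
    """Approximate 95% Wilson score interval for a failure proportion.

    failures and drives are plain counts; z=1.96 corresponds to roughly
    95% confidence under the usual normal approximation.
    """
    if drives <= 0:
        raise ValueError("drives must be positive")
    p = failures / drives
    denom = 1 + z * z / drives
    centre = (p + z * z / (2 * drives)) / denom
    half = z * math.sqrt(p * (1 - p) / drives + z * z / (4 * drives * drives)) / denom
    return centre - half, centre + half


# Hypothetical populations with the same observed 5% failure rate.
populations = [
    ("enterprise (hundreds)", 20, 400),
    ("consumer (tens of thousands)", 1500, 30000),
]

for label, failures, drives in populations:
    low, high = wilson_interval(failures, drives)
    print(f"{label}: observed {failures / drives:.1%}, "
          f"plausible range {low:.1%} to {high:.1%}")
```

With counts like these, the interval for the small population is several times wider than for the large one, so two populations can show the same headline rate while supporting very different levels of confidence. This says nothing about the environmental differences, which are a separate problem.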
 
@ralphbsz, I think you are referring more to their comparison between enterprise and consumer drives, which was discussed here previously?

This report compares failure rates for the disks used purely in the storage pods, which are all consumer disks, and obviously all doing the same job. It's still an interesting read, and I like how they don't just dismiss the Green/LP disks as crap (even though they are a bit). They just note that they don't use them because these disks are clearly not designed for their type of use and the aggressive power features just cause them problems.
 
Exactly, I was referring to their old comparison (a few months back) of enterprise vs. consumer. If you take their data at face value, you would (wrongly) conclude that enterprise disks are just as much junk as consumer disks.

Their new data is interesting. But it has to be taken with a HUGE grain of salt. Disk failure mechanisms are complicated, and very dependent on environment (temperature, vibration, power quality, power cycle count) and workload (r/w ratio, small updates, spin down). As a matter of fact, they hint at that when they explain why they don't use WD green drives. The biggest factor is that Backblaze uses their own enclosures, which are very different from consumer-grade computer cases, and also quite different from enterprise-grade JBODs. They also run a very large number of drives in close physical proximity, mechanically strongly coupled (sheet metal resonator, ahem, enclosure), and consumer-type drives especially are very sensitive to vibration, in particular sympathetic vibration while writing. While I'm not saying that this is a bad thing to do (it works great for their business model, and these guys are really smart), it also means that their conclusions about reliability, in particular reliability/$, are not applicable to other users and other uses.

Exactly the same criticism applies to the data that has been published by others.
 
Granted that the complete conditions are not stated, but at least Backblaze names brands and models in the linked post above. Google's paper failed to do so, but partly made up for it by debunking some of the theories about drive activity and environmental conditions causing failures. In the short run, specific models that are failing rapidly are the most useful data. In the long run, the environmental and statistical information is far more useful. After all, who would even think of buying the Seagate 1.5TB Barracuda Green model now that bigger drives are available from all manufacturers?
 
ralphbsz said:
Their new data is interesting. But it has to be taken with a HUGE grain of salt. Disk failure mechanisms are complicated, and very dependent on environment (temperature, vibration, power quality, power cycle count) and workload (r/w ratio, small updates, spin down). As a matter of fact, they hint at that when they explain why they don't use WD green drives. The biggest factor is that Backblaze uses their own enclosures, which are very different from consumer-grade computer cases, and also quite different from enterprise-grade JBODs. They also run a very large number of drives in close physical proximity, mechanically strongly coupled (sheet metal resonator, ahem, enclosure), and consumer-type drives especially are very sensitive to vibration, in particular sympathetic vibration while writing. While I'm not saying that this is a bad thing to do (it works great for their business model, and these guys are really smart), it also means that their conclusions about reliability, in particular reliability/$, are not applicable to other users and other uses.
Exactly. I discussed this in some detail in my RAIDzilla II / Backblaze Pod comparison.

Uniballer said:
Granted that the complete conditions are not stated, but at least Backblaze names brands and models in the linked post above. Google's paper failed to do so, but partly made up for it by debunking some of the theories about drive activity and environmental conditions causing failures. In the short run, specific models that are failing rapidly are the most useful data. In the long run, the environmental and statistical information is far more useful.
Another thing that skews the Backblaze data is their method of acquiring disk drives - they often "farm" them (Link). One of the main influences on out-of-the-box and early failure rates is how the drives were handled before being installed. One major online retailer (N****g) seems to ship many drives with inadequate protection. And when they started asking users to buy drives and ship the drives to them, that was another shipment and another chance for the drives to be damaged.

When I (or other OEMs) order drives from a distributor or directly from the manufacturer, they come sealed in the manufacturer's bulk packaging (Pic). That makes a big difference in the lifespan of the drives.

After all, who would even think of buying the Seagate 1.5TB Barracuda Green model now that bigger drives are available from all manufacturers?
That's the big problem with reviews that "name names" - by the time there's enough real-world experience with a drive series, the manufacturer has moved on to the next thing. And, in some cases, even drives with the same model number will have major internal differences that affect reliability - I've seen drives where there are different numbers of platters for the same model, depending on date code or assembly location.

And it isn't possible to make a blanket statement that "All Brand X drives are good, while all Brand Y ones are junk" - there's been too much merger and acquisition activity to be able to make that kind of statement. As a designer of systems that go through a lot of disk drives, I base my buying decisions on the type of support I get from the manufacturer: Do I have an assigned sales rep and engineering rep? Is the warranty process reasonable? and so on.

Edited to correct minor typos.
 
Everything you say is correct.

Terry_Kennedy said:
And it isn't possible to make a blanket statement that "All Brand X drives are good, while all Brand Y ones are junk"

Manufacturers can go through bad periods, and good periods. It can flop back and forth. Seagate used to be on balance pretty good (when I was young and beautiful, which is a long time ago, and the comparison was Conner and Rodime), and then they had a few years of that awful stiction problem (which was probably not even as bad as the loss to their reputation made it out to be). Then IBM SCSI drives (in the few-hundred-MB size range) used to be excellent, so much so that Apple shipped them preferentially in their Mac II series. Then IBM had that problem with firmware that occasionally silently dropped writes, and their reputation went into the toilet. There was a time when Western Digital drives were all the rage as being the highest quality drives you could buy (they must have been chiseled out of blocks of solid platinum, judging by the adoration some people had for them), and today they are considered junk. Both judgements are unfair and exaggerated. When the Korean companies started making disks, everyone thought they would be junk.

Buying disk drives has to be done based either on solid data (which requires testing thousands of drives, which the average user can't do), or it has to be done based on other considerations (such as price, how easy the vendor is to deal with, guaranteed availability, and so on). Basing disk drive purchases on "reputation" means that one is following rumors.

Note: I'm not saying that all drives are the same quality. Some models clearly do better than others. But the public perception does not often reflect the facts.
 
ralphbsz said:
Manufacturers can go through bad periods, and good periods.
Absolutely. Take Fujitsu - anybody who still wants to run M2351 "Eagle" drives is running them - they just don't fail, even after 25+ years of 24x7x365 operation. I didn't even know they had a filter that was supposed to be cleaned regularly until a decade after I installed them. :O

Yet their reputation had fallen so badly by the early 1990s that I was able to leave a new-in-box 5.25" Fujitsu drive out in public view at 60 Hudson and, over the span of 5 years, nobody stole it. A number of people apparently opened the box to see what was inside, and I'd occasionally find notes with derogatory comments about Fujitsu inside.

And a manufacturer can have a good and bad reputation at the same time, on different product lines. To use your Seagate example, the drives that were developed by the former MPI were great, while the ones from mainline Seagate had gone down in quality by then. Of course, there's the issue of the "old" Seagate vs. "new" Seagate corporation which confuses things even more.
 
Fujitsu Eagle - you are bringing back warm memories.

The other indestructible drive was the CDC Wren. Full-height form factor 5 1/4", 600 MB. I must have had dozens of them, all in individual cases with 50-pin SCSI connectors in the back, connected to a variety of minicomputers and workstations. I don't remember ever hearing of one failing. There is still one in the basement here at home, with the matching Vaxstation 3100, but it hasn't been unboxed and booted in about a decade.

Usually they were paired with Exabyte 8mm drives. Now those were the polar opposite in terms of reliability.

And you perfectly described what happened when Sweatgate (I know a few people who worked for them in Scotts Valley, they were a terrible employer in those early days, and some of their products showed how dysfunctional engineering was) bought the disk drive line from CDC / MPI / Imprimis (I forget the exact sequence of renamings); suddenly you had both high- and low-quality products with the same brand name, until the corporate cultures were meshed together.
 
ralphbsz said:
The other indestructible drive was the CDC Wren. Full-height form factor 5 1/4", 600 MB. I must have had dozens of them, all in individual cases with 50-pin SCSI connectors in the back, connected to a variety of minicomputers and workstations. I don't remember ever hearing of one failing.
Those were good drives, too. Most of the CDC/Imprimis/MPI/Seagate stuff in that lineage was good.

I bought a DOA Sabre (8" SMD drive) dud for a few hundred dollars. It was indeed a dud, so I sent it back for a repair quote. They said the problem was inside the HDA and it was not cost-effective to open it up in the clean room. Well, always being up for a challenge, after they sent it back I took it down to the (dusty / cobwebs / etc.) basement and reasonably carefully (e.g. I didn't sneeze in it) opened it up. I found one of the internal cables unplugged, plugged it back in, and put the lid back on, and the drive worked fine. I expected it to fail within a few months, but it didn't, so I eventually bought the [optional] control panel for it and put it into production (on a MicroVAX II). It ran for many, many years until that system was finally decommissioned for an unrelated reason (a MicroVAX 36xx w/ DSSI drives was cheaper to run).
 