Backblaze Drive Stats for SSD Boot Disks

Their methods have always been very dubious, but this is just completely useless - not a *single* word about how much data those drives have actually written over their lifetime, which is the _only_ indicator of the actual endurance capabilities of solid state memory.
It seems this is the same story as with their HDD 'reports' where they pick/show only data points that fit the statement they want to make.
 
I have always been skeptical of their "reliability" reports, as they tend to take consumer-grade drives designed for single-drive home desktop systems and cram them into a massive data center to fulfill a task not intended by the manufacturer. I suppose their data does show which drives you might be able to use to save money as opposed to buying the higher-priced enterprise drives, but as an overall reliability metric their data is given too much weight.
 
What I like about these reports is that they give you the background information and specific model numbers you need to determine for yourself whether they are telling you anything useful - e.g. whether the sample size is too small to compare to other drives, whether drives with X number of platters have issues, whether drives in a particular series have issues, etc. I would never buy most of the drives on this list, but it's good to know that they're probably "fine" if I did encounter one.

The downsides to the reports: people start cherry-picking data and ignoring the rest to prove their pet theories, they don't provide a comprehensive overview of all available drives, and the workload and operating environment are unusual compared to "normal."
 
I do agree; I was left with numerous questions when I saw that. The biggest, as others mentioned, is that they explicitly say the SSDs are used as boot/system devices, not for the core data. A boot device generally gets written infrequently and in small amounts at a time; even assuming they kept the logs on the boot/system SSD, that is still a low amount of writes to the device.

One thing they didn't mention, and which could make this a trash comparison, is the age of the HDDs: if they are comparing 10+ year old HDDs to new SSDs, that would be an unfair comparison, as any improvements to the durability/lifespan of HDDs made since then wouldn't be reflected. Then there is the fact that the SSDs' failure rate is only charted for half the period the HDDs' is. And as is often said of SSDs, they tend to fail suddenly whereas HDDs tend to fail gradually. This also hides the actual lifespan of the devices; it would be nice to know whether SSDs only live for a few years compared to HDDs.
 
I find their reports interesting and appreciate them publishing them for free.

I think they do include caveats about their methods so they aren’t claiming to be definitive.

But yes, YMMV etc., and any benchmarks have to be viewed warily.
 
Well, these Backblaze folks give out something they have and want to give out, with no obligation to do so. The manufacturers, OTOH, don't give out anything beyond marketing BS.
Now, given that we don't pay Backblaze but do pay the manufacturers for their stuff, I come to think that the problem is not with Backblaze...

And yes, running consumer drives in server operation is an interesting endeavour - the whole xxxxNAS (FreeBSD-derived) folks seem to do just that and write tutorials about it (and they are quite good at it).
I'm doing it also. For instance, there is a ST3000DM008 drive in my server - but that drive is no longer available. What is available now is the ST3000DM007, and that one is SMR. But you can't figure that out from the manual, because instead of giving the recording method as either "perpendicular" or "SMR", Seagate has decided to always give it as "TGMR" - which describes the read head, is always true, and tells you nothing. So, definitely, the problem is somewhere else.
 
Their methods have always been very dubious, but this is just completely useless - not a *single* word about how much data those drives have actually written over their lifetime, which is the _only_ indicator of the actual endurance capabilities of solid state memory.
You are correct that writing to SSDs degrades them. But it is not the only source of reliability problems. As a matter of fact, for consumer and low-impact usage (for which boot drives should qualify), write endurance is not a major factor in the long-term reliability of the drive.

It would be nice if Backblaze also released figures for average data written (not super difficult, as SMART keeps track of it on most SSDs). But given their really low failure numbers (they have dozens of failures), trying to correlate those few failures with write traffic will not be statistically meaningful.
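For what it's worth, collecting that figure yourself is straightforward on most drives. A minimal sketch (assuming smartmontools >= 7.0 for the JSON output and drives that expose attribute 241; the device names are placeholders):
Code:
#!/usr/bin/env python3
# Read SMART attribute 241 (Lifetime_Writes_GiB) via smartctl's JSON output
# and print the average data written across a set of drives.
import json
import subprocess

def lifetime_writes_gib(device):
    out = subprocess.run(["smartctl", "-j", "-A", device],
                         capture_output=True, text=True).stdout
    table = json.loads(out).get("ata_smart_attributes", {}).get("table", [])
    for attr in table:
        if attr["id"] == 241:           # not every SSD reports this attribute
            return attr["raw"]["value"]
    return None

devices = ["/dev/ada0", "/dev/ada1"]    # placeholders - adjust to your system
writes = [w for w in map(lifetime_writes_gib, devices) if w is not None]
if writes:
    print("average written: %.0f GiB" % (sum(writes) / len(writes)))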

It seems this is the same story as with their HDD 'reports' where they pick/show only data points that fit the statement they want to make.
To my knowledge, they share their raw data.

The problem with their HDD reports is not that they try to spin it themselves. It's not even that they use "consumer" grade drives; lots of people do that, including a significant fraction of enterprise users (but not a significant fraction of enterprise drives). The problems with Backblaze's data include: (a) it is backward-looking, so you only learn about the reliability of a drive after 3 or 5 years, when it is no longer on the market; (b) they are not a large enough customer of the drive makers to be able to track the exact origin and manufacturing flow of each drive, which makes a huge difference when using statistics to predict drive reliability; and (c) they don't have enough drives to make statistically meaningful predictions for most drive models, with a few exceptions.
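To put a number on (c): with only a handful of failures, the confidence interval around an annualized failure rate is enormous. A rough sketch (Python with scipy; the drive count and failure count are made-up, not Backblaze's data):
Code:
# Exact Poisson confidence interval for an annualized failure rate (AFR).
from scipy.stats import chi2

def afr_interval(failures, drive_years, conf=0.95):
    a = 1.0 - conf
    low = chi2.ppf(a / 2, 2 * failures) / 2 if failures > 0 else 0.0
    high = chi2.ppf(1 - a / 2, 2 * failures + 2) / 2
    return low / drive_years, failures / drive_years, high / drive_years

# e.g. 500 drives observed for one year, 2 failures:
low, afr, high = afr_interval(2, 500.0)
print("AFR %.2f%%, 95%% CI [%.2f%%, %.2f%%]" % (100*afr, 100*low, 100*high))
# -> about 0.40%, CI [0.05%, 1.44%] - far too wide to rank drive models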

Also remember: Over 90% of all disk drives used in an enterprise setting (in servers and/or data centers) are used by less than a dozen companies, the "FAANG" and friends. And those companies do not release drive reliability statistics, which they do have.
 
The main problem with these reports, when they concern SSDs, is that the interior of an SSD changes frequently. There are a couple dozen SSD "manufacturers", but the internals (controller and flash cells) are produced by only a small number of companies. And sadly, the model name of a device may not change when the internals do.

This is most significant on the low end of devices:
  • The Crucial BX500 has seen a strong price reduction over the last few weeks. The most likely explanation is that they switched from TLC to QLC. (There is a vague mention somewhere on the web that they may do this at some point.)
  • The Verbatim Vi550 has recently changed its advertised endurance[*] from ~750 to ~450 cycles. Try to imagine why.
  • The Kingston A400 is known for constantly changing internals. Specifically:
A400/120 GB, 5 yr. old:
Code:
  9 Power_On_Hours          -O--C-   100   100   000    -    39867
231 SSD_Life_Left           PO--C-   100   100   000    -    30
241 Lifetime_Writes_GiB     -O--C-   100   100   000    -    54246
A400/240 GB, 2 yr. old:
Code:
  9 Power_On_Hours          -O--CK   100   100   000    -    18115
231 SSD_Life_Left           ------   073   073   000    -    73
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    38159
A400/240 GB, 1 yr. old:
Code:
  9 Power_On_Hours          -O--CK   100   100   000    -    4917
231 SSD_Life_Left           ------   088   088   000    -    88
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    4579
The math:
54246/(100-30)*100/120 = endurance acc. to SMART: 645
38159/(100-73)*100/240 = endurance acc. to SMART: 588
4579/(100-88)*100/240 = endurance acc. to SMART: 158
Looks very much like these have also silently switched to QLC. Apparently no longer to be recommended.[**]
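For convenience, the same arithmetic as a tiny Python helper (inputs are SMART attributes 241 and 231 plus the nominal capacity, as above):
Code:
def endurance_cycles(writes_gib, life_left_pct, capacity_gb):
    # writes done so far, scaled up by the fraction of life used,
    # divided by capacity = extrapolated total erase cycles
    return writes_gib / (100.0 - life_left_pct) * 100.0 / capacity_gb

for writes, left, cap in [(54246, 30, 120), (38159, 73, 240), (4579, 88, 240)]:
    print(int(endurance_cycles(writes, left, cap)))    # 645, 588, 158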

Bottom line: reliability data is mostly useless, because you don't know what you will get. Or, as in fund investment prospectuses: "past earnings are no guarantee for the future".

[*] We figure endurance always as a dimensionless number of erase cycles, since that value can be compared among any devices. In contrast, the TBW can only be compared among devices of the same size, and the DWPD can only be compared within the same warranty period.
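To illustrate [*] with made-up values - the same endurance expressed in all three units:
Code:
cycles, capacity_gb, warranty_years = 600, 240, 3   # made-up example drive

tbw_tb = cycles * capacity_gb / 1000     # 144 TB total writes
dwpd = cycles / (warranty_years * 365)   # ~0.55 drive writes per day

# TBW is only comparable at equal capacity, DWPD only within the same
# warranty period; the plain cycle count is comparable across both.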
[**] A while back I mentioned here that for the low end, but still solid and reliable for Unix operation, either the HP S700 or the Verbatim Vi550 looked most promising to me. Soon after, I bought one of the HP S700; it now has 13000 hours and makes an overall good impression so far. One of the Verbatims is currently on order; let's see how that one does. (The HP S700 is the DRAM-less version; that's intentional, because I have some ZIL on them.)
 
Kingston A400 120 GB, bought in May 2019
Code:
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       16299
231 SSD_Life_Left           0x0000   010   010   000    Old_age   Offline      -       90
241 Lifetime_Writes_GiB     0x0032   100   100   000    Old_age   Always       -       10524
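Applying the same math as above (using the raw life-left value of 90): 10524/(100-90)*100/120 ≈ 877 cycles - closer to the older samples above than to the suspected QLC one.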
 