How to make SSDs not better

The Internet nowadays is full of millions of texts about the life-span of SSDs, yet none of them gives any useful information, and most are outright crap. It is the kind of stuff a priest could use for a sermon: lots of talk, no answers.
And the manufacturers are no better. I'm not interested in TBW values when nobody says which size of drive they apply to. I'm even less interested in throughput speeds, as they don't vary much and don't have much impact.
I'm only interested in one single number: how many writes? Strangely, nobody gives that number. I would like to know exactly why I might pay X times the price for an enterprise class piece, and exactly what I would get as added value.

I take it for granted that a wear-levelling algorithm ensures all cells are written about equally often. That is no magic, that's just a finite algorithm. So, given the quality of the chips and the overhead of that algorithm, there must follow one single number N: how often can the full capacity be written to the drive? But what I get instead of that number is marketing babble of the worst kind. I don't want to know how many GB an "average user" will write per day; the statistics from my machine already tell me that.
I basically don't want manufacturers telling me lots of assumptions about my behaviour instead of telling me facts about their product's behaviour!

Some five years ago I bought my first small SSD, just out of curiosity and for testing. A few weeks later, without having done much, the piece was dead in the water. Nicely, I got it replaced by double the capacity (and that replacement still works today). I put it in the desktop as a disk cache, and didn't notice any improvement from that - it might be that a desktop does not do much repeated reading: program startup is an initial read, and then most of the important things stay in memory.
So, later I put it into my old server machine - and that made an improvement. It is not that the SSD reads faster (that machine does not even reach the throughput of a spinning drive). It is the absence of seek times that kicks ass: even modern spinning drives have a track-to-track seek time >1 ms, and that means that at every track change at least 2 to 20 million CPU instructions are wasted in busy-waiting. (Unless you do bitcoin mining in parallel, but then your tasks will not finish any faster either.)
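As a rough sketch of that back-of-envelope claim (the clock rates here are illustrative assumptions, not measurements):

```python
# Cycles a CPU idles away during one track-to-track seek, at two assumed
# clock rates; the 2 to 20 million figure follows directly.

seek_time_s = 1e-3                  # >1 ms track-to-track seek
for clock_hz in (2e9, 20e9):        # assumed 2 GHz core ... ~20 G ops/s machine
    wasted = int(seek_time_s * clock_hz)
    print(f"{wasted:,} instructions wasted per seek")
```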

So, my strategy became to leave the big media files that are sequentially read on the spinning drives, and put the small and mostly fragmented data on SSD: OS installations, mailboxes, databases, web caches, ... Unfortunately that is also the data that is most often written, so the write counts on my drives get much higher than the read counts, and therefore wear is a concern.

Now let's look at the details: here are two drives, apparently the same brand and series (and in fact cheap consumer pieces):
Drive 1:
ada3: <KINGSTON SA400S37120G SBFK71E0> ACS-4 ATA SATA 3.x device
Model Family: Phison Driven SSDs
User Capacity: 120,034,123,776 bytes [120 GB]

Drive 2:
ada0: <KINGSTON SA400S37240G S1Z40102> ACS-3 ATA SATA 3.x device
Model Family: Phison Driven SSDs
User Capacity: 240,057,409,536 bytes [240 GB]

Drive 1 was bought three years ago, drive 2 this year.

Let's do the math:
Drive 1:
Code:
  9 Power_On_Hours          -O--C-   100   100   000    -    15685
12 Power_Cycle_Count       -O--C-   100   100   000    -    175
231 SSD_Life_Left           PO--C-   100   100   000    -    76
233 Flash_Writes_GiB        PO--C-   100   100   000    -    26217
241 Lifetime_Writes_GiB     -O--C-   100   100   000    -    22599
242 Lifetime_Reads_GiB      -O--C-   100   100   000    -    5905
244 Average_Erase_Count     ------   100   100   000    -    238
245 Max_Erase_Count         ------   100   100   000    -    267
246 Total_Erase_Count       ------   100   100   000    -    1394196
0x01  0x018  6     47394438211  ---  Logical Sectors Written
0x01  0x028  6     12385445892  ---  Logical Sectors Read

Line 231 is just line 244 transformed as 100 - (X / 10). So the average erase count targets 1000, at which point SSD life left will reach 0.
We can calculate the algorithm overhead: 22599 GB (line 241) / 120 GB = 188 full writes of the capacity. (Or, from flash writes: 26217 GB (line 233) / 120 GB = 218.) The other value, Logical Sectors Written, matches line 241: 22599 * 1024 ^ 3 / 47394438211 = 512 (the sector size).
The algorithm overhead then figures as (238/188) - 1 = 27%. In other words, one can write the drive 791 times.
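The arithmetic can be written down as a small sketch (variable names are mine; values are copied from the SMART output above, and the 1000-cycle target is the assumption derived from line 231):

```python
capacity_gb = 120            # drive size
host_writes = 22599          # attribute 241, Lifetime_Writes_GiB
avg_erase   = 238            # attribute 244, Average_Erase_Count
sectors_wr  = 47394438211    # GP log "Logical Sectors Written"

fills    = host_writes / capacity_gb           # capacity written ~188 times
overhead = avg_erase / fills - 1               # ~26-27 % depending on rounding
n_total  = 1000 / (avg_erase / fills)          # N: drive writeable ~791 times
sector   = round(host_writes * 2**30 / sectors_wr)  # sector size in bytes

print(round(fills), f"{overhead:.1%}", round(n_total), sector)
# -> 188 26.4% 791 512
```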

That is fine with me, and so I bought the other piece.
Drive 2:
Code:
  9 Power_On_Hours          -O--CK   100   100   000    -    2995
12 Power_Cycle_Count       -O--CK   100   100   000    -    134
231 SSD_Life_Left           ------   087   087   000    -    87
233 Flash_Writes_GiB        -O--CK   100   100   000    -    7883
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    7729
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    3349
244 Average_Erase_Count     ------   100   100   000    -    130
245 Max_Erase_Count         ------   100   100   000    -    174
246 Total_Erase_Count       ------   100   100   000    -    55955
0x01  0x018  6      3324330597  ---  Logical Sectors Written
0x01  0x028  6      2730222700  ---  Logical Sectors Read

But here, things are very different.
Line 231 is still the same as line 244, so the erase-count target still seems 1000.
But then the overhead is 7729 GB / 240 GB = 32 full writes -> (130/32) - 1 = 306% !
The Logical Sectors Written also do not line up with anything: 7729 * 1024 ^ 3 / 3324330597 = 2496, which is no plausible sector size.
And the finally interesting number N comes out as: writeable only 248 times!
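Assuming the same erase-count target of 1000 for both pieces, the same back-of-envelope formula makes the difference stark (a hypothetical helper, not anything the drives report):

```python
def n_writes(capacity_gb, host_writes_gib, avg_erase, erase_target=1000):
    """How many times the full capacity can be written before life reaches 0."""
    fills = host_writes_gib / capacity_gb      # capacity fills so far
    return erase_target / (avg_erase / fills)  # total fills at this wear rate

print(round(n_writes(120, 22599, 238)))   # drive 1 -> 791 full writes
print(round(n_writes(240, 7729, 130)))    # drive 2 -> 248 full writes
```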

This very much looks like somebody at the manufacturer decided that 1000 write cycles is far too much for a consumer drive, and changed the algorithm accordingly to use up the drive in a quarter of the time (while the chips might be just the same).
There is a name for this kind of 'improvement': we call it planned obsolescence.


Addendum:
Some research and inquiries about the Kingston company make the phenomenon a bit clearer:

Kingston has a long-standing history of offering unlimited lifetime warranty on their storage products. With flash memory products, which are built to decay by design, such a corporate philosophy obviously cannot be kept up. Current documents from Kingston now give the interesting number N as 333 for the mentioned models, which is similar to other brands' equivalent products.

It is also worth remarking that the SMART data from these Kingston drives is quite intelligible (which is not so true of certain other brands); otherwise one could not even observe such changes.

Overall, I will certainly continue to buy Kingston memory.
 
I only buy Samsung consumer SSDs. I have a bunch of Samsung 830 256GB SSDs from 2011 (3 year warranty, no TBW limit) in Mac minis running macOS or FreeBSD on a daily basis and the SSDs are all still reliable. Last year I bought a few Samsung 860 EVO 1TB drives (5 year warranty, 1,200 TBW limit) - time will tell how these fare.

The only parameters I pay any attention to are (1) length of warranty and (2) maximum TB written. The first gives you an idea of how long the manufacturer expects the drive to last in the consumer world (and in my experience to date, is understated) and the second gives you an idea of the maximum number of writes before the warranty is voided (no idea if this is also understated, I don't track it).

The warranty lengths and TBW values for Samsung consumer drives can be found here: https://www.samsung.com/semiconductor/global.semi.static/SAMSUNG_SSD_Limited_Warranty_English_UK.pdf
 
You are being unreasonable. It's not planned obsolescence. The manufacturers are not being evil. They are trying to sell a working product, manufacture it as cheaply as possible (as compatible with the requirements of their customers), while at the same time not exposing themselves to huge warranty costs and legal liability. They are trying to get through a minefield.

One of the things that makes all of this really hard for an individual consumer to understand (and you made the same mistake in another recent post, where you complained about a 3M-hour MTBF because you are not intending to live for 300 years): all these things are statistical statements. For example, if a certain model is rated at a 1M-hour MTBF, that absolutely does not mean that an individual drive will live that long. It only means that if you have a large ensemble of drives (tens of thousands), and measure the time-dependent failure rates with good statistical accuracy (which requires a lot of drives), you will find a failure graph that is statistically compatible with a Poisson process with a 1M-hour parameter.
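To make the statistical point concrete, here is a sketch under the exponential (Poisson-process) failure model described above; MTBF and fleet size are illustrative assumptions:

```python
import math

mtbf_hours = 1_000_000
hours_per_year = 8766            # 365.25 days * 24

# Probability that any one drive fails within a year: 1 - exp(-t / MTBF).
afr = 1 - math.exp(-hours_per_year / mtbf_hours)
print(f"annual failure rate: {afr:.2%}")                      # -> 0.87%
print(f"expected failures per 10,000 drives: {10_000 * afr:.0f} per year")
```

So a 1M-hour MTBF is a statement about a fleet (roughly 87 failures per 10,000 drives per year), not a promise that your drive lives 114 years.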

The same applies to the write cycle count. If a manufacturer says "1000 times", it doesn't mean that he guarantees that it can do 1000 writes for each block, and that if the drive fails at 999, you can get your money back under warranty. It even less means that if the drive has done 1000 writes, it will refuse to accept the 1001st write cycle. It means that statistically, a large ensemble of drives should average this many, under typical workloads (write cycles are VERY workload dependent, because of false sharing and write amplification). In your above example, the manufacturer decided that 1000 is the wrong number (from the viewpoint of maximizing utility of the drive to the user, minimizing cost, including warranty and legal liability), and adjusted it correctly.

Similar example: Hard disk manufacturers are now putting in specifications for the amount of data you are allowed to write (and this is spinning rust drives), a typical value is 550 TB/year. This doesn't mean that the drive will break if you write more, or that it will refuse write commands when it gets too many. It only means that the manufacturer will refuse to pay the warranty coverage if you have written more than that.

At the individual consumer level, it is plainly impossible to get at the underlying data. Drive manufacturers will NOT share it. That's because the drive business is insanely competitive, and profit margins are microscopic. Data about quality and durability is very important for a competitive advantage. Now, if you are the type of customer who buys a million drives (not a joke, those customers exist), then the manufacturer will create an NDA, and all this type of data will be shared. The big customers know exactly what's going on.

By the way, about 10 or 12 years ago, one of the SSD manufacturers actually tried to enforce the write cycle count, by slowing down writes when the drive gets dangerously close. I was one of those large customers at the time, and we stopped that attempt pretty quickly.
 
You are being unreasonable.

No. See below.

It's not planned obsolescence. The manufacturers are not being evil. They are trying to sell a working product, manufacture it as cheaply as possible (as compatible with the requirements of their customers), while at the same time not exposing themselves to huge warranty costs and legal liability. They are trying to get through a minefield.

Alright, one can put it that way. And that would not be a problem if they told us the outcome of these engineering decisions. Which they don't. What they tell us instead are insults. Insults to the reader's intelligence.

One of the things that makes all of this really hard for an individual consumer to understand (and you made the same mistake in another recent post, where you complained about 3M hour MTBF

There is no difficulty in understanding. But any understanding requires some input data to be understood, and that input is what is lacking.
BTW: I don't remember ever talking about MTBF, so that was probably somebody else.

All these things are statistical statements.

I don't care about the statistical statements. I care about the engineering decisions.

There is a statement quoted from Sandforce - I don't have the original source, so I don't know if it's true - but anyway: they would not do overprovisioning, because the overprovisioned cells could also fail. (This is one of those statements I consider a blatant insult to the reader's intelligence. It is like saying: we don't do disk mirroring, because the mirror could also fail.)
Instead they rely on a compression algorithm to shrink the user data and thereby acquire the surplus cells necessary for overprovisioning.

Now, question for You: do You see the consequences such a decision would have on GELI encrypted partitions?

This is why the engineering decisions are important to know.

A similar thing is with the harddisk branding at WesternDigital in kinda rainbow colours. I already know what a rainbow looks like, so that is not what I need to know. What I would need to know are the technical differences between the various models - but those they don't tell.

The same applies to the write cycle count. If a manufacturer says "1000 times", it doesn't mean that he guarantees that it can do 1000 writes for each block, and that if the drive fails at 999, you can get your money back under warranty. It even less means that if the drive has done 1000 writes, it will refuse to accept the 1001st write cycle.

We will learn about that one in some time. I am indeed curious what these SSDs will do when the SSD_Life_Left counter reaches zero. Will they only post the usual S.M.A.R.T. warning, or will they go into locked readonly mode?
This again might be an engineering decision not told to the customer.

It means that statistically, a large ensemble of drives should average this many, under typical workloads (write cycles are VERY workload dependent, because of false sharing and write amplification).

And consequently, WesternDigital asks the customer to just choose the proper rainbow colour for the workload they have - be it NAS, Surveillance, Gaming, Datacenter, etc.
And that might indeed be alright with the usual customer who can be assumed to know what their application does, but might have no idea how the hell a computer actually works.
But here things are different: it is me who assembles the computer, it is me who hacks the OS, it is me who writes the application - it is me who designs the workload! And it would also be me who could judge how a given drive's engineering would behave with that workload. But I cannot say the same about a rainbow colour.

Technical specs are certainly not for the many, but they used to be available for those interested.

In your above example, the manufacturer decided that 1000 is the wrong number (from the viewpoint of maximizing utility of the drive to the user, minimizing cost, including warranty and legal liability), and adjusted it correctly.

That might indeed be. And from the datasheet linked by trev, the marketed figures put the essential number N somewhere between 150 and 1200, so this is all well within the usual range. In fact, the 250 is quite appropriate for the market class of the device.

The only problem with the essential number N is that it doesn't look big. If you told the customer that this drive can be written 600 times, they would shrug and walk away. That's why nobody just tells this number, even though it would be the best figure for straightforward comparison, and can easily be figured for each device from a valid TBW value.
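For what it's worth, the conversion from a published TBW rating really is trivial. A sketch (the 40 TBW figure for a 120 GB drive is a back-calculation from the N = 333 quoted earlier in the thread, not a published spec):

```python
def n_from_tbw(tbw_tb, capacity_tb):
    """The 'essential number N': full-capacity writes covered by the rating."""
    return tbw_tb / capacity_tb

print(n_from_tbw(1200, 1.0))   # Samsung 860 EVO 1TB, 1200 TBW -> N = 1200
print(n_from_tbw(40, 0.12))    # 120 GB drive at an assumed 40 TBW -> N ~ 333
```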

Similar example: Hard disk manufacturers are now putting in specifications for the amount of data you are allowed to write (and this is spinning rust drives), a typical value is 550 TB/year. This doesn't mean that the drive will break if you write more, or that it will refuse write commands when it gets too many. It only means that the manufacturer will refuse to pay the warranty coverage if you have written more than that.

The problem is, they have more than enough compute power within their devices to do that kind of thing.
And this is not limited to disk drives - all kinds of gadgets now start to record what the customer does with them.
Now wake up! The step from recording behaviour to restricting behaviour is a small one. We already have cars that refuse to start when the driver is drunk. In the end the customer will no longer be the owner of the gadget (owner being defined as the person who can do with the object whatever they want).
Instead, the customer will be mere cattle: unknowing livestock obliged with the task of consuming all the stuff.
 
Some five years ago I bought my first small SSD, just out of curiosity and for testing. A few weeks later, without having done much, the piece was dead-in-the-water.
Replacement by warranty is one thing. But does the warranty include recovering data from the dead SSD free of charge? Because that costs a lot more than the disk itself (I had this experience with SSDs). Hence my safe solution: use them read-only, for data that will be mostly read, not written - like OS + apps. After all, their power is in READS, not in WRITES, isn't it? Of course, spinning drives can die instantly without preliminary warning as well, but I haven't had that experience yet.
 
I'm going to throw this out there, and I know some people frown upon what I have to say, but you know: the only person in this thread who has actually provided data (and in detail) so far is PMc, the original poster.

If people start throwing Red Herrings to distract from the discussion with a Hasty Generalization without actual data, that is misleading and is Missing the Point. PMc's claim of planned obsolescence is still valid based on his data. This, however, becomes an ethical discussion of right vs. wrong between consumer and manufacturer, and comes down to capital loss/gain for both parties. Both sides have their place.

Also let's keep this discussion civil and not point fingers by attacking a person by saying they are unreasonable just because they have a different position.
 
N=~600 for a consumer drive. In other words a single cell is good for about 600 writes. Everything else you can figure from there. Also keep in mind the amount of space free for wear leveling: you can't wear-level with cells occupied by data. I usually try to keep at least 30% free, with 50% more desirable. That's a pretty big downside compared to a mechanical drive, where there's zero issue in filling it to the brim. It further broadens the cost-of-storage difference, which is still quite a bit cheaper with mechanical. However, the speed difference is so cavernous it's worth the hit.

I agree they're trying to make it as confusing as possible. I think they're trying to keep that small number out of print. They use TBW because it looks like a big number, which implies the drive will last a long time. They don't want people freaking out because you can only write a single cell a few hundred times. The guys in marketing think people will not get how wear leveling works.

As far as the difference between SSD and mechanical in system speed, it's huge for me in every regard, especially for NVMe drives. It's pretty nice to be able to write a 3GB file in one second instead of 30 seconds for a mechanical drive. In fact just recently I changed out a mechanical for SSD on an old PC and even with an antiquated 1.5Gb SATA connection it made a very noticeable difference.
 
Replacement by warranty is one thing. But does the warranty include recovering data from the dead SSD free of charge?
No, drive warranty (spinning or flash) is always just for the cost of the drive, or replacement - not for the value of the data on it that has been lost, or the effort of recovering it. The assumption behind it is that any computer user who stores valuable data on disk has planned correctly for hardware failures, which are after all expected. Under this assumption, data loss or having to do data recovery from a drive are not necessary, because RAID/replication and backup take care of that.

For large computer users this works really well. I've been working in big storage for decades, and data loss events due to failure of individual disks are extremely rare, and data recovery from dead disks is hardly ever necessary. Data loss does exist, but it is nearly always caused by factors other than individual drives: site problems (fire, flood, hurricane), software problems, and most importantly human error.

Now, this is great, but it requires a lot of infrastructure: lots of devices to spread the data over, lots of complex software, data centers in multiple locations spread far enough that natural disasters are uncorrelated, and lots of humans to look over each other's shoulders. But for an individual home user, who has 1 disk and 10 fingers and 1 brain, this is not easy. That individual just has to be very conscientious about doing backups and setting up systems with redundancy, or storing their data in safer places, or accepting that data is at risk.

Hence, my safe solution: use them read-only for such data as will be mostly read, not written. Like OS + apps. After all, their power is in READS, not in WRITES, isn't it?
I think by using SSDs only as a read cache you are selling them short. They are also really good for fast writes; their latency for random small writes is excellent, and their throughput for bulk writes is also great. Will that wear them out? Yes, but depending on workload and provisioning that may not matter. There are applications in which SSDs (or flash in general) make superb write caches or write-only logs. This may not generalize, and it may not be optimal for all situations.

Of course, spinning drives can die instantly without preliminary warning as well, but I haven't had that experience yet.
Well, spinning drives die all the time. At home I must have lost half a dozen in the last ~35 years that I've had them, perhaps a dozen. As a matter of fact, I would not claim that spinning drives have in general better durability (in terms of bits lost per year per bit stored) than flash.
 
N=~600 for a consumer drive. In other words a single cell is good for about 600 writes. Everything else you can figure from there.

Alright. That figures as 0.2 picoDollar per Byte written. It's the same for enterprise drives.
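A quick sanity check of that figure, with street prices that are my own assumptions rather than anything from the thread:

```python
def cost_per_byte_usd(price_usd, n_writes, capacity_bytes):
    """Wear cost of writing one byte: price spread over N full-capacity writes."""
    return price_usd / (n_writes * capacity_bytes)

consumer   = cost_per_byte_usd(30, 600, 240e9)     # assumed ~$30 consumer 240 GB
enterprise = cost_per_byte_usd(250, 3000, 480e9)   # assumed enterprise 480 GB

print(f"consumer:   {consumer * 1e12:.2f} picodollar/byte")    # -> 0.21
print(f"enterprise: {enterprise * 1e12:.2f} picodollar/byte")  # -> 0.17
```

With these assumptions the two classes indeed land in the same ~0.2 picodollar-per-byte ballpark.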

So the solution is
1. cut all the crap
2. configure the equipment in a way that replacement doesn't hurt
3. get a drive that satisfies the speed requirements
4. choose a quality that, for the given workload, is expected to die within the warranty time (because otherwise, if the piece dies after the warranty period but within the promised endurance, you don't get compensation).

The difficulty is with my attitude. I used to love my drives, to consider them almost living beings (the first of them served to warm my feet in winter) - and in return they always told me beforehand when they were about to die. But these pieces should rather be seen as commodities, like air filters: they work for a certain time, and then get replaced.
 
The Internet nowadays is full with millions of texts about the life-span of SSDs,...
Nearly all of which is complete bullshit. The only thing that matters is information from the manufacturer. And as I've explained before, the manufacturer will not share information about internals with small customers. If anyone has received such information, they are under NDA, and they will *NOT* post it on the internet.

I'm only interested in one single number: how many writes?
And if someone gave you that number, you would not know what to do with it. You have to remember that there is a complex FTL between you and the flash cells. You have no idea how many hardware writes each of your logical writes turns into. For really large writes, that ratio is close to 1:1, but when I say "close" it could easily still be off by 10% or 20%. Which in the highly competitive SSD marketplace is a huge difference. For small random writes, write amplification is a HUGE factor. Flash manufacturers have invested an enormous amount in their FTLs (and some SSD customers build their own FTLs or modify the OEMs), and FTLs are a very active research topic: go to a storage conference, and you will be bored by a half dozen academic talks about FTL (yes, I've slept through these talks). The details of the FTL will not be shared.
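One thing the SMART data quoted earlier in the thread does expose is the device-level write amplification, as the ratio of flash writes (attribute 233) to host writes (attribute 241) - a sketch:

```python
def waf(flash_writes_gib, host_writes_gib):
    """Observed device-level write amplification factor."""
    return flash_writes_gib / host_writes_gib

print(f"drive 1: {waf(26217, 22599):.2f}")   # -> 1.16
print(f"drive 2: {waf(7883, 7729):.2f}")     # -> 1.02
```

That is only the amplification the drive chooses to report, averaged over its whole life; it says nothing about how a different workload would fare.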

I would like to know exactly why I might pay X times the price for an enterprise class piece, and exactly what I would get as added value.
Buy a million of them, and you will be able to find out. I've sat in those meetings. No, I'm not saying with whom, and what they said. For small quantities, you have to trust the vendor. I know this is frustrating to you, but money talks.

... wear-levelling algorithm That is no magic, that's just a finite algorithm.
If you think that wear-leveling is a small and simple algorithm, you are completely wrong. It is very complex.

... that one single number N: how often can the amount of capacity be written to the drive?
It is not a single number. It is a distribution, which is correlated with a lot of factors. The details are secret.

I basically don't want manufacturers telling me lots of assumptions about my behaviour instead of telling me facts about their product's behaviour!
You are in an unusual position, which the industry doesn't do a good job of supporting. You think you know your workload (and I'm sure you are a bit wrong, but at least you are closer than people who have zero clue). That knowledge is more typical of high-end enterprise customers, and those get to have close negotiations with the manufacturer. But you only buy one or two SSDs, so that negotiation process is impossible. Do you think an SSD vendor will send a lawyer to spend a few hours with your lawyer to hammer out an NDA (at $500/hour for the lawyers), then send a half-dozen FTL and device engineers to meet with your engineering team? For one device that you pay $200 for? Hell no, financially that makes no sense. I've sat in those meetings, but my employer did buy many M$ worth of SSDs, from multiple competing vendors.

To give you an idea of the scale: I heard recently that the two big storage vendors sell over 90% of their products to a set of about 5-10 customers (the typical suspects: FAANG, the two Chinese internet giants, and the few biggest computer companies). Given that WD and Seagate each are about $10B companies, you have to understand that you are in a market whose products are tailored to customers who buy roughly $1B per year. Your annual disk purchases are roughly a factor of *A MILLION* too small for you to be relevant. No, not a factor of 2 or 10 or 100, but of a million.

So what do the manufacturers do?
A similar thing is with the harddisk branding at WesternDigital in kinda rainbow colours.
They are trying to help you! They are trying to serve those tiny customers that are barely relevant to the manufacturers, and who for the most part don't know what their workloads are. They give them a simple guide: Is your application more or less like this? Then use this model.

BTW: I don't remember ever talking about MTBF, so that was probably somebody else.
I'm sorry, another poster (whose user-id also starts with P) was talking about using published MTBF numbers to predict the lifespan of his drive. Sorry, statistics doesn't work that way.

There is a statement quoted, from Sandforce, I don't have the original source so I don't know if it's true, but anyway: they would not do overprovisioning, because the overprovisioned cells could also fail.
That statement is obviously complete nonsense, as you point out. I am sure no Sandforce person ever said this in public, other than perhaps right before falling off a bar stool in a pub. I've met with their engineers and sales people, and they are not stupid. This must be a misquote. Clearly, overprovisioning is necessary (at some level), and is good for reliability; the question is how to balance the cost of overprovisioning against the expected benefit, which varies with workload and customer needs.

Instead they rely on a compression algorithm to shrink the user data and thereby acquire the surplus cells necessary for overprovisioning.
I've never heard of an SSD sold that purely relies on compression and has no overprovisioning. If an SSD is sold as having capacity X, you will be able to write X completely random (incompressible) bytes to it, at least a few times in a row. Now, obviously SSDs also try to use compression, to squeeze more out of the hardware.

Now, question for You: do You see the consequences such a decision would have on GELI encrypted partitions?
A well-written SSD will quickly recognize that your writes are not compressible, and do the right thing. And you will only get the capacity the hardware provides, which is only fair. Now, a badly written FTL might do really stupid things.

What it comes down to: Don't buy crap. Although I understand your predicament: It's hard for an individual purchaser to recognize what is crap and what isn't. The only (not really useful) suggestion I can give is this: Go by brand name. The big vendors try to deliver good quality, even if it costs a little more. I have two Intel SSDs in my server, I've bought an enterprise Samsung one (although I can't remember where it is installed), and my laptops/desktops get Crucial, because there the SSD doesn't matter much (I don't store much on the machine itself, it all goes on the network).

We will learn about that one in some time. I am indeed curious what these SSDs will do when the SSD_Life_Left counter reaches zero. Will they only post the usual S.M.A.R.T. warning, or will they go into locked readonly mode?
I suspect they will just issue a SMART warning. Going into lockdown would not be productive, it would just get customers mad, and mad customers are expensive customers.

Think of it this way: The profit margin on a $200 disk or SSD is probably a dollar or two. The moment you call the phone support line of the manufacturer, you have made them spend dozens of dollars on support costs, and a warranty replacement probably costs many hundreds. Keeping customers happy is incredibly important.
Now get awake! The step from recording behaviour to restricting behaviour is a small one. We already have cars that refuse to start when the driver is drunk. In the end the customer will no longer be the owner of the gadget (owner being defined as the person who can do at will with the object what they want to do).
Think about the economics of that, same argument as above. Restricting the behavior is not a winning move in a commodity low-margin business.

I told the story above already: One SSD vendor had been planning to slow down writes so the user couldn't exceed the write limit that would lead to warranty replacements. We (a very large customer) shut that one down in a heartbeat: It's OK if the drive fails. It's OK if the manufacturer refuses the warranty claim because we wrote too much. It is totally not OK for the drive to become slow.

Instead, the customer will be mere cattle: unknowing livestock obliged with the task of consuming all the stuff.
Stop living in a paranoid fantasy world. People are not out to get you. Disk (and SSD) vendors are trying to make a buck or two, by selling you a device that's most likely to satisfy your needs. If you treat them as the enemy, you will probably not be happy, and it won't make their product any better.
 
Well, spinning drives die all the time. At home I must have lost half a dozen in the last ~35 years that I've had them, perhaps a dozen. As a matter of fact, I would not claim that spinning drives have in general better durability (in terms of bits lost per year per bit stored) than flash.
They do, but not all of a sudden. In my experience it's more predictable in most cases. Whereas the 3 SSDs that died on me all died in an instant: it was just a matter of switching the computer off and not finding the drive at the next boot. Never had that with HDDs; they mostly gave warning signs before complete failure.
 
Just to throw this out there. I have never had a hard drive or SSD fail at work or home in the nearly 45 years I've dealt with computers.

Call me lucky, I guess. Last year, I wanted to pull some data off an old Galaxy that was sitting in my wet, cold basement for 10 years. I ran that for a couple of weeks, fulltime, without issue.

I'm only bringing this up to balance the strange--to me--statements of multiple drive failures. Not to disparage any such statements. I just have never seen it.
 
Sure, there must be something wrong there 😁😁😁
 
I agree with you drhowarddrfine that my SSD experience has been great.
My first SSD, bought in 2007, was an OCZ Vertex 30GB, and it lasted 9 years. It was one of the early cheap drives to come out.
When I tore it apart to see if anything was apparent, I found that a screw had come loose and fallen onto the circuit board.
So I believe the controller was fried, but there were no SMART issues beforehand.

I do believe that in an enterprise setting it is possible to use up all the write cycles.
Here is a user that set up a SLOG on an inappropriate drive and has burned through 4% of the drive's endurance quite quickly:
My ZIL SLOG was added about 5 weeks ago <cut>it's already at 4% of expected life.
There are some good manufacturer drive tools available to report the wear level of the cells.
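A quick sanity check on those numbers - a minimal sketch, assuming wear accrues linearly (real SLOG write rates rarely stay constant, and `percent_used` here is just a stand-in for whatever wear attribute the vendor tool reports):

```python
def projected_lifetime_weeks(percent_used: float, weeks_elapsed: float) -> float:
    """Linear extrapolation: scale the elapsed time up to 100% wear."""
    if percent_used <= 0:
        raise ValueError("no wear recorded yet, cannot extrapolate")
    return weeks_elapsed * 100.0 / percent_used

# The SLOG case quoted above: 4% of expected life used after 5 weeks.
print(projected_lifetime_weeks(4, 5))  # -> 125.0 weeks, i.e. under 2.5 years
```

So that drive would be worn out in well under three years - which is exactly the kind of arithmetic the missing "number of writes" figure would make possible up front.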
 
I still have the humongous IBM 13.6GB HDD that came with the first computer I ever purchased in '98. I pulled it to use in my pfSense tower; it ran like new till I retired the box, and it would probably spin up now if I hooked it back up.

I've never had an SSD but have had a couple HDD failures on my FreeBSD boxen over the years.
 
Stop living in a paranoid fantasy world. People are not out to get you. Disk (and SSD) vendors are trying to make a buck or two, by selling you a device that's most likely to satisfy your needs. If you treat them as the enemy, you will probably not be happy, and it won't make their product any better.

No, ralphbsz, what we are talking about is the richest and most powerful corporations in the world. "A buck or two" is not a good joke in such a context. They are maybe not the enemy. The government is also usually not the enemy, but it is nevertheless essential to keep a critical watch on their doings (at least in a free country): our culture is based on critical reflection.

Today, corps like Google or Amazon have a lot more power over people and their thinking than any government would ever have, and most people are already walking around like zombies controlled by their gadget. Is this paranoid fantasy? No, it is visible everywhere.
This picture has a much wider frame than just how to build an SSD. What we are actually talking about is a planetwide oligarchy, starting to make up their own rules about what people should know or should not know. Such has never happened before.

Back when I was in school, people already believed that the industry would do highly ambitious and extremely complex things, things which common people would never understand. That was the time when the term "high-tech" was coined, and when the hippies started to believe strange things (like that every cash note would carry a monitoring chip to be located via satellite).
The most important thing I found out while still at school was that this is simply not true: with some proper studying, all the "magic" things which the industry does are well understandable, and much less complex than common people might assume. And I learned that, against all common assumptions, one just needs to ask, and will happily be provided with answers - happily, because most of the people involved in the technology are glad that somebody shows actual interest in the details. That was the state of affairs in the last century.

This is also the main reason why I eagerly participated in building up the Internet: because I wanted an instrument where more knowledge-sharing can happen, where people can exchange what they know, and where people can re-acquire their understanding of the world that surrounds them (which is the essential prerequisite for responsible acting) - for the common improvement of mankind.

Today, things have changed entirely. The Internet has become an instrument for stupefying the masses, and the high-tech branches now actively pursue secrecy.
And You are indeed telling me that I should just accept that and be quiet?

Flash manufacturers have invested an enormous amount in their FTLs
I'm not impressed. Facebook has also invested an enormous amount into what is just a crappy webpage.

If you think that wear-leveling is a small and simple algorithm, you are completely wrong. It is very complex.

Ah? More complex than a Berkeley kernel, or an RDBMS engine? One can also sit in boring conferences about the gory details of those - the difference is, they are all in the open.

You are in an unusual position, which the industry doesn't do a good job of supporting. You think you know your workload (and I'm sure you are a bit wrong, but at least you are closer than people who have zero clue). That knowledge is more typical of high-end enterprise customers.

It is also typical of the hackers. And there was a time when only the high-end enterprises and the hackers were interested in computers. (And, not to forget, science - but science moved out of the game quite a while ago, changing its role into the production of industry slaves.)
It is not a matter of the industry supporting anybody, because that already implies that the industry is entirely in control of things.

Do you think a SSD vendor will send a lawyer to spend a few hours

No, but I'm quite sure I would kick such lawyers back down the staircase. I'm concerned about the people, not the lawyers. And the simple and universal rule is: when the lawyers come into play, things have already gone seriously wrong.

To give you an idea of the scale: I heard recently that the two big storage vendors sell over 90% of their products to a set of about 5-10 customers (the typical suspects: FAANG, the two Chinese internet giants, and the few biggest computer companies).

Didn't know what FAANG means, but that's exactly what I said above: this is a planetwide oligarchy being in power. Knowledge is power, and they try to proprietarize knowledge. The SSD issue is just an indicator for that.

And then, none of those corps was there in the late 80s/90s, when we were busy building up the Internet! They appeared when it became a business opportunity, to make big money from the work of others! Stolen money, for the effort of making the Internet a shithole. Zero respect for them.

Given that WD and Seagate each are about $10B companies, you have to understand that you are in a market whose products are tailored to customers who buy roughly $1B per year. Your annual disk purchases are roughly a factor of *A MILLION* too small for you to be relevant. No, not a factor of 2 or 10 or 100, but of a million.

So what? Money rules the world? No, people rule the world.

We point fingers at Russia and say, things are controlled by oligarchs there? Where is the difference, when You tell me that only billionaires have the right to know what the technological advance has currently achieved?

So what do the manufacturers do?

They are trying to help you!

Oh. Wow!
Every time I read that they don't sell devices, but want to sell "solutions", I get sick. (That "solution" talk usually has the following translation: "We want to charge you a multiple of what our crap is worth, and we want to do this because we think you are an idiot - we want to help you get along nevertheless, but you have to pay us for that extra effort.")

The elementary definition of free-market economy is that buyer and seller share full knowledge.
In contrast, restricting knowledge as proprietary and then consequently offering to "help" the buyer - that is by all means a totalitarian scheme. Such a scheme can only work under an oligarchy.

And that is what triggers me. I am not much concerned about paying a bit more or less, or about having a broken device on occasion. I am offended by the arrogance of that oligarchy, offering "help" to us people who actually created the things they now get rich from!

Why do You buy into that stuff? Are You on their payroll? (And even then there would be no need not to criticise it.)

I've never heard of an SSD sold that purely relies on compression and has no overprovisioning. If a SSD is sold as having X capacity, you will be able to write X completely random (uncompressible) bytes to it, at least a few times in a row. Now, obviously SSDs also try to use compression, to squeeze more out of the hardware.

Anyway, if they in any way pursue that idea of using compression (and I would not be the least surprised if they do), then this will have significant effects on GELI encryption. (And maybe that is why they prefer to sell SSDs with some kind of internal encryption scheme.) And that would be one of the topics which should actually belong in a public discussion.
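That effect is easy to demonstrate - a minimal sketch, using Python's zlib as a stand-in for whatever compression scheme a controller might implement, and random bytes as a stand-in for GELI output (good ciphertext is statistically indistinguishable from noise):

```python
import os
import zlib

plain = bytes(4096)         # a zero-filled block: trivially compressible
cipher = os.urandom(4096)   # stand-in for an encrypted block: pure noise

# The zero block shrinks to a few dozen bytes...
print(len(zlib.compress(plain)))
# ...while the "encrypted" block does not compress at all; it even grows
# slightly, because deflate falls back to stored blocks plus framing.
print(len(zlib.compress(cipher)))
```

Whatever hidden overprovisioning a controller gains from compressing typical filesystem data would vanish once the host writes only ciphertext - which is exactly why this belongs in an open discussion.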

What it comes down to: Don't buy crap. Although I understand your predicament: It's hard for an individual purchaser to recognize what is crap and what isn't. The only (not really useful) suggestion I can give is this: Go by brand name. The big vendors try to deliver good quality, even if it costs a little more. I have two Intel SSDs in my server, I've bought an enterprise Samsung one (although I can't remember where it is installed), and my laptops/desktops get Crucial, because there the SSD doesn't matter much (I don't store much on the machine itself, it all goes on the network).

That becomes a matter of mere taste. I will definitely not buy any Samsung piece, because I've seen enough non-working designs from them; they appear to do their development at the expense of the customer (but other people seem happy with them). And I don't know much about the others who might be in that game. With conventional drives it was a matter of the proper toolmaker's art, and German engineering was always good at that, so the IBM/HGST drives originally built in Mainz were probably the best.
Now this is a new game, and we have to find out where the expertise now is.
 
The political and sociological statement I won't comment on, just a few little observations:

(Talking about FTL, the firmware inside an SSD):
Ah? More complex than a Berkeley kernel, or an RDBMS engine?
Significantly more complicated than a kernel, or the core of the database engine. A few years ago, I was told that the firmware inside a Seagate disk drive (spinning rust) is millions of lines of source.

And there was a time when only the high-end enterprises and the hackers were interested in computers.
Nonsense. Since the 1950s, many (most) corporations, even medium-size ones, have used computers: to do bookkeeping, accounts receivable, logistics, to print invoices, to calculate taxes. There is a reason that computers were democratized when the first "affordable" ones (less than millions of $ in today's money) came out. A lot of that goes to the IBM 1401, the first mass-produced transistorized computer. Suddenly, a medium-size company could send computer-printed invoices to its customers, and use computers (with tapes, not even disks) to keep track of how much it owed its suppliers, and how much its customers owed it. This was a breakthrough. Suddenly, lots of small and medium companies had programmers, operators, computer maintenance people. And I'm talking about companies that are not part of the "oligarchy". In the early 80s, I was the "director of data processing" (Vorstandsmitglied fuer Datenverarbeitung) in a company that was not gigantic, employed about 150 people, and we had our own computer (with 72kB of memory and ~20 terminals). That revolution is probably as important as the internet revolution of the 90s and 2000s.

Didn't know what FAANG means, ...
Facebook, Apple, Amazon, Netflix, Google.

Knowledge is power, and they try to proprietarize knowledge.
That has been true since the Sumerians.

And then, none of those corps was there in the late 80s/90s, when we were busy building up the Internet!
But others were. In the 50s and 60s, the computer market was way more monopolized than it is today. As a matter of fact, it was referred to as "IBM and the seven dwarves". In the 80s and 90s (I was there), there were a handful of giant companies: IBM, HP, Sun, Oracle, Tandem, Compaq, Dell, Silicon Graphics, Convex, Sequent, ... It was just as monopolized as it is today. And of those companies, which ones are still alive and doing well?

If you think that companies like the FAANG (and their Chinese counterparts, which we tend to ignore in the West) have unlimited power, you are wrong. They will likely be gone in 10 or 20 years, just as hardly anyone today remembers who Silicon Graphics or Tandem or Sequent were. By the way, the headquarters of Silicon Graphics today are partly a Google building, and partly the Computer History Museum, which I highly recommend, to help people see things in perspective.

The elementary definition of free-market economy is that buyer and seller share full knowledge.
That definition becomes nonsense when the good being traded is information. And today, most products have the bulk of their value in IP. Actually, that has been true forever. There is a really nice article in the "Zeit" (German newspaper) about how the French in the 1700s stole the IP of making mirrors from the Venetians. There are the stories of how the Byzantines stole the IP of making silk from the Chinese. The history of the world is also the history of protecting knowledge and keeping secrets. You may not like it, but pretending otherwise is dumb.

With conventional drives it was a matter of proper toolmakery art, and german enigneering was always good at that, so the IBM/HGST originally built in Mainz were probably the best.
Disclaimer: I was an IBM employee for many years, but I am no longer one.
You are probably correct that on average the IBM disks were the best ones. And even today, I have a soft spot in my heart for disk drives made by their successor (in experience and intellectual property), namely HGST -> WD. However, I can not rationally prove that they are better than Seagate or Toshiba.

Most of the R&D for IBM's disk drives did not actually happen in Mainz. It happened in Silicon Valley, at the Cottle Road site, and associated research lab. There were specialized parts of it being done other places (Mainz and Boeblingen, a place in Japan that I can never remember how to spell so I won't try, and in northern New York at Endicott, Poughkeepsie and so on), but the center of disk activity was always in San Jose. This is also the place where the disk drive was invented, and where Mr. Shugart (the founder of Seagate) started his career at IBM. We still have a public park (with a soccer and rugby field) called "RAMAC Park", and places where one can visit some of the first disk drives ever built (I used to walk by disk drive serial #3 every morning). Today, the Cottle Road site has become a big residential / suburban development. The place where the first disk drives were built is today a Lowe's (a big home improvement / hardware store), but that building is still decorated with magnetic bit patterns on its outside walls, and in the parking lot (at the lumber side) is a little memorial to the original disk drive lab. I go shopping there regularly, and I make it a point to always stop at the little display.
 
Agree, buy brands for SSDs; for me it's Samsung, but I've also used Intel. I agree with the statement that SSDs are more like commodities: you use them until they get worn and then replace them. HDDs in some cases can run for decades with no loss in function or concern of failure. So it depends on what you're after. If it's longevity, go with an HDD, but take a hit in speed. If you want the great speed you get with an SSD, you have to take the downsides with it.
 