Other SSD: now they die...

A while ago I mentioned here that, when getting an ordinary SSD for a machine, I would prefer the HP S700 or the Verbatim Vi550.
Now I have to update this a bit. My HP S700 has died. It was still fully readable, but would not write anymore, and reported an explicit FAILED status via SMART. (So at least that part is properly coded.)
Here is some data from the last regularly logged report:

Code:
Device Model:     HP SSD S700 250GB
Firmware Version: S0704A1
Local Time is:    Tue Nov  1 05:30:27 2022 CET
SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   050    -    0
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    14312
 12 Power_Cycle_Count       -O--CK   100   100   000    -    687
171 Unknown_Attribute       -O--CK   100   100   000    -    0
172 Unknown_Attribute       -O--CK   100   100   000    -    0
173 Unknown_Attribute       -O--CK   100   100   005    -    1068
174 Unknown_Attribute       -O--CK   100   100   000    -    346
176 Erase_Fail_Count_Chip   -O---K   100   100   000    -    100
183 Runtime_Bad_Block       -O--CK   100   100   000    -    1
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
194 Temperature_Celsius     -O---K   100   100   000    -    52
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    2
241 Total_LBAs_Written      -O--CK   100   100   000    -    42189
242 Total_LBAs_Read         -O--CK   100   100   000    -    21972
243 Unknown_Attribute       -O--CK   100   100   000    -    46536

0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1              21  ---  Percentage Used Endurance Indicator
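
For reference, a report in this format comes from smartctl (sysutils/smartmontools); something like this in a daily cron job would produce it (the device name is an assumption):

Code:
smartctl -x /dev/ada0 >> /var/log/smart.ada0.log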

And now comes the funny part. There is a label on the piece reading "3 years limited warranty".
So I tried to contact HP for RMA, and the webpage said the S/N is wrong.
There is no way at all to contact HP without a valid S/N, and if the webpage decides that the S/N isn't valid, then it isn't valid.

I engaged some other connections until I got the phone number of an HP representative, and escalated. What came out of it: HP insists on giving no warranty at all on these devices. They just put a label on them reading "3 years limited warranty". And I should ask my dealer for RMA.

I wasn't told the background, but I know a possible one: when you have some component produced in China, say some 5000 pcs, you don't want warranty handling, because sending the broken pieces back to China may cost more than they are worth. So you agree with the manufacturer on a failure rate, you get that as a discount on the price, and then you handle the warranty with your customers on your own responsibility.
It is somewhat strange that HP would do such a thing, and it is even stranger that they state an explicit warranty on the label - because that would then normally be at the discretion of the dealer.

Okay, I sent the crap to the dealer three weeks ago (reichelt.de), and I haven't heard from them since.

So, be warned. Warranty is no longer warranty, and known brands are no longer something to rely on.

But then, that piece at least made it about a year under duty. Or 21% of the expected lifetime, or 41 TBW (~180 cycles).
I replaced it with a reserve, and that one died just now - it is no longer detected at all. Rebuilding ports 2023Q1 plus the daily periodic run was apparently too much.
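
The arithmetic behind that estimate, as a quick sketch (assuming attribute 241 counts GiB on this firmware, and ~233 GiB of usable capacity):

Code:
echo "scale=1; 42189 / 1024" | bc    # ~41.2 TiB written
echo "42189 / 233" | bc              # ~181 full-drive write cycles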

Code:
Device Model:     KINGSTON SA400S37240G
Firmware Version: S1Z40102
Local Time is:    Mon Jan  2 00:17:06 2023 CET

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   000    -    100
  9 Power_On_Hours          -O--CK   100   100   000    -    19054
 12 Power_Cycle_Count       -O--CK   100   100   000    -    893
148 Unknown_Attribute       ------   100   100   000    -    0
149 Unknown_Attribute       ------   100   100   000    -    0
167 Write_Protect_Mode      ------   100   100   000    -    0
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    1
169 Bad_Block_Rate          ------   100   100   000    -    0
170 Bad_Blk_Ct_Erl/Lat      ------   100   100   010    -    0/0
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
173 MaxAvgErase_Ct          ------   100   100   000    -    0
181 Program_Fail_Count      -O--CK   100   100   000    -    0
182 Erase_Fail_Count        ------   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
192 Unsafe_Shutdown_Count   -O--C-   100   100   000    -    372
194 Temperature_Celsius     -O---K   043   064   000    -    43 (Min/Max 23/64)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 CRC_Error_Count         -O--CK   100   100   000    -    1
231 SSD_Life_Left           ------   071   071   000    -    71
233 Flash_Writes_GiB        -O--CK   100   100   000    -    42094
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    40799
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    28977
244 Average_Erase_Count     ------   100   100   000    -    293
245 Max_Erase_Count         ------   100   100   000    -    339
246 Total_Erase_Count       ------   100   100   000    -    125826

Obviously, the standby boot disk now also had a bad sector and failed to fsck. That's a 12-year-old WD Blue, and it does that occasionally. With these pieces one knows what to do: read with dd up to the error, write zero bytes onto the bad sector, and fsck again. This time it had hit an inode table, so all American timezones were gone, plus some tests, plus the entire /usr/sbin. That's not a problem, because zpool lives in /sbin. Mount the pools, cd /usr/src/usr.sbin; make install; reboot. It will need a reinstall on occasion, but I need neither the American timezones nor the tests.
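
A minimal sketch of that procedure, for the record (device name and LBA are placeholders; a wrong seek value destroys good data, so double-check it against the dd output):

Code:
# read up to the error; dd stops and reports how many blocks it copied
dd if=/dev/ada1 of=/dev/null bs=512
# overwrite the one bad sector with zeros so the drive remaps it
dd if=/dev/zero of=/dev/ada1 bs=512 seek=123456 count=1
# then check the affected filesystem again
fsck -y /dev/ada1p2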

Lesson learned: one should have more than one spare SSD on the shelf, and one should run new ones in the server first, so that they die within the statutory warranty period, and then put those that survive (if any) in the desktop.

In any case, orderly dealership practices are seriously deteriorating - but that's not so very new: I once ordered some furniture parts from an online shop. What arrived was basically construction waste: cut pieces of cable and broken glass. I complained, sent it back, and what they sent in return was again broken glass and construction waste. I sent it back again and never got anything from them again, nor my money back. Understandable, as they were more busy with their stock market listing...
As an online shop you can do that, it's a practical business model.
 
So, be warned. Warranty is no longer warranty, and known brands are no longer something to rely on.
My experience: The warranty is just an indicator of the trust a manufacturer has in its product, but when it comes to data storage you can never use it: simply having some mails on the disk prevents you from sending it back - I'm simply not allowed to give such data to third parties. A video copy of a DVD, music, private photos - who wants to send such a drive back when you can't destroy the data first?

I once got a WD Red 4TB which failed after 3 hours - customer data had just been copied onto it. At least in Germany I'm now not allowed to give that drive away anymore… Warranty? Unusable.

Now I'm always buying drives with 5 years warranty, and no later than when the warranty ends I move such a drive to the backup system ("second backup line" - the first backup has to be safe) - regardless of that drive's health. (Unfortunately that won't be possible anymore with my NVMe - the next drive that has to be replaced…)

(BTW: reichelt.de is IMO trustworthy - things may fail everywhere, and around Christmas…)
 
My experience: The warranty is just an indicator of the trust a manufacturer has in its product, but when it comes to data storage you can never use it: simply having some mails on the disk prevents you from sending it back - I'm simply not allowed to give such data to third parties. A video copy of a DVD, music, private photos - who wants to send such a drive back when you can't destroy the data first?
I couldn't care less; that's not my responsibility. If somebody constructs a device that holds my stuff, and they need it back to repair it while I cannot remove my stuff, then it is their responsibility to protect it. As simple as that.
It's the same as with the ticket vending machine where the software fails during the vending process and doesn't unlock my credit card. Should I stand there all night in the rain and wait until it does its daily reboot? Certainly not - but I bet there are people who would.

It's that people construct stuff that breaks. And then they want to get rid of all responsibility (because they don't love their goods, they only love money), and they want you to worry about their breakage, their flaws - to take responsibility for their misengineering. And all that only because they do you the favour of spoiling you with their greed. How crazy is that?
But that's the new society - people worrying about failures they have no influence on, about breakage in tech stuff that they did not even engineer. And there is a simple reason for that: it pays off. Worried people are docile; they do what they are told, and they pay.

And, btw, if we are seriously paranoid, then data on disks can be encrypted.
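
On FreeBSD that could be a geli(8) layer under the filesystem; a minimal sketch, with the device name as a placeholder:

Code:
# one-time setup: initialize encryption on the provider, attach, create fs
geli init -s 4096 /dev/ada1p1    # asks for a passphrase
geli attach /dev/ada1p1          # creates /dev/ada1p1.eli
newfs /dev/ada1p1.eli
# a dead drive can then go back for RMA without exposing any data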

I once got a WD Red 4TB which failed after 3 hours - customer data had just been copied onto it. At least in Germany I'm now not allowed to give that drive away anymore… Warranty? Unusable.
I doubt that - because then whoever does not allow the warranty that is granted by law to be enacted would have to pay for the drive.

Now I'm always buying drives with 5 years warranty, and no later than when the warranty ends I move such a drive to the backup system ("second backup line" - the first backup has to be safe)
I don't get that. If the warranty is just a property of the piece and not supposed to be enacted, then its only purpose is to justify a higher price. So somebody produces a device, and then offers it in two variants: with a 3-year-warranty sticker attached for 80$, or with a 5-year-warranty sticker for 120$. And that warranty is never supposed to be enacted, neither by the customer (for reasons I do not fully understand) nor by the manufacturer (because if you try, they simply say that they don't provide warranty for the device).

In marketing we had this already. There has been work in business science on how to target the same good at different "markets", i.e. at people who differ in their willingness to spend money. This is done by means of different placement, different packaging, different advertisement, etc. (And yes, this is actually called a "science".)

(BTW: reichelt.de is IMO trustworthy - things may fail everywhere, and around Christmas…)
I also think they are. But it is not their duty to offer warranty beyond what is required by law only because a manufacturer decides to put such stickers on their devices.
 
I rarely RMA any computer parts anymore. They have successfully given me so much trouble that it isn't worth it anymore.

I even re-started buying ASUS mainboards, which I had avoided for many years because of bad RMA experiences. But now that I don't bother with RMA with other brands either, I can buy ASUS again. Yeah!
 
You are being too cynical.

It's that people construct stuff that breaks. And then they want to get rid of all responsibility (because they don't love their goods, they only love money), and they want you to worry about their breakage, their flaws - to take responsibility for their misengineering.

I know lots of people who work in the disk drive business. They are usually smart, hard-working, and ethical. They are trying to create a compromise: a device that stores the maximum amount of data, with the best IO performance, extreme reliability, and at minimum cost. They understand that different customers care more or less about different aspects of that compromise. In highly cost-conscious markets (amateur consumers who buy disks via convoluted intermediate distribution chains and dealers), they need to emphasize minimum cost, with capacity (bragging rights) being secondary. For enterprise customers (think EMC, Hitachi, IBM), they find a compromise between performance and reliability, with cost and capacity being minor concerns (once a disk drive is deployed at a big customer like a bank or insurance company, it will cost tens of thousands of $/Euro anyway). For the FAANGs (where 90% of all disks get sold), the compromise is typically customer- and application-specific.

If you buy a cheap disk through a dealer, you get what you deserve. As an individual amateur, it is difficult to buy disks that are engineered and supported for best performance or best reliability today.

So somebody produces a device, and then offers it in two variants: with a 3-year-warranty sticker attached for 80$, or with a 5-year-warranty sticker for 120$. And that warranty is never supposed to be enacted, neither by the customer (for reasons I do not fully understand) nor by the manufacturer (because if you try, they simply say that they don't provide warranty for the device).

If you buy a million disk drives (for roughly a quarter billion dollars), and you find that a surprisingly large fraction of them fail earlier than expected and negotiated, then your engineering, purchasing and legal people get together with Seagate/WD/... sales, quality and legal people, and the contractual warranty is guaranteed to be honored. You can be sure that companies such as Amazon or IBM are capable of making Seagate, Samsung or WD behave, and vice versa. And yes, I've seen these negotiations.

Your problem here is different. You bought a $200 disk drive, which had to move through a chain of dealers before it reached you. You have a contract with one particular dealer that you bought the disk from. The dealers and distributors are multi-million-$ companies, and the manufacturer a multi-billion-$ one. You are just a guy with a used $200 disk, currently worth about $50 on the used market. Sure, you could spend a few hundred thousand on hiring lawyers and suing Seagate or Samsung, but that's just not practical. Getting angry about a real-world situation of unequal power and influence is a waste of perfectly good adrenalin.
 
Getting angry about a real-world situation of unequal power and influence is a waste of perfectly good adrenalin.
Consumer protection in the EU differs from that in other countries, afaik. Consumers do have legal rights regarding warranty. Which doesn't mean it always works out well.
 
Consumer protection in the EU differs from that in other countries, afaik. Consumers do have legal rights regarding warranty. Which doesn't mean it always works out well.
Yes, it works pretty well here, for private consumers. It does not apply to businesses.

As a private consumer I am entitled to a 2-year warranty for all electronics, and no retailer really wants a dispute with a Consumer Protection and Technical Regulatory Authority. The better option for them is always to refund or replace...
 
You are being too cynical.
I don't think so. I'm just looking at a larger picture: e.g. we won't fly to the moon again, because we have forgotten how to do it.

Or, as Ursula K. Le Guin put it:
We are ruled utterly by fear. There was a time we sailed in ships between the stars, and now we dare not go a hundred miles from home. We keep a little knowledge, and do nothing with it. But once we used that knowledge to weave the pattern of life like a tapestry across night and chaos. We enlarged the chances of life. We did man's work.

If you buy a million disk drives (for roughly a quarter billion dollars), and you find that a surprisingly large fraction of them fail earlier than expected and negotiated, then your engineering, purchasing and legal people get together with Seagate/WD/... sales, quality and legal people, and the contractual warranty is guaranteed to be honored. You can be sure that companies such as Amazon or IBM are capable of making Seagate, Samsung or WD behave, and vice versa. And yes, I've seen these negotiations.
Yes, and that is business. It has nothing to do with engineering. It does not move mankind further.

OTOH, what was once done on the Berkeley OS, did change life.

Your problem here is different. You bought a $200 disk drive, which had to move through a chain of dealers before it reached you. You have a contract with one particular dealer that you bought the disk from.
I'm not interested in contracts. The game of us old hippies was always to take what nature provides and then make it somehow work. We don't want to buy things, and we don't want to throw things away. We don't want to do "negotiations", and we want to do the engineering on our own.
That's the spirit out of which Berkeley grew as well, and that's the spirit that actually changes life in the long run.

The dealers and distributors are multi-million-$ companies, and the manufacturer a multi-billion-$ one.
That's exactly the problem with modern technology. As the production effort (but not necessarily the engineering effort) gets extremely expensive, the market changes into an oligopoly.

This also makes the administrative layer very thick, and the decision-makers completely decoupled from the engineers.
Now, as R. A. Wilson put it, "communication is only possible among equals", so it is no surprise that nothing works. This can even be proven in systems theory. (So I am not cynical, I am just scientific. *veg* )

You are just a guy with a used $200 disk, currently worth about $50 on the used market. Sure, you could spend a few hundred thousand on hiring lawyers and suing Seagate or Samsung, but that's just not practical. Getting angry about a real-world situation of unequal power and influence is a waste of perfectly good adrenalin.
I am usually not. Usually I buy broken Ultrastars for 10$ on eBay and make them work again. (At the time when WD grabbed them from HGST, the Ultrastar was still a classical disk, not substantially different from what we built in Mainz in the '90s.)

The problem with SSD is just that I do not yet know how to repair them.
 
Consumer protection in the EU differs from that in other countries, afaik. Consumers do have legal rights regarding warranty. Which doesn't mean it always works out well.
That's the point. The legal warranty is 2 years here, and the device was/is already older than that.
So I decided to figure out what that 3-years-limited-warranty sticker is good for. And if it is not good for anything, then that is also a possible result. But then people should know that.
 
Ah, they start coming now then )))))
You know I had that question RIGHT FROM THE START when SSDs were starting to become a preferred HD solution (some years ago).
They will guarantee the guarantee, of course. And flash their names like Samsung, Intel etc. And tell how RELIABLE the technology really is etc. I've read a lot of that stuff online: "I've been using an SSD in my laptop for X years now and am quite satisfied!!!" etc.

But in the end I still couldn't get rid of this simple comparison. Rotating HDD will die, but in most cases it won't die INSTANTLY. And an SSD will. And that's all the difference I care to know. Cause the question here, when applied to production cases, is NOT a cpl hundred bucks for the disk. But it is about the data that can or can NOT be retrieved in case this disk, HDD or SSD, fails. That's my math.

And since nobody can guarantee 100% in case of SSDs, it still remains the same. HDDs have more chances not to die instantly than do SSDs. Isn't that true? Or am I missing some complicated piece of statistical consideration that kind of equals them more or less?

EDIT: sure, I'm simplifying it a bit. As we always do when we want some general picture.
EDIT 2: sorry if maybe a bit off topic ))
 
Cause the question here, when applied to production cases, is NOT a cpl hundred bucks for the disk. But it is about the data that can or can NOT be retrieved in case this disk, HDD or SSD, fails. That's my math.
In production you'd use RAID levels that allow for one or two disk failures. And have disk monitoring. And good backups, replication, etc.
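
With ZFS that could look like this, as a sketch (pool and device names are placeholders):

Code:
# pool that survives up to two simultaneous disk failures
zpool create tank raidz2 da0 da1 da2 da3 da4 da5
# basic monitoring: smartd from sysutils/smartmontools, mailing on trouble
sysrc smartd_enable=YES
service smartd start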
 
I plan for any of my "disks" to die instantly, because it happens. Sure, sometimes that will be inconvenient, but it should never be unrecoverable (unless you planned for that). Managing the time to recover is where you have to spend money and effort if your window is small.

SSDs offer amazing performance gains over spinning rust. If they are less reliable than spinning disks in the long term, the observed difference is not so great as to require any changes to my redundancy, backup, or recovery plans.
 
Ah, they start coming now then )))))
You know I had that question RIGHT FROM THE START when SSDs were starting to become a preferred HD solution (some years ago).
They will guarantee the guarantee, of course. And flash their names like Samsung, Intel etc. And tell how RELIABLE the technology really is etc. I've read a lot of that stuff online: "I've been using an SSD in my laptop for X years now and am quite satisfied!!!" etc.
In a laptop, probably. There is effectively no load on them. In my desktop it's like this (and that was a day when I was heavily working on some application deployment stuff, moving applications in and out):

Code:
Local Time is:    Sun Jan  1 00:17:00 2023 CET
233 Flash_Writes_GiB        -O--CK   100   100   000    -    2568
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    5528
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    8043
244 Average_Erase_Count     ------   100   100   000    -    133
245 Max_Erase_Count         ------   100   100   000    -    158
246 Total_Erase_Count       ------   100   100   000    -    122578

Local Time is:    Mon Jan  2 00:17:00 2023 CET
233 Flash_Writes_GiB        -O--CK   100   100   000    -    2572
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    5541
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    8080
244 Average_Erase_Count     ------   100   100   000    -    134
245 Max_Erase_Count         ------   100   100   000    -    158
246 Total_Erase_Count       ------   100   100   000    -    122657

Scaled up, that is less than 5 TBW per year.
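
Spelled out from the day's delta of attribute 241:

Code:
# 5541 - 5528 = 13 GiB written in one day
echo "scale=1; 13 * 365 / 1024" | bc    # ~4.6 TiB per year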

But in the end I still couldn't get rid of this simple comparison. Rotating HDD will die, but in most cases it won't die INSTANTLY.
Not necessarily. They might die instantly, or they may never die:

Code:
Device Model:     Hitachi HDS5C1010CLA382

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   099   099   016    -    2
  2 Throughput_Performance  P-S---   135   135   054    -    117
  3 Spin_Up_Time            POS---   136   136   024    -    295 (Average 217)
  4 Start_Stop_Count        -O--C-   096   096   000    -    18871
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    52
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   133   133   020    -    37
  9 Power_On_Hours          -O--C-   087   087   000    -    97297
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    1298
192 Power-Off_Retract_Count -O--CK   077   077   000    -    28133
193 Load_Cycle_Count        -O--C-   077   077   000    -    28134
194 Temperature_Celsius     -O----   166   166   000    -    36 (Min/Max 18/68)
196 Reallocated_Event_Count -O--CK   100   100   000    -    58
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0

Do the math on the Power_On_Hours: 97297 * 100 / (100-87) / 365 / 24 = 85

85 years of continuous operation - that is not the MTBF, that is what the engineers have put in there, that's what they expect the piece to do.

And it looks like that works out. Same design:
Code:
Product:              IC35L018UWDY10-0
Manufactured in week 42 of year 2002
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  23313
Elements in grown defect list: 8

Or even older - I no longer know the manufacturing date, and the disk doesn't report it:
Code:
Product:              DCAS-34330
Elements in grown defect list: 0
I keep these installed in the system just for the fun of it. But probably they will outlive me.

There is by no means any guarantee that these disks will continue to work. It is about engineering, about what can be expected on average under usual conditions.

And an SSD will. And that's all the difference I care to know.
Yes, and that is a strange fact, and it doesn't really figure. Normally electronic circuitry does not just die, or only rarely. It sure can, if it gets an overvoltage or some such, but that is unlikely. And flash cells also do not just die, and definitely not all at once.

And then there are two different pieces of information. There are computer magazines that ran tests on how much the cells can actually endure, and it came out that they usually work longer, possibly much longer, than rated by the manufacturer. And OTOH there are people, a lot of them, who report that the device just became unresponsive, often rather early, long before it was used up.

There must be a technical reason for that behaviour. And I would like to understand it.

Cause the question here, when applied to production cases, is NOT a cpl hundred bucks for the disk. But it is about the data that can or can NOT be retrieved in case this disk, HDD or SSD, fails. That's my math.
Production cases are yet another matter. I am rather looking at the art of engineering.

In production we just make a contingency plan: 1. what is the worst thing that could practically happen, and how do we then get out of it? And 2. what is the most economic way to achieve that?

With a mechanical disk there is a very high probability of getting the data back: the thing can be disassembled, the platters mounted into an equivalent model, and then most of the data will be readable again. But relying on that is not an economic approach - it can be done, but it needs delicate equipment.

So from the business perspective there is not so much difference: you need redundancy & backup in either case. But as a hobbyist/enthusiast with crafted equipment, I would like to know how my devices feel, and talk to them. ;)
With a mechanical disk I can hear that, I can see it in the response times, etc. With an SSD I get to know nothing about what is going on.

And btw, SMART is only fabricated data. It is not what a disk actually thinks - with mechanical disks as well. The disk runs an internal logic, and that logic certainly has states it knows about. But these states are not reported through SMART. SMART only provides data that the manufacturer wants the customers to see (and much of that data is still unintelligible). Maybe SMART is just implemented because it has to be - because there is a standard for it and people expect it.

And since nobody can guarantee 100% in case of SSDs, it still remains the same. HDDs have more chances not to die instantly than do SSDs. Isn't that true? Or am I missing some complicated piece of statistical consideration that kind of equals them more or less?

It's all right, but it doesn't change the game much. And then there is the other aspect: SSDs entirely eliminate seek times - and that makes a whopping difference in many workloads, even with a cheap DRAM-less device that is no faster than a mechanical disk in throughput.
 
In a laptop, probably. There is effectively no load on them. In my desktop it's like this (and that was a day when I was heavily working on some application deployment stuff, moving applications in and out):

Code:
Local Time is:    Sun Jan  1 00:17:00 2023 CET
233 Flash_Writes_GiB        -O--CK   100   100   000    -    2568
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    5528
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    8043
244 Average_Erase_Count     ------   100   100   000    -    133
245 Max_Erase_Count         ------   100   100   000    -    158
246 Total_Erase_Count       ------   100   100   000    -    122578

Local Time is:    Mon Jan  2 00:17:00 2023 CET
233 Flash_Writes_GiB        -O--CK   100   100   000    -    2572
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    5541
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    8080
244 Average_Erase_Count     ------   100   100   000    -    134
245 Max_Erase_Count         ------   100   100   000    -    158
246 Total_Erase_Count       ------   100   100   000    -    122657

Scaled up, that is less than 5 TBW per year.
I have an old 60GB Kingston SSD which I am using as a zpool cache (56GB). So far it works fine. SMART shows me:
Code:
241 Lifetime_Writes_GiB     0x0000   100   100   050    Old_age   Offline      -       79459
242 Lifetime_Reads_GiB      0x0000   100   100   050    Old_age   Offline      -       89021
That is ~79TB of data written! That makes ~1300 times the drive capacity. We will see how long it lasts and how it dies. So far no errors or any signs of wear.
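
Spelled out (roughly, ignoring the GB/GiB difference):

Code:
echo "79459 / 60" | bc    # ~1324 drive capacities written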
 
I think a zpool log device gets more writes than a zpool cache device. No?
Yes, agreed, but my intention was not to deliberately kill the drive but just to use this old piece in a useful way. 60GB of cache seemed good enough for a desktop system.

Also, when a log device dies, the pool is most probably also gone. A cache device failure does not affect pool integrity.

EDIT: Seems that cache devices actually get many more writes, at least in desktop environments. Almost everything that has been read from the pool is written to the cache device. Log devices only store synchronous writes, and only temporarily.
 
But it is about the data that can or can NOT be retrieved in case this disk, HDD or SSD, fails. That's my math.
My experience is that HDDs become slow and start making noise before dying. Thus you would know it was time for a mad dash to buy a new one while you still had the chance to retrieve your data. SSDs don't give such an early warning.
I solved that by buying two identical SSDs, both cheap. One is for everyday use as a rootdisk. The other isn't used, but gets a fresh copy of the rootdisk after each update of FreeBSD or ports.
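
A sketch of such a refresh with dd (device names are placeholders - make very sure which one is the spare, and ideally do it with the filesystems quiesced):

Code:
# clone the live rootdisk (ada0) onto the cold spare (ada1)
dd if=/dev/ada0 of=/dev/ada1 bs=1m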
 
I had an experience where a log device was physically disconnected, and the pool was completely corrupt after that. A cache device is safe to disconnect.
I remember some discussions (and probably fixes).

I removed one, and it did work. When I started using them, I tested that, because I want it to be a possible operation.
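
Cache devices can indeed be added and removed at runtime; pool and device names here are assumptions:

Code:
zpool add tank cache ada2p2      # attach an L2ARC device
zpool remove tank ada2p2         # detach it again; pool data is unaffected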
 
EDIT: Seems that cache devices actually get many more writes, at least in desktop environments. Almost everything that has been read from the pool is written to the cache device. Log devices only store synchronous writes, and only temporarily.
And the cache seems to be well populated under normal use:

Code:
root@Silicium ~# zpool iostat -v ada0p3
              capacity     operations     bandwidth
vdev        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ada0p3      53.6G  2.08G      1      1  42.2K   109K
----------  -----  -----  -----  -----  -----  -----
 
EDIT: Seems that cache devices actually get many more writes, at least in desktop environments. Almost everything that has been read from the pool is written to the cache device. Log devices only store synchronous writes, and only temporarily.
On desktops, maybe. But on NFS servers - NFS normally needs to work with synchronous writes - everything a client writes goes through the log.

For the cache you may configure the behaviour: there is vfs.zfs.l2arc.mfuonly, to write only data into the cache that is used repeatedly. (Sadly, this is a system-wide switch; I would need it per pool.)
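
For reference, flipping that switch (a runtime sysctl on FreeBSD with OpenZFS):

Code:
sysctl vfs.zfs.l2arc.mfuonly=1                       # cache only MFU data
echo 'vfs.zfs.l2arc.mfuonly=1' >> /etc/sysctl.conf   # persist across reboots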
 
The problem with SSD is just that I do not yet know how to repair them.

Now I do know: The Russians can repair them. (That's the problem with the Russians: they don't believe in our lies.)

For one of the devices concerned, the dealer had somehow lost track of it over the holidays, and is now moving to send a replacement. (It wouldn't be necessary; it would suffice to just re-prime the firmware.) So reichelt.de is still trustworthy.

For the other, the Russians can apparently repair it. But that would need a Windows installation (which I don't have - I threw away MS-DOS in 1990) in order to create a bootable Linux stick that will then reprogram the device. And one would need to recreate the "flash translation layer", but that doesn't seem too difficult.
 
I think there's quite a lot of information on the internet about ways you can try to retrieve the data, but it can be time-consuming/expensive/impractical.
 