Solved ZFS with only one ssd

Matheus Furlanetto

New Member


Messages: 3

Hi,

It's my first time trying FreeBSD, and I want to install the system just for desktop use. My question is: is it safe and worthwhile to install using ZFS on a computer with only one SSD (a Samsung 850 EVO 250 GB), considering the write limits of an SSD, or should I use UFS?

Thank you

P.S. Sorry for my English, I'm Brazilian.
 

gnoma

Active Member

Reaction score: 17
Messages: 186

Hello,

Since FreeBSD 9.1 (I think), ZFS supports the TRIM command, so it should handle an SSD just fine.

Installing on a single disk is never safe, but if you don't have a choice, then ZFS is certainly the better option for data integrity.

Using ZFS may also help protect you against human error: sysutils/zfstools will take snapshots every 15 minutes (by default), which may help you recover data deleted by mistake.
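As a sketch of what that looks like (the dataset name zroot is just an example, and the exact crontab lines come from the zfstools documentation, so check it for the recommended setup):

```
# Mark a pool/dataset for automatic snapshots:
# zfs set com.sun:auto-snapshot=true zroot

# Example /etc/crontab entries driving zfs-auto-snapshot
# (keep 4 "frequent" and 24 "hourly" snapshots):
15,30,45 * * * * root /usr/local/sbin/zfs-auto-snapshot frequent  4
0        * * * * root /usr/local/sbin/zfs-auto-snapshot hourly   24
```

Recovering an accidentally deleted file is then just a matter of copying it back out of the read-only .zfs/snapshot directory of the dataset.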

But my answer is still no: it's not safe to install on a single drive. Not because of UFS or ZFS, but because a single drive failure means losing all your data. ZFS is a little less risky, but it is still risky.
 

rigoletto@

Daemon
Developer

Reaction score: 1,043
Messages: 2,102

Not answering your question, but ZFS does not use fstab (by default); it takes care of everything itself. When using ZFS, the entries in fstab are usually only for swap, tmpfs, etc.
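For example, on a default ZFS install the fstab typically contains little more than the swap entry (the device name below is just an example):

```
# Device        Mountpoint  FStype  Options  Dump  Pass#
/dev/ada0p2     none        swap    sw       0     0
```

Everything else is mounted from the datasets' mountpoint properties, which you can inspect with `zfs get mountpoint`.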

EDIT: you may also want to take a look at the Handbook and here.
 

Eric A. Borisch

Well-Known Member

Reaction score: 255
Messages: 441

Thank you!
One more question: is it possible to use periodic TRIM with ZFS? At least on Linux with ext4, a lot of users have reported issues when using the "discard" option in fstab.

Here are some links discussing the problem:
https://wiki.archlinux.org/index.php/Solid_State_Drives#Continuous_TRIM
https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/
"The current FreeBSD implementation builds a map of regions that were freed. On every write the code consults the map and removes ranges that were freed before, but are now overwritten.

Freed blocks are not TRIMed immediately, there is a low priority thread that TRIMs ranges when the time comes."
http://open-zfs.org/wiki/Features#TRIM_Support
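So with ZFS on FreeBSD there is no "discard"-style mount option to worry about; TRIM is on by default and batched by that low-priority thread. A quick way to confirm it is enabled and being exercised (sysctl names as found on FreeBSD with the legacy ZFS TRIM implementation, and the output values below are purely illustrative; names may differ on newer OpenZFS):

```
# sysctl vfs.zfs.trim.enabled
vfs.zfs.trim.enabled: 1
# sysctl kstat.zfs.misc.zio_trim
kstat.zfs.misc.zio_trim.failed: 0
kstat.zfs.misc.zio_trim.unsupported: 0
kstat.zfs.misc.zio_trim.success: 52416
kstat.zfs.misc.zio_trim.bytes: 6871947673
```

If `success` keeps growing while `failed` stays at zero, freed blocks are being TRIMmed in the background as described above.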
 

Eric A. Borisch

Well-Known Member

Reaction score: 255
Messages: 441

One thing you can do (which may make it fail faster but in a way where you have a chance to replace it before "the end" without data loss) is to set copies=2. This is basically one-disk mirroring and trades off lifetime (total writes) for resiliency (you can recover from some bad blocks.)
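Setting it is a one-liner; do it before copying data onto the dataset, since it only applies to blocks written after the property is set (the pool name zroot is an example):

```
# zfs set copies=2 zroot
# zfs get copies zroot
NAME   PROPERTY  VALUE   SOURCE
zroot  copies    2       local
```

Note that all children inherit the property, so you can also set it only on the datasets holding data you care about most.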
 

ekingston

Active Member

Reaction score: 59
Messages: 229

One thing you can do (which may make it fail faster but in a way where you have a chance to replace it before "the end" without data loss) is to set copies=2. This is basically one-disk mirroring and trades off lifetime (total writes) for resiliency (you can recover from some bad blocks.)
I can't say I agree with you there. On an HDD that might be a valid point, but in my experience it is not the case on an SSD.

SSDs have quite a bit of brains internally. When they are getting close to end-of-life, they will move data that hasn't changed to the more worn out flash (even going so far as to swap existing data blocks). They do this in the background when not otherwise busy. You won't even know it's happening.

When an SSD dies, it just stops working. You do get warnings, but not in the form of odd drive failures you can recover from. The warnings are in the logs. Read the logs and understand what the messages mean. You will know that an SSD is running out of write lifetime before it happens.

It's also a good idea to add sysutils/smartmontools and include appropriate output in daily or weekly reports. Watch for when an SSD is approaching end-of-life so you can plan a replacement ahead of time.
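For example (the device name is an example, and the periodic knob is assumed to be available on your FreeBSD version):

```
# /etc/periodic.conf
daily_status_smart_devices="/dev/ada0"
```

and to eyeball the wear attributes by hand on a SATA SSD:

```
# smartctl -A /dev/ada0 | egrep -i 'wear|lifetime|reallocated'
```

On Samsung consumer drives the attribute to watch is usually 177 Wear_Leveling_Count, which counts down toward end-of-life.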
 

Eric A. Borisch

Well-Known Member

Reaction score: 255
Messages: 441

SSDs have quite a bit of brains internally. When they are getting close to end-of-life, they will move data that hasn't changed to the more worn out flash (even going so far as to swap existing data blocks). They do this in the background when not otherwise busy. You won't even know it's happening.
I understand what you're saying, and agree that setting up some monitoring via smartmontools is always worthwhile. I do that and then feed it into zabbix so I can also watch for and set up alarms on trends.

Has anyone seen any analysis of modern SSDs similar to Backblaze's HDD failure (and SMART indicator) analysis? It would be interesting to see how well these indicators predict failure, and how rapid failure is when it finally sets in (bad blocks returned, or the device just bricks?). If someone has a link, please share.

I was offering up an option that had the potential to provide more warning before complete (pool lost) failure. If drives go directly from "this is fine / no bad blocks returned" to bricked, it won't help at all, clearly. Hopefully the internal wear-out indicators and counters are rising, and keeping an eye on your counters will be important with or without adjusting copies.
 

ralphbsz

Daemon

Reaction score: 1,270
Messages: 2,039

Has anyone seen any analysis of modern SSDs similar to Backblaze's HDD failure (and SMART indicator) analysis?
Yes, google for "Bianca Schroeder", "Arif Merchant", SSD, and Google (this is not a joke: Arif and Bianca did the study using data from Google's data centers). Bianca is the master (mistress?) of disk reliability, and Arif is likely the smartest guy in the storage industry. If I remember right, the conclusion is that SSDs are not perfect, and (just as with disks) the current rate of errors is a good predictor of the future rate of errors, and of eventual failure. The other fascinating thing is that accumulated write cycles are not as good a predictor of failure as one would have thought (we all assumed SSDs get worn out by writing and nothing else), but age is a reasonably good predictor (no, I don't remember why). What I completely forget is whether their measured failure rate / MTBF / UBER matches the manufacturers' specifications, and how it compares with the known data for disk drives. When I see them, we tend to talk about personal stuff, not work.

No, I don't know anything that's like the Backblaze analysis, which tells you which brands to buy and which brands to avoid. Bianca and Arif are academics and researchers, and they need good relations with all vendors, and in their academic publications they can not explicitly praise one brand and dump on another. Backblaze doesn't have such restrictions.

In my personal experience: SSDs are different from disks. Yes, they occasionally have media errors, just like disks. Unlike disks, they tend to be perfect (many SSDs are perfect for their economic lifespan, and most SSDs are perfect when they are young) until they start failing; in contrast, even good new disks may have an occasional error (though most errors in a large system are contributed by a small fraction of bad disks). Like disks, once they start getting multiple errors, they will likely get more errors. Like disks, one of their favorite failure modes is totally bricking themselves (often taking the bus down the cliff with them). Unlike disks, they don't have mature and useful SMART implementations (I've only used SCSI SMART, not SATA SMART); and some newer implementations (such as NVMe) don't seem to have useful SMART at all. Personally, I haven't seen value in watching wear-out and error counters on SSDs, but maybe I just didn't do it right. The performance of SSDs is completely weird, and in spite of some effort, using performance characterization to do predictive fault analysis doesn't seem to work on them (it works pretty well on disks).

On my server at home, I have two SSDs (both were very cheap, consumer-grade); I use one as the boot and root disk (the /home data is on separate devices), and the other is a backup that is updated occasionally. So if my main SSD croaks, I can be up and running within an hour (just diagnose the problem, remove the dead SSD, tell the BIOS to boot from the other one). But because my backup to the other SSD is rare, after such a failure, the root file system might be a day or a week out of date. That's when a whole evening of restoring backups would start, and hand-merging directories and files. Not fun, but given that the one SSD is unlikely to die, and given that everything outside the /home file system can be restored by performing a new install and redoing my customization, that's a risk I'm willing to take.
 

Eric A. Borisch

Well-Known Member

Reaction score: 255
Messages: 441

Thanks for the info. I found the paper:

https://www.usenix.org/system/files/conference/fast16/fast16-papers-schroeder.pdf

Looks like I have some bedtime reading!
From the article:

• Between 20–63% of drives experience at least one uncorrectable error during their first four years in the field, making uncorrectable errors the most common non-transparent error in these drives. Between 2–6 out of 1,000 drive days are affected by them.

• The majority of drive days experience at least one correctable error, however other types of transparent errors, i.e. errors which the drive can mask from the user, are rare compared to non-transparent errors.
[...]
• Both RBER [raw (before ECC) bit error rate] and the number of uncorrectable errors grow with PE [program/erase] cycles, however the rate of growth is slower than commonly expected, following a linear rather than exponential rate, and there are no sudden spikes once a drive exceeds the vendor’s PE cycle limit, within the PE cycle ranges we observe in the field.
Uncorrectable errors = errors in the data extending outside the drive, i.e. errors that ZFS will flag.
Transparent errors = errors that ZFS will not flag but smartmontools can monitor; these are rare, with the exception of ECC-corrected errors:
In summary, we conclude that RBER is a poor predictor of UEs [uncorrectable errors]. This might imply that the failure mechanisms leading to RBER are different from those leading to UEs (e.g. retention errors in individual cells versus larger scale issues with the device).
They also find that uncorrectable errors grow linearly, suggesting you could watch for them and replace the drive once they start occurring.

So I return to my original suggestion of what one can do with just a single SSD+ZFS to improve reliability; use copies=2, watch for errors. It is certainly NOT a replacement for RAIDn redundancy, (and RAID is not a backup!), and it won't help if the drive goes completely belly-up (which doesn't seem to be common from the paper) but it will increase the likelihood of detecting errors on the drive before data loss.
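In practice, "watch for errors" with ZFS means scrubbing regularly and checking pool status (the pool name is an example):

```
# zpool scrub zroot
# zpool status -x
all pools are healthy
```

A non-zero CKSUM count in the `zpool status` output is exactly the early warning that copies=2 is meant to make survivable. FreeBSD can also run the scrub for you periodically via the daily_scrub_zfs_enable knob in periodic.conf.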
 

ralphbsz

Daemon

Reaction score: 1,270
Messages: 2,039

So I return to my original suggestion of what one can do with just a single SSD+ZFS to improve reliability; use copies=2, watch for errors. It is certainly NOT a replacement for RAIDn redundancy, (and RAID is not a backup!), ...
Seems right. But beware of the unintended side effect: with copies=2 you will also do twice as many writes (and an enormous number of seeks, which on an SSD don't kill performance the way they do on spinning disks). So the drive will wear out faster (and whether that wear-out is linear, sublinear, or exponential is a question of some debate; you quoted the answer from the Bianca Schroeder paper above, but "conventional wisdom" disagrees). Now you have to make a tradeoff: is it more important to you to have good data reliability (the data is more likely to be readable with copies=2), to have better system availability (fewer downtimes for drive replacement), or to save money and get the best performance (which probably doesn't matter to a home user with a single-drive system, but matters a lot for enterprise systems)? And it gets even worse: if doubling the write traffic increases the probability that the drive bricks itself, then copies=2 might even be counter-productive.

... and it won't help if the drive goes completely belly-up (which doesn't seem to be common from the paper) ...
I agree that the paper says that complete belly-up is not common, and I'm sure this is true in the data set they studied. However, my personal experience has been different: while SSDs going completely dead is not extremely common, it is by no means unheard of, and in a large population of deployed SSDs I have seen a considerable number of drives that went "rubber side up shiny side down" (old motorcycle joke, there are other descriptions of being dead which I can't quote in a family-friendly forum). One also has to remember that Bianca and Arif studied SSDs only in Google's production systems, and those are run differently from most other computer installations: they are very organized about purchasing, they carefully select drive firmware versions and upgrade them correctly, they may even run custom firmware in the drive (there have been research papers by Google staff about their modifications to the FTL in these drives), and they are extremely good about controlling environmental conditions, with the best possible compromise between cost, energy efficiency, and system availability. It is not 100% certain that the conclusions reached from this data set apply to the very confusing situation in the wider world.

Personally, I would not rely on a single SSD to be a reliable storage mechanism, not even with ZFS's copies=2 (even though on an SSD copies=2 is probably an improvement for hobbyist workloads). You said the same thing above: it does not replace RAID, which already does not replace backup.
 