ZFS: My ZFS SSD Experiences

cy@

Developer
I replaced my laptop's 1 TB HGST spinning disk drive with a 1 TB Samsung 870 EVO in June. The boot partitions (yes, three of them) are UFS, with the data on a large ZFS pool, plus a rarely used 100 MB Windows partition.

I noticed the other day that even though the SSD is only about 3-4 months old, smartmontools reports over 2.5 TB already written. This is on a laptop I do some git work on, while the majority of my heavy lifting is done by my machines downstairs. 2.5 TB seemed excessive for 3-4 months of relatively light to moderate use.

Comparing two smartctl snapshots of S.M.A.R.T. attribute 241 over 24 hours, I noticed close to 30 GB written. That's excessive over a 24-hour period.
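For anyone wanting to repeat the comparison, it was roughly this; the device name is a placeholder, and I'm assuming the raw value of attribute 241 counts 512-byte LBAs, which is how Samsung SATA drives appear to report it:

Code:
# Raw value of attribute 241 (Total_LBAs_Written); /dev/ada0 is a placeholder
old=$(smartctl -A /dev/ada0 | awk '/Total_LBAs_Written/ {print $NF}')
# ...repeat 24 hours later...
new=$(smartctl -A /dev/ada0 | awk '/Total_LBAs_Written/ {print $NF}')
# Difference in GiB, assuming the raw value counts 512-byte LBAs
echo "scale=2; ($new - $old) * 512 / 1024^3" | bc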

Today, so far, 1.6 GB.

What did I do?

Two things: set sync=disabled and atime=off. Indeed, sync=standard limits the loss of data to the last write, whereas sync=disabled might result in the loss of the last 100 writes. But I'm not sure I'm willing to replace an SSD after only a year of use.
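For reference, the property changes amount to something like this; zroot is only a placeholder pool name:

Code:
# Stop access-time updates and synchronous writes pool-wide
# ("zroot" is a placeholder pool name)
zfs set atime=off zroot
zfs set sync=disabled zroot

# Confirm the settings
zfs get atime,sync zroot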

My ashift is already 12 -- zpool create did a good job of automatically selecting a large enough ashift based on the real LBA size rather than the size advertised by the SSD, though 13 might have been better -- Samsung hasn't published their blocksize.
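If anyone wants to confirm their own pool, something like this should show it (pool name is a placeholder; the ashift pool property needs a reasonably recent OpenZFS):

Code:
# ashift as recorded per vdev in the pool configuration
zdb -C zroot | grep ashift

# On newer OpenZFS, the default ashift for new vdevs is also a pool property
zpool get ashift zroot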

ZFS compression will also reduce bytes written, though I already have LZ4 enabled. Those who don't should enable it when using SSD/NVMe.
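For anyone who hasn't, it's a one-liner (zroot is a placeholder pool name; compression only applies to data written after it's enabled):

Code:
# Enable LZ4 on the pool root so new datasets inherit it
zfs set compression=lz4 zroot

# See how much it is actually saving
zfs get compressratio zroot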

Since then I've re-enabled sync=standard and added a log device, a less expensive SD card, which can be replaced at a fraction of the cost.
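The mechanics, roughly (the device node is a placeholder for wherever the card attaches):

Code:
# Add a separate intent log (SLOG) device to the pool
# (/dev/da0 is a placeholder for the SD card)
zpool add zroot log /dev/da0

# Go back to the default sync behaviour
zfs set sync=standard zroot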

(I think UFS would be a better general choice to reduce SSD/NVMe wear. The UFS journal only tracks deletes, since soft updates order all the other writes to maintain consistency.)

So far so good.

If anyone else is noticing excessive SSD or NVMe wear using ZFS, I'd like to hear your stories too.
 
So I wonder about the numbers you are seeing.
If one of those partitions of yours contains Windows, I think it would be worthwhile to compare Samsung Disk Magician numbers against smartmontools. My faith in smartctl only goes so far; it can be quirky.
 
I just downloaded Disk Magician last night. I haven't gotten around to booting Windows yet. I only use it to run income tax software in the spring. I'll install Disk Magician sometime over the next few days and report back.

The firmware updater, a Linux ISO distributed by Samsung, reported the firmware was already up to date.
 
While from an SSD lifetime viewpoint this is not catastrophic (as covecat said), it is still an annoyingly high write amplification factor.

Is there any way to estimate how much data you really write per day or month (from user space), and then compare that to the amount ZFS writes to the disk? I don't know how to do that, without running kernel tracing continuously, which is impractical.
 
Why did you choose ZFS?
A long time ago I had a laptop with a small hard disk. ZFS with compression fixed that problem. Ever since, updating laptops has been an exercise in shrinking the Windows slice, creating new FreeBSD slices, dump | restore for the UFS filesystems, and zfs send | zfs receive for the zpool. Migration. And ZFS compression allows me to stuff more onto the disk than would otherwise be possible.
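The mechanics of each migration look roughly like this; filesystems, mountpoints, and pool names are all placeholders:

Code:
# UFS filesystems: dump from the old disk, restore onto the freshly newfs'd one
# (/usr and /mnt/usr are placeholders)
dump -0 -L -a -f - /usr | (cd /mnt/usr && restore -r -f -)

# ZFS: snapshot the old pool recursively and replicate it to the new pool
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool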
 
I replaced my laptop's 1 TB HGST spinning disk drive with a 1 TB Samsung 870 EVO in June. The boot partitions (yes, three of them) are UFS, with the data on a large ZFS pool, plus a rarely used 100 MB Windows partition.

I noticed the other day that even though the SSD is only about 3-4 months old, smartmontools reports over 2.5 TB already written.

So what's the problem? That device would be rated for 600 cycles, and with this load should then last 60 years (if not hitting silicon diffusion earlier). Something doesn't line up here...

On my desktop the disk usage is 300-400 GB/month - i.e. it will practically never consume even the cheapest QLC device. (/ and /usr as ufs, /usr/local and all changing data in the pool)

But repeatedly switching branches on a big git repo (like src or ports) is a disk-intense operation: it rewrites almost all the files, and also invalidates the contents of the ARC and (if present) L2ARC.

Overall, I would suggest vfs.zfs.txg.timeout=45 (or even bigger).
Also, I would set sync=disabled only on things where application content can easily be recovered (like git repos etc.), not on anything that might vaguely resemble a database (like the things in /var/db).
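Along these lines (the value and the dataset name are only examples; the sysctl can also go in /etc/sysctl.conf to survive reboots):

Code:
# Let transaction groups accumulate longer before being written out
sysctl vfs.zfs.txg.timeout=45
echo 'vfs.zfs.txg.timeout=45' >> /etc/sysctl.conf

# Disable sync only where the data is easily recreated, e.g. a checkout tree
# (zroot/usr/src is a placeholder dataset)
zfs set sync=disabled zroot/usr/src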
 
PMc,

As alarming as this rate appears, it is many orders of magnitude higher than what I see on my $JOB laptop (a 10-year-old HP laptop) using NTFS. Even so, at the rate I'm burning through the SSD, it should handle it for a good number of years.

However, what I've done so far has reduced the write rate by 2/3. Not nearly as low as the $JOB laptop but certainly better, and certainly a learning experience.

I don't do git repo work on that machine. Most of that is done on two server machines downstairs, each with mirrored ZFS pools. Both of which are also buildworld and poudriere servers.

Thank you for your suggestion. As stated in the first post of this thread, sync=disabled was implemented, though reverting to sync=standard and using a relatively inexpensive log device was the better solution.

Reducing writes by 2/3, I consider this a success.
 
Well there are lots of endurance tests around for SSDs, like this older one: https://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead/

30 GB/day of writes sounds at first like something worrying. In reality, I would not worry about it.

As the test clearly states, the first SSD errors (not failures!) occurred on a consumer-grade Samsung 840 EVO in that 2015 test after 300 TB of writes. This means that with 30 GB of writes per day, using your computer every day, this would occur after roughly 27 years. And an error does not mean the SSD stops functioning; it keeps working, and until actual failure there is still much more time left.
 
It is not very clear to me what exactly the concern is.
That SSD costs around 100 euros.

The first 32 GB SSD that I had shipped from Korea in 2007 cost me over 500.
The 32 GB Intel/Kingston SLC drives from 2009 (which, incidentally, continue to perform flawlessly in 2022) were also over $600.

I don't see the slightest reason to adopt "weird" things like an SD card or whatever.
It only adds complexity and fragility.

SSDs typically run on 4 KB pages.
If you write one byte, you will actually write 4096.
Plus there's the COW issue, ZFS metadata and so on.
I personally scrub my ZFS servers every night, and do nothing more,
whether they are magnetic, SSD or NVMe (and ramdisk; yes, I scrub ramdisks as well).
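On FreeBSD the nightly scrub is easy to automate with periodic(8); the threshold below is just an example (a 1-day threshold makes the daily run scrub effectively every night):

Code:
# /etc/periodic.conf
daily_scrub_zfs_enable="YES"            # scrub pools from the daily periodic run
daily_scrub_zfs_default_threshold="1"   # days since the last scrub before a new one starts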

Short version: relax
 