How to determine if SSD is faulty

I recently installed a brand new 4TB SSD and after two weeks I am getting ZFS errors.

How should I test it? I presume smartmontools would highlight any shortcomings.

What should I look at?

What other tools are worth considering?
 
How should I test it? I presume smartmontools would highlight any shortcomings.
The only reliable way to determine an SSD's true health status is to use the manufacturer's own diagnostic tools. Unfortunately, in many cases that requires Windows. All-in-one diagnostic tools can give false positive results and should be avoided: they just read the S.M.A.R.T. status, and that's not always reliable when it comes to SSDs.
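For reference, the raw S.M.A.R.T. data those tools read can be dumped directly with smartmontools (assuming the SSD shows up as ada0; adjust the device name to yours):

smartctl -x /dev/ada0

That at least tells you what the drive itself is reporting, for whatever that report is worth.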
 
I am getting ZFS errors.
We cannot help debug "ZFS errors". We can help debug specific error messages, which usually also requires a description of who/what/where/when and all those forensic questions.
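As a starting point, the actual messages can be captured with something like this (standard commands; pool and device names will differ):

zpool status -v
dmesg | tail -n 50
tail -n 50 /var/log/messages

Posting that output is far more useful than "ZFS error".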

Most of the error reports I see on the forum these days are of the form "I was riding my bicycle, intentionally put a stick into my front wheel, went flying and fell on my head. Now my head hurts, and I can't remember anything about the bicycle and its hardware problems."

There is an old and very cruel joke about computer tech support asking whether you saved the packaging material for your computer when a user reports a PEBKAC. That joke is more and more often applicable.

I presume smartmontools would highlight any shortcomings.
That presumes that the problem is in the SSD hardware or its internal firmware, and that anyone here knows how to decode the information. More often than not, those preconditions are not met. And to be honest, as far as storage device (spinning HDD and SSD) health is concerned, SMART is not very often a good way to diagnose it. There was a very well known paper about 20 years ago from people at Google (not me) who studied disk drive failures. Their summary, very simplified: half the time when SMART predicts disk failure, nothing goes wrong; and half the time when a disk fails, SMART didn't predict any failure.
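That said, the drive's own built-in self-test can at least be started and read back through smartmontools (again assuming the SSD is ada0):

smartctl -t long /dev/ada0
smartctl -l selftest /dev/ada0

The long test runs inside the drive's firmware; read the self-test log back after the estimated runtime has passed.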
 
I recently installed a brand new 4TB SSD and after two weeks I am getting ZFS errors.

Have you already solved this?

If not, I suggest doing a non-destructive read of all blocks on your SSD to ensure there are at least no SSD read errors, by running (as root):

dd if=/dev/ada0 bs=1m of=/dev/null

(edit /dev/ada0 above as necessary to be the SSD in question)
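If you are not sure which device node the SSD was given, the attached disks can be listed first:

camcontrol devlist
geom disk list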

I am chiming in here mostly because I just spent dozens of hours doing a deep dive on a very insidious problem where my SSD RAID performance was gradually decreasing (measured several percent slower per day), eventually down to only 1-2% of normal speed, so slow that it was causing critical applications to crash. It turned out that logging 1-second MB/s was the only way I found to see the nature and progression of this problem, even when SMART showed no errors or pre-fail indications.

I was using an Areca 1883 and 8 x Samsung 870 QVO 8TB SSDs, and the solution was to replace the Samsungs with Micron enterprise-class SSDs, because the 870 QVOs apparently do nothing to "refresh" marginal data as flash cell voltages slowly leak over time, causing the data to slowly degrade and later reads to slow down considerably and repeatedly. And FWIW, these SSDs had seen less than 10% of their rated endurance writes, so they were nowhere near end of life.

So now when I test disks, I watch 1-second samples of the I/O speed by running something like this in another window while the above dd is running:

systat -iostat 1 -numbers -only ada0

(again editing ada0, if appropriate)

If the systat MB/s column regularly varies by more than about 20%, it might be a sign that your SSD is applying internal error-correction to recover some blocks that are "marginal" in terms of the flash cell voltages. New SATA SSDs usually perform consistently at >450 MB/sec when reading sequentially on an otherwise-idle system. It's possible to log 1-second data to a file, and graph it.
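For example (the log file path is just a placeholder), iostat can capture those 1-second samples to a file:

iostat -x -w 1 -c 600 ada0 | tee /var/tmp/ada0-read.log

(600 one-second samples; adjust the count, device and path as needed)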

If your SSD can read all blocks and has consistent MB/sec, then at least you can develop confidence that it's not the SSD itself that is the cause of your ZFS error.
 
Have you already solved this?

I had decided that the disk was fscked; that's why I marked the topic as solved.

I would try your suggestion but I can't get any system to recognise its presence.

My ThinkPad W520 just hangs when I turn it on after displaying the initial screen.

Same with my ThinkPad X61, although after five minutes an error message popped up saying 'initialization error (3)'!

I need to get some response from AliExpress to get a refund or a replacement.

This thing claims to have a three year warranty, but I haven't even had it for three weeks.
 
I was using an Areca 1883 and 8 x Samsung 870 QVO 8TB SSDs, and the solution was to replace the Samsungs with Micron enterprise-class SSDs...
Buying a Samsung QVO in the first place was the mistake you made. The same applies to other vendors' QLC drives. If you do care about data integrity, go for vendors that use SLC NAND chips.
 
You did not specify whether it is an NVMe disk, a SATA SSD, or a rotating disk.

Boot from a USB stick image and run:

gpart show -p
gpart show -l
dmesg | grep ada
nvmecontrol devlist
geom part list | egrep "label|Name"
smartctl -i /dev/ada...X | grep Solid
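If it turns out to be an NVMe device, the equivalent health data can be read with (assuming it shows up as nvme0):

nvmecontrol logpage -p 2 nvme0
smartctl -a /dev/nvme0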
 