I recently installed a brand new 4TB SSD, and after two weeks I am getting a ZFS error.
How should I test it? I presume smartmontools would highlight any shortcomings.
What should I look at?
What other tools are worth considering?
Have you already solved this?
If not, I suggest doing a non-destructive read of all blocks on your SSD to confirm that it at least has no read errors, by running (as root):
dd if=/dev/ada0 bs=1m of=/dev/null
(edit /dev/ada0 above as necessary to be the SSD in question)
I am chiming in here mostly because I just spent dozens of hours doing a deep dive on a very insidious problem where my SSD RAID performance was gradually decreasing (measured at several percent slower per day), eventually down to only 1-2% of normal speed, so slow that it was causing critical applications to crash. It turned out that logging MB/s at 1-second intervals was the only way I found to see the nature and progression of this problem, even when SMART showed no errors or pre-fail indications. I was using an Areca 1883 and 8 x Samsung 870 QVO 8TB SSDs, and the solution was to replace the Samsungs with Micron enterprise-class SSDs. At least the 870 QVOs apparently do nothing to "refresh" marginal data as flash cell voltage slowly leaks away over time, so the data gradually degrades and later reads become considerably slower, repeatedly. And FWIW, these SSDs had used less than 10% of their rated write endurance, so they were nowhere near EOL.
So now when I test disks, I watch 1-second samples of the I/O speed by running something like this in another window while the above dd is running:
systat -iostat 1 -numbers -only ada0
(again editing ada0, if appropriate)
If the systat MB/s column regularly varies by more than about 20%, it might be a sign that your SSD is applying internal error-correction to recover some blocks that are "marginal" in terms of the flash cell voltages. New SATA SSDs usually perform consistently at >450 MB/sec when reading sequentially on an otherwise-idle system. It's possible to log 1-second data to a file, and graph it.
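For what it's worth, here is a rough sketch of how the "varies by more than about 20%" check could be automated once you have the samples in a file. It assumes a log with one MB/s value per line; that format, the function name check_throughput, and the choice of the median as the baseline are all my own assumptions, not anything systat produces directly. You would have to extract the MB/s column yourself, for example from iostat output, after checking the column layout on your system.

```shell
#!/bin/sh
# check_throughput FILE [PCT]
#   FILE: log with one MB/s sample per line (hypothetical format)
#   PCT:  allowed deviation from the median, in percent (default 20)
# Prints the sample count, the median, and how many samples deviate
# from the median by more than PCT percent.
check_throughput() {
    sort -n "$1" | awk -v pct="${2:-20}" '
        # Keep only plain numeric lines; skip headers and blank lines.
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ { v[n++] = $1 }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            median = v[int(n / 2)]      # input is already sorted
            bad = 0
            for (i = 0; i < n; i++) {
                d = v[i] - median
                if (d < 0) d = -d
                if (d > median * pct / 100) bad++
            }
            printf "samples=%d median=%.1f outliers=%d\n", n, median, bad
        }'
}
```

Called as "check_throughput speed.log" (or "check_throughput speed.log 10" for a stricter threshold), it gives a one-line summary. A handful of outliers at the start or end of the run is probably nothing; a steady stream of them during a sequential read is what I would look into.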
If your SSD can read all blocks and has consistent MB/sec, then at least you can develop confidence that it's not the SSD itself that is the cause of your ZFS error.