LateNight said:
Thanks very much; smartmontools will do just what I was looking for.
In playing around with them, I noticed that one of my drives has a non-zero Reallocated_Event_Count and a couple of errors in its error log. I'm going to keep an eye on all my drives, using some smartctl commands in a cron script.
Even better, the port already has that function. It is not enabled by default; you have to edit the config file in /usr/local/etc and set the notification email address.
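If you do want a cron script of your own, a minimal sketch of the check might look like the following. It assumes the usual ten-column attribute table printed by smartctl -A, with the raw value in the tenth column. The function name smart_nonzero and the list of attributes to watch are my own choices, not anything the port provides.

```shell
#!/bin/sh
# Sketch: read `smartctl -A` output on stdin and print any watched
# attribute whose raw value is non-zero.
# Assumption: standard attribute table, raw value in column 10.
smart_nonzero() {
    awk '
        $2 ~ /^(Reallocated_Sector_Ct|Reallocated_Event_Count|Current_Pending_Sector|Offline_Uncorrectable)$/ &&
        $10 + 0 > 0 { print $2, $10 }
    '
}
```

From cron it would be called as something like smartctl -A /dev/ada0 | smart_nonzero, and cron mails any output to the job's owner.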
Just to satisfy my curiosity, is there any disadvantage to just doing something like
dd bs=10M if=/dev/adX of=/dev/null
vs.
smartctl -tlong /dev/adX
(besides taking longer and using some CPU and memory)? I assume both would result in checking all the checksums on the drive and would result in the same handling for any errors detected.
They are a bit different. Reading every block is not much of a test. It won't detect data corruption, but might detect block errors on the drive.
Writing to every block with dd(1) might hit errors, but if the drive corrects them by remapping the bad blocks to spare sectors, the only way to tell is to check the SMART numbers afterwards. The SMART long test takes the same amount of time as dd(1), provided dd(1) is given a buffer of at least 64k:
dd if=/dev/ada0 of=/dev/null bs=64k
Unlike dd(1), the long test reports its results afterwards. The SMART short and long tests are non-destructive. I have not really investigated how they work, but the results are consistent with manual tests.
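Checking the SMART numbers after a dd(1) pass can be scripted by saving the attribute table before and after and comparing the raw values. This is only a sketch: the function name smart_raw_changes is made up, and it again assumes the standard smartctl -A table with the raw value in column 10.

```shell
#!/bin/sh
# Sketch: compare two saved `smartctl -A` outputs and print every
# attribute whose raw value changed between them.
# $1 = snapshot taken before the dd pass, $2 = snapshot taken after.
smart_raw_changes() {
    awk '
        # Only look at attribute rows: numeric ID, at least 10 columns.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if (FNR == NR)
                before[$2] = $10          # first file: remember raw value
            else if (($2 in before) && before[$2] != $10)
                print $2, before[$2], "->", $10
        }
    ' "$1" "$2"
}
```

Typical use would be smartctl -A /dev/ada0 > before.txt, then the dd(1) run, then smartctl -A /dev/ada0 > after.txt and smart_raw_changes before.txt after.txt. Counters such as power-on hours will change too, so the reallocation lines are the ones to look for.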
But again, it's important to note that these are drive tests, not data tests. They check whether the drive itself is working, not whether the data on it is still what you wrote. If the contents of some sector have been silently corrupted, it will not be noticed, because the drive has no way of knowing what that data should have been. ZFS can detect data corruption.
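To illustrate the difference: detecting corruption needs a checksum that was recorded while the data was still known to be good. The sketch below does that by hand for a single file with cksum(1). ZFS does it automatically and per block, with stronger checksums, so this only shows the principle. The function names record_sum and verify_sum are made up for the example.

```shell
#!/bin/sh
# Sketch: record a checksum while the data is known good, verify later.

# Print the cksum(1) result (CRC and byte count) for file $1.
record_sum() {
    cksum < "$1"
}

# Succeed only if file $1 still matches the previously recorded sum $2.
verify_sum() {
    [ "$(cksum < "$1")" = "$2" ]
}
```

A file that reads back without any I/O error but with different contents fails verify_sum, which is exactly the case that neither dd(1) nor the SMART tests can catch.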