Solved: How to periodically check whether a hard disk has bad sectors with UFS?

Hello everyone,

For ZFS it is recommended that pools be scrubbed regularly. But what is recommended for UFS?
If it is fsck, is it necessary to run fsck manually on a regular basis? Assume that RAID1 is used.

Thanks.
 
A ZFS scrub is a bit more than you seem to think: it checks for bad sectors that would trigger an I/O error when read, but it also checks for silent corruption even when there are no bad sectors on the physical storage media.

For UFS you can't use fsck(8) for that, because it really won't do a thing to check data integrity; it only checks that the filesystem's administrative metadata is in good order. You could try to run a periodic dd if=/dev/<partition> of=/dev/null to read through the whole partition that contains the filesystem and see if any errors occur.
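As a minimal sketch (ada0p2 is only a placeholder for whatever partition holds your filesystem; a larger block size makes the pass much faster than dd's 512-byte default):

dd if=/dev/ada0p2 of=/dev/null bs=1m

Any unreadable sector shows up as an I/O error from dd, and the kernel will usually log it as well.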
 
This is a complex question; kpa already answered some aspects of it. Right now I'm busy, so not a long answer, but think of it in layers: at what layer are you trying to find problems? The disk drive internals, the interface between disk and host, the file system metadata, or uncorrected/undetected corruption of the file system data? The answer to what is sensible and what works is different for each of these layers.
 
One approach used to be running dd if=/dev/X of=/dev/X to read the disk and rewrite the same sectors. This would combat weakening magnetic flux and correct the errors the drive can fix, so they don't accumulate. I have never tried that, and it will screw up any mounted file systems for sure.
 
S.M.A.R.T.-enabled disks allow self-testing. There are two sorts of self-test: the short self-test verifies the disk electronics and mechanics, and the extended self-test additionally looks for bad sectors by reading and verifying the whole disk. Both tests may be run in background or foreground mode. The background mode does not interfere with normal disk operation and can be used on disks with mounted file systems.

The software package sysutils/smartmontools may be used for executing the tests and for monitoring various aspects of the disk's health. The package comes with two tools, smartctl(8) and smartd(8).

smartctl -a /dev/ada0 shows the SMART status of the disk with device id ada0.
smartctl -t long /dev/ada0 starts the extended self-test in background mode.

For example, we could run the extended self-test in background mode from a cron job every Sunday morning at 2 am. A short test usually takes about a minute; the extended test may last several hours, depending on disk size and speed.
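
A minimal root crontab entry for that schedule could look like this (assuming the port installs smartctl under /usr/local/sbin and the disk is ada0):

0 2 * * 0 /usr/local/sbin/smartctl -t long /dev/ada0 > /dev/null

The test then runs inside the drive in the background; the result shows up later in the self-test log printed by smartctl -a.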

smartd(8) can be used for monitoring the disk status and logging events on a permanent basis. It can even send e-mail when failures are detected.
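
As a rough sketch only (the device name, mail address and schedule are placeholders; see smartd.conf(5) for the exact directive syntax), an entry in /usr/local/etc/smartd.conf could look like:

/dev/ada0 -a -m root@localhost -s L/../../7/02

Here -a monitors all SMART attributes, -m mails a report when a problem is detected, and the -s expression schedules a long self-test every Sunday at 02:00. On FreeBSD you would also put smartd_enable="YES" into /etc/rc.conf so the daemon starts at boot.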
 
See periodic(8) and /etc/defaults/periodic.conf; that probably has what you want. You can then activate what you want in /etc/periodic.conf or /etc/periodic.conf.local.
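
For example, if the RAID1 in question is a gmirror(8) mirror, a line like the following in /etc/periodic.conf adds the mirror status to the daily periodic mail (the knob name is taken from /etc/defaults/periodic.conf; check the defaults file on your release for what is actually available):

daily_status_gmirror_enable="YES"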
 
I run some of my UFS disks on top of geli(8) using only authentication. You'll be warned whenever a corrupted read is encountered. You could then periodically run a dd as kpa suggests to verify the integrity. geli needs to be the lowest layer.
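
For reference, a sketch of setting that up on an empty partition (ada0p2 is a placeholder, geli init is destructive, and the available authentication algorithms are listed in geli(8)):

geli init -a HMAC/SHA256 /dev/ada0p2
geli attach /dev/ada0p2
newfs /dev/ada0p2.eli

With authentication enabled, a read whose checksum does not verify is returned as an error instead of silently handing back bad data.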
 
You could try to run a periodic dd if=/dev/<partition> of=/dev/null to read through the whole partition that contains the filesystem and see if any errors occur.
Thank you. Is it common practice with UFS to periodically use dd to check whether a disk has bad sectors?
 
Best practice is to use smartmontools from a cron job, as was mentioned above.
 
Maybe you can try sending a tar archive of the filesystem to /dev/null; this way tar(1) will read all existing files: tar cf - --one-file-system /path/to/filesystem >/dev/null. Then keep an eye out for anything suspicious written to stderr.
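
If you want to keep the complaints around for later inspection, a variant like this works as well (the path and log file are only placeholders):

tar cf - --one-file-system /usr > /dev/null 2> /var/tmp/tar-verify.log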
 
As a related aside, I was working on a Linux box today with hardware RAID1. Mail had stopped working at 3am. Nagios reported it couldn't find /usr/bin/mail. Indeed, when I looked it was called nail :eek:.

That's bit rot right there and a great example of what RAID1 is not designed to protect against.

I'll be resurrecting it as a FreeBSD box ;)
 
As a related aside, I was working on a Linux box today with hardware RAID1. Mail had stopped working at 3am. Nagios reported it couldn't find /usr/bin/mail. Indeed, when I looked it was called nail :eek:.

That's bit rot right there and a great example of what RAID1 is not designed to protect against.

I'll be resurrecting it as a FreeBSD box ;)
Can't the hardware RAID scrub the drives to find such things? Hmmm...with RAID1 there is no checksumming so maybe not. But bitrot would occur on one drive in the mirror, not both, so it should be detectable even if not automatically correctable.

I use UFS on my NAS (which has hardware RAID) but the configuration is RAID6 and the controller scrubs the array on a weekly basis (usually runs for about a day). In more than 4 years (10 x 4 TB drives) there have been no incidents. If there has been any bitrot it has been automatically corrected.

I also vote for smartctl to ensure the proper behavior of drives. Usually works well. I use it on all my mechanical drives, including the aforementioned NAS, and it has given me early warning of problems on a number of occasions.
 
RAID1 doesn't read from both drives and compare the result. That's not its purpose, although I agree it would be a nice feature.

RAID1 can be even more insidious; I've encountered this twice now:

Drive 0 develops a surface fault on some random sector, maybe one that's not in use by the file system or rarely ever read, but the controller cannot read anything from the surface at that spot. Later drive 1 fails completely and you replace it. The controller begins copying from drive 0 to drive 1, but because there's an unreadable sector it will never complete; it enters an endless loop trying.

Now you install two new disks and restore from backup.
 
RAID1 doesn't read from both drives and compare the result. That's not its purpose, although I agree it would be a nice feature.
RAID implementations that keep track of checksums of the data (such as ZFS) end up de-facto doing this: they will read from the second drive if the checksum from the first drive is incorrect.
Some RAID implementations also "scrub" the data regularly: they read everything (in the case of RAID1 both copies), and make sure that (a) it is readable, (b) it matches checksums, and (c) it matches each other.
Cheap/bad RAID implementations don't do any of this.
Drive 0 develops a surface fault on some random sector, maybe one that's not in use by the file system or rarely ever read, but the controller cannot read anything from the surface at that spot. Later drive 1 fails completely and you replace it. The controller begins copying from drive 0 to drive 1, but because there's an unreadable sector it will never complete; it enters an endless loop trying.
It turns out that with today's large drives, the scenario you describe is the largest source of data loss (other than human error by administrators, and software bugs in the RAID implementation, which are far worse than hardware problems): while having to read the last surviving copy of some data, one finds a sector error. With today's drives, the probability of finding a sector error when completely reading a disk is rapidly approaching 1 (take 10 TB x 8 bits/byte x 10^-14 probability of read error per bit). For this reason, if you really care about the survivability of your data, you need multiple redundancy. Or, to quote the (now retired) CTO of NetApp: anyone who sells single-fault-tolerant RAID is committing professional malpractice.
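Spelling out that back-of-the-envelope number (10^-14 is the unrecoverable-read-error rate typically quoted for consumer drives): 10 TB is roughly 8 x 10^13 bits, and 8 x 10^13 x 10^-14 = 0.8, i.e. close to one expected unreadable sector per complete pass over the disk.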

Anyway, a good RAID implementation would not go into an endless loop. It would instead skip over the damaged section, mark the "copy" as unreadable and defective, and finish the copy. If anyone later reads the unreadable part, they would get EIO, but only then. Unfortunately, low-end RAID doesn't work that way.
 
Some RAID implementations also "scrub" the data regularly: they read everything (in the case of RAID1 both copies), and make sure that (a) it is readable, (b) it matches checksums, and (c) it matches each other.

Yes, this is what I was referring to. I would expect a hardware RAID controller to scrub the mirror. Given that there is no checksum, when an error is found it cannot necessarily be resolved, though the controller could then dig down and establish the existence of a bad block on one drive, and therefore infer the right answer. But even if the problem could not be solved, it could be flagged.

Shrug. Maybe I expect too much.
 
although I agree it would be a nice feature.
Sorry to disagree with you, but there's no way to determine which disk contains the correct data. All you can determine from this is that one of the disks has bad data at that sector.

ralphbsz I totally agree with you. I don't know if LSI is considered low end, but my only experience is with Dell's implementation of their hardware, as described. Considering that line of servers (now retired) only came with 2 disks in the first place, I'd hazard a guess that the 'selling point' of RAID1 was the motivation.
 
The assumption that "low-end" RAID has to make is that disks are fail-fast: a disk is either working and working perfectly, or it has failed completely and will not return any data. Fundamentally, this legislates away disks returning "wrong" data. Fortunately, this assumption is mostly true: the common failure modes of disks involve either the whole disk going away entirely, or the disk being able to detect when it has a data error (for example with the error-correcting codes that are stored on the platter) and returning an error instead.

The reason I wrote "wrong" in quotes above is that it is awfully hard to define in practice what "right" is. There is a theoretical algorithmic definition, which involves single-copy serializability and tracking the most recent write to the address. In practice, the (real world existing) problem of off-track writes makes this difficult: If data is written while the drive is mechanically being vibrated, you may end up with two copies of the track next to each other on the platter. Future small reads can return either one or the other track, in both cases without detecting errors (if the reads are short enough and you get lucky/unlucky with seeking). Both sectors that are returned are "right" in the sense that both were actually written at some point in the past; one is more recent, but which one is de-facto impossible to find out at the time of reading. This means that the disk becomes byzantine: It will sometimes return different (but valid-looking) data for a specific sector.

In addition, RAID1 if implemented carelessly already has a byzantine read problem: If the two copies on the two disks ever diverge (which typically happens during error handling), future reads will return one or the other copy. Which makes whoever is the consumer of the data that comes from the RAID (typically a file system) pretty unhappy.

The reason I wrote "low-end RAID" in quotes above is that to me, this is the definition of low-end: RAID systems that don't keep checksums and timestamps on a sector-by-sector basis. Unfortunately, calculating checksums takes a lot of CPU resources, and storing checksums and timestamps for each data block requires not only a lot of extra storage, but also very fast extra storage (in practice, it can only be done with flash assisting the disk, if you want high performance and fault tolerance). Life is hard.

In the meantime, low-end RAID is definitely better than nothing: It handles 99% or 99.99% of all failures; the fact that certain byzantine behaviors don't get fixed shouldn't stop anyone from buying it. Now, if you can use better RAID (like the thing that's built into ZFS), that's obviously preferable.
 
Also keep in mind that flash is not the answer. Flash does not store your data but only a statistical approximation of your data.
 