Hard disk verify program?

I don't know if this is the right place to ask this question, but here goes. It does relate to hard drives and storage. And please note: I am NOT a programmer! This may be a simple programming task, but I don't know where to start. I do all of the hardware stuff here.

We duplicate a master data hard drive (2 TB or 3 TB) onto many new drives, which are then installed into multiple systems. We have been getting reports of errors on these target drives.

I need a MEMTEST-style program that will boot a test system and run disk checks on all of the UFS hard drives installed, other than the boot drive. I have a 12-bay system. This could even be a USB drive program (like the MEMTEST86 that I already use) that simply boots up and gives me a menu to test the 12 target data drives just created on our (third-party, commercial) hard drive duplicator, and to log and save the results.
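In case it helps anyone who wants to roll their own, here is a minimal Python sketch of a surface read scan: it reads a whole device from start to end and records where reads fail. It is only an illustration under some assumptions: `read_verify` is a hypothetical helper, device names such as `/dev/ada1` are placeholders, it has to run as root against drives that are not mounted, and it assumes the device reports end-of-media as a short or empty read. It only checks that every sector can be read, not that the data is correct, so it is no replacement for the manufacturer's diagnostics.

```python
import os

CHUNK = 1024 * 1024  # 1 MiB per read; a multiple of both 512- and 4096-byte sectors


def read_verify(path, chunk=CHUNK, max_errors=1000):
    """Read `path` from start to end and return (end_offset, bad_offsets).

    bad_offsets lists the starting offset of every chunk whose read raised an
    error. The scan gives up early once `max_errors` failures have been seen.
    """
    bad_offsets = []
    offset = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while len(bad_offsets) < max_errors:
            try:
                data = os.pread(fd, chunk, offset)
            except OSError:
                # Unreadable region: note where it starts and skip to the next chunk.
                bad_offsets.append(offset)
                offset += chunk
                continue
            if not data:
                break  # reached the end of the device (or file)
            offset += len(data)
    finally:
        os.close(fd)
    return offset, bad_offsets
```

Calling `read_verify("/dev/ada1")` returns the offset where the scan ended plus a list of failing offsets; looping over the 12 target devices gives a simple per-drive log. Skipping a whole chunk after an error keeps the sketch short; a real tool would retry the failed chunk sector by sector to pin down the exact bad sectors.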

I am hoping there may already be such a test tool available and you can point me in the right direction.

Thanks in advance for any assistance you might be able to provide.

Dale
 
dkline201 said:
We duplicate a Master Data 2TB or 3TB hard drive onto (many) new drives to be installed into multiple systems. We have been getting reports of errors on these Target drives.
Two questions:

1) How do you do the duplication? If you're doing a block-by-block copy of a disk with mounted filesystems (for example, your system disk), you'll get (at a minimum) warnings when you boot the target drive on a new system saying that the filesystem(s) were not cleanly dismounted.

2) What sort of errors - logical (for example, an unexpected soft update inconsistency) or physical (for example, unrecoverable read errors)?

If you're getting physical disk errors, then you should use the manufacturer's diagnostics to verify that the drive is error-free. sysutils/smartmontools is handy, but relies on asking the drive "how are you feeling?", not on actually doing an independent verification.

If you have logical disk errors, look at your duplication method and also for hardware problems (bad memory, bad cables, insufficient power supply, and so on).
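If the goal is to check the duplication itself, one option is to compare each target against the master byte for byte, with neither one mounted. The Python sketch below is a hypothetical helper (`compare_devices` is not an existing tool) that reads both in 1 MiB chunks and returns the offset of the first chunk that differs, or `None` if the target matches the master over the master's full length. It assumes the target is at least as large as the master; any extra space on a larger target is ignored.

```python
CHUNK = 1024 * 1024  # compare 1 MiB at a time


def compare_devices(master_path, target_path, chunk=CHUNK):
    """Return the offset of the first differing chunk, or None if the target
    matches the master over the master's whole length."""
    with open(master_path, "rb") as master, open(target_path, "rb") as target:
        offset = 0
        while True:
            expected = master.read(chunk)
            if not expected:
                return None  # end of the master reached: everything matched
            actual = target.read(len(expected))
            if actual != expected:
                # The difference (or a short target) lies somewhere in this chunk.
                return offset
            offset += len(expected)
```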
 
Agreed on the block copy. dd(1) gets used for this a lot, but is usually not the right tool.

The short and long SMART tests really do test the drive. The nice thing about them is that they run on the drive, so they do not tie up the computer. Tests can be started on multiple drives all at once.
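For the multiple-drives-at-once case, here is a minimal Python sketch that wraps smartctl from sysutils/smartmontools. It starts a short or long self-test on each listed drive (`smartctl -t long /dev/...`) and builds the command that reads back the self-test log (`smartctl -l selftest /dev/...`). The helper names and device paths are made up for illustration, and the script does not wait for the tests: a long test on a 2 TB or 3 TB drive takes hours, so you would check the logs later.

```python
import subprocess


def selftest_command(device, test="long"):
    """Build the smartctl command that asks `device` to start a self-test."""
    if test not in ("short", "long"):
        raise ValueError("test must be 'short' or 'long'")
    return ["smartctl", "-t", test, device]


def results_command(device):
    """Build the smartctl command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def start_selftests(devices, test="long", dry_run=False):
    """Start a self-test on every device and return the commands used.

    smartctl returns as soon as the drive has accepted the request; the tests
    then run inside the drives in parallel. With dry_run=True nothing is run.
    """
    commands = [selftest_command(dev, test) for dev in devices]
    if not dry_run:
        for cmd in commands:
            # Exit status is a bit mask, so non-zero is not treated as fatal here.
            subprocess.run(cmd, check=False)
    return commands
```

`check=False` is deliberate: smartctl's exit status is a bit mask that can be non-zero even when the test was started fine (for example, when the drive has logged errors in the past), so the sketch leaves it to you to read the self-test log.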
 
wblock@ said:
The short and long SMART tests really do test the drive. The nice thing about them is they run on the drive, so do not tie up the computer. Tests can be started on multiple drives all at once.
The SMART tests ask the drive to test itself and post the results somewhere the host can retrieve them. I'd believe any result that says the drive is broken (false positives are unlikely), but if I had any reason to be concerned about the drive, I'd be suspicious of any result that says it is fine (false negatives do happen).

On the other hand, SMART tests may be more comprehensive as the drive has access to all of its sectors, including spares and the reserved-for-own-use space (which normally holds most of the firmware). But you're relying on the drive manufacturer to provide decent test coverage without inducing (or ignoring) errors. Since there's at least one known instance of an innocuous SMART command causing bad sectors, this isn't necessarily a given...

I've had "enterprise" drives from a well-known manufacturer pass the long offline test even though they are also showing offline uncorrectable sectors in SMART.
 
Well, it's basic troubleshooting 101. If you get no indication of trouble, that's still not proof that everything is OK. If you get some indication of trouble, let's say a failed smartmontools self-test on a drive, it is very likely that something really is wrong.

You have to resist the urge to jump to conclusions in the absence of evidence.
 
Thanks for all of the answers

We do use a professional third-party multi-disk duplicator, and it looks like the DMA problem is on the master disk and was therefore passed on to the target drives. We have not located the cause of the error, but some of the responses were very helpful in learning a lot more about the disk system and troubleshooting tools.

The drives show the
Code:
ata4: FAILURE - oversized DMA transfer attempt
error in FreeBSD 8.1. Under FreeBSD 9.1 the error does not show up. (We are in the process of converting to 9.1 now.) But we still have to support a lot of 8.1 systems until they update to 9.1.
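With a lot of 8.1 systems to check, a small script can count these messages per ATA channel. The Python sketch below assumes the message always has exactly the form quoted in the post; on FreeBSD the boot-time kernel messages are saved in `/var/run/dmesg.boot`, and the pattern also matches syslog lines (typically in `/var/log/messages`) that carry a date/host prefix. `count_dma_errors` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter

# Matches e.g. "ata4: FAILURE - oversized DMA transfer attempt",
# with or without a syslog date/host prefix in front of it.
DMA_ERROR = re.compile(r"\b(ata\d+): FAILURE - oversized DMA transfer attempt")


def count_dma_errors(lines):
    """Count oversized-DMA failures per ATA channel in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = DMA_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open `/var/run/dmesg.boot` file returns a `Counter` keyed by channel name, which is empty on a clean boot.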

Thanks for all the replies.
 
postscript to previous

By the way, fsck and SMART tools show NO errors on the drive. Nor does fsck correct the error. It must have been something that happened in creating the master data drive that 8.1 and DMA don't like.

But we also have instances where a drive that went out without the error comes back with it. That's the part we cannot duplicate in house or figure out.

These are essentially "read-only" drives. We put LOTS of data files on a big 2 TB drive that rarely gets written to, except for an occasional update. And even when the error shows, there do not seem to be any corrupted files. The error only shows up during the boot process.
 
The "oversized DMA transfer attempt" message does not sound like a drive hardware error, and certainly not a filesystem error. It would have me looking for updated firmware for the disk controller first, then maybe for a sysctl(8) to limit DMA transfer sizes.

But if it does not happen in 9.1, that suggests a bug in the old ata(4).
 