How to rescue data from a failing disk

balanga · Mar 15, 2020

I have an old disk which sounds as though it is nearing end of life.
gpart(8) shows:

Code:

=>      63  78140097  da0  MBR  (37G)
        63     15057    3  !22  (7.4M)
     15120  39054960    2  ntfs  (19G)
  39070080      1664       - free -  (832K)
  39071744  39068416    1  !23  [active]  (19G)

How should I attempt to rescue any data from the disk? I suspect dd(1) would fail, so should I attempt to mount the various partitions and copy any accessible files? Are there any recovery tools I should try for recovering data from bad blocks?

Running smartctl -a --device=ata /dev/da0 fails with:-

Read Device Identity failed: Inappropriate ioctl for device

Not sure what that means...

wolffnx · Mar 15, 2020

Sorry to say this: there are many tools you could use, but if the filesystem is NTFS,
use a Windows system and a recovery tool such as Power Data Recovery, and then run chkdsk to fix the disk (if it is not too late).

PMc · Mar 15, 2020

dd has a conv=noerror option that makes it carry on after read errors, and a conv=sync option that pads the missing blocks with NUL bytes. While this should work, I would recommend running it with the default blocksize (512) so as not to lose more blocks than are actually damaged, and that may take quite a long time.(*)
Furthermore, I would not fully trust dd with that noerror/sync thing, so I would copy these partitions into some other partitions, where they can then be mounted and checked for consistency.
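
To put rough numbers on the blocksize advice, here is a small Python sketch. It assumes that with conv=noerror,sync a failed read costs one whole input block, and that sectors are 512 bytes; the function name and the sector numbers are made up for illustration.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def zero_filled_bytes(bad_sectors, block_size, sector_size=SECTOR_SIZE):
    """Estimate how many bytes end up as NULs for a given dd block size.

    Assumption: any input block that contains at least one unreadable
    sector is replaced entirely by NUL bytes.
    """
    # Map every bad sector to the index of the input block that holds it.
    damaged_blocks = {(s * sector_size) // block_size for s in bad_sectors}
    return len(damaged_blocks) * block_size


if __name__ == "__main__":
    bad = [100, 101, 5000]  # hypothetical unreadable sector numbers
    for bs in (512, 4096, 1024 * 1024):
        print(f"bs={bs}: {zero_filled_bytes(bad, bs)} bytes zero-filled")
```

With three bad sectors, a 512-byte blocksize gives up only those three sectors, while a 1 MiB blocksize gives up every megabyte that contains one of them.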

balanga said:
Running smartctl -a --device=ata /dev/da0 fails with:-

If this is really a SCSI disk, it may work with smartctl -a --device=scsi /dev/da0

(*) If you're good at math, you could also read the whole disk with a large blocksize, see where the errors are, then fetch the data around the errors with a smaller blocksize, and then put all the chunks together in the right order. (The seek and skip options of dd work on disks and on files. I have never needed anything other than dd to fetch what remained on defective disks.)
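
The math in the footnote is mostly unit conversion. A Python sketch, assuming the first pass used a 1 MiB blocksize and the second pass uses 512 bytes; the function names, the image file name and the block index 1000 are hypothetical. conv=notrunc is added so that dd does not truncate an existing image file when it writes into the middle of it.

```python
def reread_args(big_block_index, big_bs=1024 * 1024, small_bs=512):
    """Translate one failed large block into dd arguments for a re-read.

    skip and seek are counted in units of the small block size, so the
    large-block index is scaled by the ratio of the two sizes.
    """
    if big_bs % small_bs:
        raise ValueError("big_bs must be a multiple of small_bs")
    ratio = big_bs // small_bs
    start = big_block_index * ratio
    return {"bs": small_bs, "skip": start, "seek": start, "count": ratio}


def dd_command(src, dst, args):
    """Build the dd command line for the second pass (nothing is executed)."""
    return (
        f"dd if={src} of={dst} bs={args['bs']} "
        f"skip={args['skip']} seek={args['seek']} count={args['count']} "
        "conv=noerror,sync,notrunc"
    )


if __name__ == "__main__":
    # Hypothetical: the first pass reported an error in 1 MiB block 1000.
    print(dd_command("/dev/da0", "da0.img", reread_args(1000)))
```

Using the same value for skip and seek keeps every re-read chunk at the same offset in the image as on the disk, so nothing has to be rearranged afterwards.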

TracyTiger · Mar 15, 2020

On failing disks I've had good results in the past using ddrescue. That was on Linux systems but hopefully you would get similarly good results with FreeBSD.

More details on ddrescue may be found at gnu.org.

tingo · Mar 16, 2020

Both sysutils/dd_rescue and sysutils/ddrescue exist. I have used one of them in the past, with great results. I think it was dd_rescue.

balanga · Mar 16, 2020

According to dd(1):-

EXAMPLES
Check that a disk drive contains no bad blocks:

dd if=/dev/ada0 of=/dev/null bs=1m

Not sure what happens if there is a bad block, but I ran the above command and didn't see any errors, so am not sure if the disk is OK...

ralphbsz · Mar 16, 2020

balanga said:
Not sure what happens if there is a bad block, but I ran the above command and didn't see any errors, so am not sure if the disk is OK...

Three options (there are probably a lot more):

1. The block is only a little bit bad on disk, and retries within the disk drive were able to read it. The way to find out about that is to ask the kernel to display information messages from SCSI disks, or to ask the disk (for example via SMART data), or to study the performance of individual IOs.
2. The block is actually unreadable, and the disk knows it and handles it gracefully, by returning an error. You will probably see details of the error in the system log (dmesg or /var/log/messages), in particular for SCSI spinning disks, which return pretty accurate ASC/ASCQ codes. The error will be returned to the program (like dd), which will either crash, or print the error and continue, or just continue. The vanilla dd in the stock configuration will print an error message.
3. The block is unreadable, but the system doesn't handle it gracefully. I've seen extreme examples in both FreeBSD and Linux, for example a disk with a bad block that causes FreeBSD to hang and not boot if that disk is physically attached to SATA, or IOs that get an error at the disk hardware, but the kernel never processes the error return and the IO remains hanging (my personal record was waiting >96 hours for a disk IO to complete, before I lost patience and rebooted the system). Or the OS does error handling badly, causing more and more recovery IOs to create a perfect storm, and the system becomes unresponsive (without actually crashing). Debugging this situation is hard, and usually you have to use IO or kernel tracing to figure out where the problem happened (by knowing what the last successful IO was).

The various dd rescue implementations help with the middle problem. Solutions to that can also be quickly hand-coded; it takes about 20 lines of C to write a little program that performs a sequence of read() calls, and on error prints sensible messages and continues. But that does not help with the first and last problem.
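
Roughly what such a read loop looks like, sketched here in Python rather than C to keep it short. The names rescue_copy and read_block are made up for illustration, and the sketch assumes that seeking to the end of the source reports its size in bytes, which may not hold for every device on every system.

```python
import os


def rescue_copy(src_path, dst_path, block_size=512, read_block=os.pread):
    """Copy src to dst block by block, continuing past read errors.

    Unreadable blocks are written as NUL bytes so that offsets in the
    copy match offsets on the source. Returns the byte offsets of the
    blocks that could not be read.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Assumption: seeking to the end yields the source size in bytes.
        total = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total:
                want = min(block_size, total - offset)
                try:
                    data = read_block(src, want, offset)
                except OSError as err:
                    print(f"read error at byte {offset}: {err}")
                    bad_offsets.append(offset)
                    data = b"\0" * want  # placeholder for the lost block
                if not data:
                    break  # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_offsets
```

The read_block parameter exists only so that the error path can be exercised without a genuinely bad disk. A read that hangs inside the kernel will still block this loop forever, which is why it only covers the middle case.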

balanga · Mar 16, 2020

ralphbsz, I just wish I understood 10% of what you know! I'm more of a belt and braces man myself...

Re dd(1), I've been checking around 20 of my laptop disks today to see what state they are in and just came across the first error:-

dd: /dev/ada1: Input/output error
5784+0 records in
5784+0 records out
6064963584 bytes transferred in 240.761695 secs (25190733 bytes/sec)

This is on a 112G disk. I guess it gives me a chance to try out sysutils/dd_rescue.
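
The dd output above already says roughly where the trouble starts: all 5784 records were complete, so the read that failed was the one beginning right after the last byte transferred. A small Python sketch of that arithmetic; the function name is made up, and the 512-byte sector size is an assumption.

```python
def failed_read_location(records_in, bytes_transferred, sector_size=512):
    """Work out where dd stopped, from its 'records in' summary.

    Only valid when every record was a full one (the '+0' in the dd
    output); otherwise the block size cannot be inferred this way.
    """
    block_size, remainder = divmod(bytes_transferred, records_in)
    if remainder:
        raise ValueError("partial records: block size cannot be inferred")
    return {
        "block_size": block_size,                           # bytes per record
        "byte_offset": bytes_transferred,                   # start of failed read
        "first_sector": bytes_transferred // sector_size,   # same point in sectors
    }


if __name__ == "__main__":
    # Values taken from the dd output quoted above.
    print(failed_read_location(5784, 6064963584))
```

For these numbers the inferred blocksize is 1048576 bytes (1 MiB), and the failed read starts at byte 6064963584, i.e. sector 11845632. The bad sector itself can be anywhere within that 1 MiB block.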