UFS timeout when accessing bad blocks of a SATA/AHCI disk

obsigna

I received a bad disk for recovery, and I am trying to get as much data off it as possible, using various methods.

Now it is very disturbing that for each bad block of 512 bytes the system hangs for 30 seconds until it hits the timeout.

Questions:
  • Where is this timeout defined - is it by the SATA firmware or somewhere in the OS?

  • Can I reduce the timeout to, say, 0.1 milliseconds, or, even better, can I get the system (e.g. when using dd(1)) to simply ignore read/write errors, at least during the recovery process?

    tmux new "dd if=/dev/ada1 of=/dev/ada2 bs=512 conv=noerror,sync fillchar=U"

  • The above dd stops for 30 seconds at every bad block, and for this reason the estimated time to completion may be several days. Are there any other knobs which I may set to speed up the recovery process?
I know that the data may be recovered with inconsistencies; however, I will take care of this in a separate step.
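
To put the stall time in numbers, here is a rough sketch that assumes every unreadable 512-byte sector costs exactly one full timeout and nothing else. The function name and the figure of 10000 bad sectors are made up for illustration; the real count is unknown until the copy has finished.

```sh
# Rough estimate of the time dd spends waiting on bad sectors.
# $1 = number of bad sectors (hypothetical input)
# $2 = timeout per bad sector in seconds
# Prints whole hours, rounded down by integer division.
stall_hours() {
    bad_sectors=$1
    timeout_s=$2
    echo $(( bad_sectors * timeout_s / 3600 ))
}

stall_hours 10000 30   # hours lost with a 30 second timeout
stall_hours 10000 1    # hours lost with a 1 second timeout
```

The estimate scales linearly with the timeout, so any reduction of the timeout shortens the stall time by the same factor.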
 
Where is this timeout defined - is it by the SATA firmware or somewhere in the OS?
I don't really know, but does setting the kern.cam.ada.default_timeout sysctl (unit appears to be seconds with a default of 30 seconds) help at all?
 
obsigna
Maybe you want to try sysutils/safecopy.
It is a data recovery tool which tries to extract as much data as possible from a problematic (i.e. damaged sectors) source, like floppy drives, hard disk partitions, CDs, tape devices, etc., where other tools like dd would fail due to I/O errors.
 
tobik@, thanks for the helpful hint. Yes, sysctl kern.cam.ada.default_timeout turned out to be the knob which controls the timeout in this case. The minimum value is 1 second. I tried it with 0, but that brought the system down.
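
The sequence discussed in this thread could be sketched roughly as follows, assuming FreeBSD, root privileges, and the device names ada1 (source) and ada2 (target) from the dd command in the first post. The run helper is a hypothetical wrapper, not part of any tool: it only prints each command unless RUN=1 is set, so the steps can be reviewed before anything touches the disks. The restore value of 30 is the default reported above, not something verified on every release.

```sh
# Hypothetical dry-run wrapper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Show the current per-command timeout (reported above as 30 seconds).
run sysctl kern.cam.ada.default_timeout

# Lower it to 1 second, the minimum that worked; 0 brought the system down.
run sysctl kern.cam.ada.default_timeout=1

# Copy sector by sector, padding unreadable sectors with the fill character.
run dd if=/dev/ada1 of=/dev/ada2 bs=512 conv=noerror,sync fillchar=U

# Put the timeout back once the copy is done.
run sysctl kern.cam.ada.default_timeout=30
```

Running the script as is only lists the four commands; prefixing it with RUN=1 executes them.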

getopt, thank you, sysutils/safecopy is new to me. In the past, I tried the similar utility sysutils/ddrescue for recovering data. It worked quite well for scratched optical media, but it brought no additional benefits over dd when working with defective hard disks. My experience is that a bad sector of a hard disk stays bad, even if you read it 100 times. I will check whether sysutils/safecopy lets me go further down with the timeout, without needing to remove all the multiply-by-1000 operations in /usr/src/sys/cam/ata/ata_da.c and rebuild a custom kernel.
 