ad0 failure read_dma

Hello,

I recently installed FreeBSD 8.1 and got everything working perfectly: networking, Xorg (sound and Flash), Samba, my Wi-Fi, etc. I was happy and felt I had accomplished my server migration task. Then all of a sudden the box crashed. I did a hard reset because I could not reach the box remotely via ssh.

I rebooted and saw errors of the following type:

Code:
ad0 failure read_dma
I do not remember the exact trailing errors but it did say "uncorrectable".

My question is, could this be related to my initial install? In "sysinstall" it asked about "geometry". It said for older BIOS/PCs select no, which I did.

Has anyone seen problems like this before?

I booted into single user mode. I tried to run some commands and got some I/O errors.

I'm planning on getting a new hard drive and starting over. :( Before I install again, I am going to try one thing. I will boot into safe mode and set:

Code:
hw.ata.ata_dma="0"
If that works, I will back up some data and copy some configuration parameters and install a new hard drive.

Anyone seen this before?


Thanks.
 
I am planning on going to the store after work. If the cable were bad, wouldn't my second hard drive show the same type of errors? I only see the error on ad0 and not ad1.

Wondering if I should just get a new cable as well, in addition to a hard drive.

I was hoping that disabling DMA would do the trick, can't hurt to try. :)
 
manilaboy1vic said:
I am planning on going to the store after work. If the cable were bad, wouldn't my second hard drive show the same type of errors? I only see the error on ad0 and not ad1.

Probably. The cable could fail in multiple ways. And we didn't know you had an ad1 until just now.

Wondering if I should just get a new cable as well, in addition to a hard drive.

I was hoping that disabling DMA would do the trick, can't hurt to try. :)

A new retail IDE drive should come with an 80-wire cable.
 
You may be able to get the exact error with
% dmesg -a | grep READ_DMA

It is probably
Code:
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=sector
or similar. I think it means your disk has a bad sector there.

You should see a nonzero number in the rightmost column (raw value) when you run
% smartctl -a /dev/ad0 | grep Reallocated_Sector
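
In case it helps with reading that message: the two hex values are bit fields, and the names in angle brackets are just the bits that are set. Below is a rough sh sketch that decodes them. The function names are made up for illustration, and the labels for bits that do not appear in the message above are my own shorthand for the classic ATA register layout, not necessarily what the driver prints.

```shell
#!/bin/sh
# Sketch: decode the status= and error= hex bytes from a READ_DMA failure line.
# Function names are hypothetical; labels for bits not shown in the thread
# follow the classic ATA register layout and are an assumption.

# decode_bits HEX MASK:NAME ... prints the names of the set bits, comma-separated.
decode_bits() {
    value=$((0x$1))
    shift
    out=""
    for pair in "$@"; do
        mask=${pair%%:*}    # text before the first colon, e.g. 0x40
        name=${pair#*:}     # text after the first colon, e.g. READY
        if [ $((value & $mask)) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

decode_status() {
    decode_bits "$1" 0x80:BUSY 0x40:READY 0x20:DEVICE_FAULT 0x10:DSC \
        0x08:DRQ 0x04:CORRECTED 0x02:INDEX 0x01:ERROR
}

decode_error() {
    decode_bits "$1" 0x80:ICRC 0x40:UNCORRECTABLE 0x20:MEDIA_CHANGED \
        0x10:ID_NOT_FOUND 0x08:MEDIA_CHANGE_REQUEST 0x04:ABORTED \
        0x02:NO_MEDIA 0x01:ADDRESS_MARK_NOT_FOUND
}

decode_status 51   # prints READY,DSC,ERROR
decode_error 40    # prints UNCORRECTABLE
```

So status=51 only says the drive is ready and reporting an error; the error=40 byte is the part that points at an unreadable sector.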
 
Beastie said:
You may be able to get the exact error with
% dmesg -a | grep READ_DMA

It is probably
Code:
ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=sector
or similar. I think it means your disk has a bad sector there.

You should see a nonzero number in the rightmost column (raw value) when you run
% smartctl -a /dev/ad0 | grep Reallocated_Sector


It won't boot; I can only do things in single user mode. I'm going to try the safe mode DMA thing. If that fails, it's a new hard drive and starting over. :(

I am also seeing errors like:

Code:
init: can't exec getty '/usr/libexec/getty' for port /dev/ttyv0
That is not exact but very similar.

I'm not sure which error is preventing the boot.
 
Looks like I have another command to try in single user mode to try and salvage this hard drive regarding the getty trouble:

Code:
cp /usr/src/etc/rc.d/* /etc/rc.d
 
Just to add my personal experience to this very interesting thread.

I have seen just the same error (error read_dma lba) when booting a FreeBSD 7.3-RELEASE system, which had been very stable for months until this problem. The boot sequence always aborted when trying to mount the root FS (when trying to load a library). Since I didn't have a backup of the root partition, I tried to back up various config files, but some of them triggered the same error and were thus unreadable.

I booted the live FreeBSD CD and ran various SMART tests on the drive (a 40GB ATA-100 Western Digital manufactured in 2003). Short and extended tests all PASSED. I replaced the IDE cable. I even ran the Western Digital HD diagnostic utilities suite to test the drive. Everything seemed OK, but the problem (read_dma error) was still there.

So I reinstalled FreeBSD on the same drive after a low-level format. BAD MOVE!! A few hours after installation, the same error came back.

Finally, I did what I should have done at the very beginning: I removed the drive and replaced it with a new one.

Everything has been working perfectly since then.
 
hansivers said:
Short and extended tests all PASSED.
This does not really mean much. I guess the software would only report the test as "FAILED" if some attribute value had crossed its corresponding threshold.
The best move is to check and interpret those individual attributes.
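
As a starting point for that, here is a small sh/awk sketch that picks out three sector-related attributes from the smartctl attribute table and reports any whose raw value is above zero. It assumes the usual ten-column table layout with the raw value in the last column. The function name flag_bad_attrs and the choice of these three attributes are mine, not anything official.

```shell
#!/bin/sh
# Sketch: report sector-related SMART attributes with a nonzero raw value.
# Assumes the smartctl attribute table has the attribute name in column 2
# and the raw value in column 10; flag_bad_attrs is a hypothetical name.
flag_bad_attrs() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            # $10 + 0 forces a numeric comparison on the raw value
            if ($10 + 0 > 0) print $2 "=" $10
        }
    '
}

# Typical use (needs smartmontools installed and root privileges):
#   smartctl -A /dev/ad0 | flag_bad_attrs
```

No output means those three counters are all zero, which is a better sign than an overall PASSED. Any line printed is worth taking seriously even if the self-tests pass.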
 