One FreeBSD 6.2-p11 system that has been extremely stable for years has started having problems.
DELL PE750
CERC SATA RAID
2x250GB WD SATA HD
The OS sees one hard drive, aacd0
For the last two weeks the system has crashed a couple of times a week. When it does, it responds only to pings, accepts some connections but then hangs, and has to be soft- or hard-rebooted (followed by a background fsck) to put it back into proper service.
I finally saw the console during one of these crashes. (The other times, I simply had the system rebooted and afterwards found nothing useful in the logs to explain the crash. The disk reported small corruptions that could just as easily have been caused by files being open for writing when the crashes occurred.)
Console reported (from my jotted down notes):
Code:
aacd0 hard error
vm_fault: pager read error
specifically reported that aacd0s1e had a write issue
specifically reported that aacd0s1f had a read issue
specifically reported that aacd0s1h had a read issue
and was hung.
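If the same kernel messages also reach the system log, tallying them by partition can show whether the errors cluster on particular slices. Below is a minimal Python sketch, not a diagnostic tool: the function names are hypothetical, and it assumes the messages name partitions the same way as in the notes above (e.g. aacd0s1e) and contain the word "error". It cannot tell which physical drive in the mirror is at fault, because the OS only sees the single logical aacd0 volume.

```python
import re
from collections import Counter

# Assumption: partitions appear in messages as aacd<N>s<slice><letter>, e.g. aacd0s1e.
PARTITION_RE = re.compile(r"\baacd\d+s\d+[a-h]\b")


def tally_aacd_errors(lines):
    """Count error lines per aacd partition and count vm_fault pager read errors.

    Only lines containing the word "error" (any case) are considered, so
    routine mentions such as mount messages are ignored.
    """
    per_partition = Counter()
    pager_errors = 0
    for line in lines:
        if "error" not in line.lower():
            continue
        if "pager read error" in line:
            pager_errors += 1
        # set() so a line naming the same partition twice is counted once
        for part in set(PARTITION_RE.findall(line)):
            per_partition[part] += 1
    return per_partition, pager_errors


def tally_log_file(path):
    """Hypothetical helper: run tally_aacd_errors over a log file on disk."""
    with open(path, errors="replace") as log:
        return tally_aacd_errors(log)
```

For example, tally_log_file("/var/log/messages") would scan the file where FreeBSD's default syslog configuration writes kernel messages; if errors show up on several unrelated partitions, that points more toward the controller or the array as a whole than toward one filesystem.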
I rebooted into single-user mode, ran a proper fsck, and was able to return the machine to service. Before rebooting into multi-user mode I checked the RAID BIOS report again (everything OK, and S.M.A.R.T. showed "Y", i.e. in good shape), but I did not use the BIOS tool to verify the media.
So I have no way to tell whether this really was a disk problem and, if so, which drive. I assumed that when one drive began to fail, the RAID controller would report it and I could swap in a new drive and resync. (By the way, the two drives are mirrored, i.e. RAID1.) Instead, the RAID controller says everything is OK, while the OS reports an I/O or swap problem when it hangs.
Thoughts and advice?
Thank you.