Crashing every hour (almost to the second)

I have 8.2 amd64 installed. All was stable until I decided to reload my perl port and picked the old version by mistake. After a couple of frustrating hours, perl was reinstalled. With the various port tools, plus some hard work to pick up modules that the ports system missed, all my software worked correctly again.

However, since then the system crashes every hour (there are no cron jobs involved, and I have waited before restarting to see if time of day affects the issue: it doesn't). The error is an ad5 DMA error (unable to write, error 5). No dump is made; the server says it is unable to do it.

This sounds like it should be a disk problem, but ad5 is not a disk on my machine: there is no ad5 entry in /dev. When the problem first occurred, I thought it might be my disk. I was using two SSDs with gmirror(8) in a RAID 1 config. I tried removing a disk: same problem. I physically substituted another disk: no change. fsck(8) is clean.

I realize this isn't necessarily the OS, though it might be. The problem is that I can't come up with a way to diagnose it. No logs offer anything helpful. The system dies due to the DMA timeouts; that much I am pretty sure is true. One of the port maintainers went through my system and repeated the perl installs. The problem continues and no one has any idea why. I have searched on every term I can think of with no success.

I would appreciate any help I can get here.

Thanks,
Bob
 
Have you verified the RAM?
I once had random crashes (one every few hours) when I had one bad bit in RAM.
 
It might be the controller, the cable or the disk itself that's on the fritz. Try installing sysutils/smartmontools and see if the disk is OK. Replacing the cable should be easy to try too.
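A minimal sketch of what that check could look like once sysutils/smartmontools is installed. The function name check_disk and the device name in the usage line are placeholders, not something from this thread; list the real device names with `sysctl -n kern.disks` and run it as root.

```shell
# check_disk is a hypothetical helper name; pass it the disk device to inspect.
check_disk() {
    disk="$1"
    if [ -z "$disk" ]; then
        echo "usage: check_disk /dev/<disk>" >&2
        return 64
    fi
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not found; install sysutils/smartmontools first" >&2
        return 127
    fi

    smartctl -H "$disk"          # overall health self-assessment
    smartctl -A "$disk"          # vendor attributes (reallocated sectors, CRC errors, ...)
    smartctl -l error "$disk"    # errors the drive itself has logged
    smartctl -t short "$disk"    # start a short self-test (runs in the background)
    echo "Wait a few minutes, then read the result with: smartctl -l selftest $disk"
}
```

Usage would be something like `check_disk /dev/ad4`. In the attribute list, a rising UDMA_CRC_Error_Count (attribute 199) usually points at the cable or connector rather than the disk surface, which fits the advice to swap the cable.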
 
I did check the disk. In fact, since I had a RAID 1, I used both disks and tried both controllers. Disks and controllers are fine. The exact interval between crashes makes me think this is software.
 
I have never heard of software causing DMA errors to be reported, other than maybe a driver issue, and perl wouldn't cause that. So I think it's a coincidence.
 
This happens to me after installing some ports. While installing java/eclipse, at some point I could no longer install it or its dependencies because the system crashed at random. There was nothing in the logs, only about 10-15 lines on screen when not in X, and I could not read them because the system rebooted instantly.

I thought it was OpenJDK 6: its post-installation message has instructions to mount fdesc and proc, and I had not added them to fstab. Before that I had installed emulators/fuse as a gvfs dependency, and a USB disk (NTFS) is always plugged in but used only in Windows (multiboot system).

After adding fdesc and proc to /etc/fstab and unplugging the USB disk, the system stopped rebooting. I need to investigate more, but I hope this helps you in some way.
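For reference, the fdesc and proc entries in /etc/fstab typically look like the fragment below. Treat it as a sketch: the mount points are the conventional ones, and the post-installation message printed by your own OpenJDK port is the authority if it differs.

```
fdesc   /dev/fd   fdescfs   rw   0   0
proc    /proc     procfs    rw   0   0
```

After editing the file, running `mount /dev/fd` and `mount /proc` as root should pick the new entries up without a reboot.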
 