Freeze I can't debug - I can change VT, but nothing more.

I'm having some stability problems with my FreeBSD install.

Because of the nature of the freeze I don't really have much info to give. I haven't found a way to trigger it. It occurs randomly (as far as I can see), and my highest uptime since I started monitoring has been about a week. Sometimes it will freeze within a few minutes of booting.

Problem description:
- Computer will function nominally.
- I get disconnected from SSH/NFS/(all other network daemons).
- I can still ping the computer.
- On the computer I can use ctrl+alt+F? to change VT.
- I can't log in on any of the VTs.
- If "top" is running it will still update.
- There is no useful information on the screen.
- There is no evidence that the computer knows it has crashed in any way.
- There is no crash log. (I have to pull the plug).
- There is no response to ctrl+alt+del.


Computer information:
Keyboard is PS/2.
No mouse is attached.
There is no X installed.

Computer is running latest STABLE.

I have recently cleaned the computer, and CPU temperature seems fine.

All my drives are running ZFS (root is mirrored, everything else is raidz).

I have done all the normal ZFS tuning, and I have enabled all the ZFS debugging I could, but it didn't help at all.

I have tried for quite some time to find evidence that it is ZFS, but have failed in that pursuit.

I have never tried to debug a kernel before, and I don't really know where to start. I have seen the guide for recompiling the kernel, but I would hope there is a better/easier way.

dmesg: http://pastebin.ca/1429400 <- there is no ZFS debug output in this, since it didn't give me anything useful.

I don't know where to go from here. I don't know which part of the kernel I should enable debug information for to get anything useful.

Sincerely
Tobias Ussing
 
I've tested the RAM, no go.

Since I'm still fairly lost, but have gotten hold of another computer, I think I'll mirror the OS drives to the new computer and move the components over a few at a time, to see which ones bring the crash with them.
 
Some random things to try:

1) Connecting over serial. Probably isn't going to make a difference, but it's worth a try.

2) Instead of watching top, bring up systat -io. Watch and see if the lockup happens when you've got a lot of data going between drives. Typing ":numbers" once it is up makes the data a bit more readable, in my opinion.

3) You may want to try running a current snapshot and see if this still happens.

4) Is there anything in messages?
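
Since the screen and messages show nothing useful at the moment of the freeze, one more option is to leave a small logger running so there is something to read after the reboot. Below is a rough sketch in plain sh, not a tested tool: the log path, the interval, the function names and the choice of commands (uptime, vmstat, iostat -x) are all assumptions, so swap in whatever you actually want to watch. If the log sits on a pool that has stalled, the loop may block as well, in which case the last timestamp in the file at least roughly marks when writes stopped.

```shell
#!/bin/sh
# Append a timestamped snapshot of system state to a log file at a fixed
# interval, so the last entries before a freeze can be read after reboot.
# LOGFILE, INTERVAL and the snapshot commands are illustrative assumptions.

LOGFILE="${LOGFILE:-/var/log/freeze-watch.log}"
INTERVAL="${INTERVAL:-10}"

# Print one record: a timestamp line, then each command and its output.
snapshot() {
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    for cmd in "$@"; do
        printf -- '--- %s\n' "$cmd"
        # Run through sh -c so arguments inside the string are honoured.
        sh -c "$cmd" 2>&1 || printf '(command failed)\n'
    done
}

# Loop forever, appending a record and asking the kernel to flush it.
watch_loop() {
    while :; do
        snapshot "uptime" "vmstat" "iostat -x" >> "$LOGFILE"
        sync
        sleep "$INTERVAL"
    done
}

# Only start logging when explicitly asked to, e.g. "sh freeze-watch.sh run".
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

Saved as, say, freeze-watch.sh (a made-up name), it could be started with "sh freeze-watch.sh run &" right after boot; after the next freeze, the tail of the log shows the last state that made it to disk.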
 
Just an update: I haven't done the serial thing yet.

But I got an entirely new server and mirrored everything over to it.

The same thing happens on the new server, which means this is not a hardware issue, as I had kind of hoped it would be.
 