Freeze on HP ProLiant DL320G5p with E200i SAS RAID

Hi all,

I am using 7.1-RELEASE on an HP ProLiant DL320G5p. I couldn't rely on the integrated SATA 'RAID', so I added a SmartArray E200i RAID controller. This server runs only the base system, with no ports installed directly on it. However, I run three jails on it (DNS, squid and postfix), set up according to the instructions in the handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/jails-application.html

The problem is that this server freezes about once a week. It just disappears from the network. It still appears to send output to the monitor, and no errors are displayed on the console. It does not accept input from the PS/2 keyboard, and of course I cannot ping it or ssh to it. After a reboot nothing can be found in the logs (or maybe I don't know the right places to look :))

Any help would be appreciated...
 
This is typically a hardware error, and you won't get any info from the OS unless there's a preceding condition that leads to it (like an interrupt storm).
It is hard to debug and usually involves checking the thermal paste and heatsink if overheating is a possible cause, swapping out things like memory and PCI cards, looking for leaking capacitors on the motherboard, disabling or removing USB if conflicts are a concern, and running a different OS to determine whether FreeBSD might be configuring the hardware incorrectly in a way that leads to this condition.

I don't envy you. I've had to deal with all of the above causes, and it's never easy to locate the specific problem, especially if days go by before it reappears.
 
Are you using a custom kernel?
The GENERIC kernel has support for textdump(4); this feature simplifies debugging crashes.

But an OS freeze is much harder to handle. So if anything beyond the default options is configured (powerd, hald, ...), knowing that may help to debug the problem.

You can find more info in the /usr/share/doc/en/books directory; the most useful is the Developers' Handbook, with its sections about debugging kernel crashes.

Freezes are not always hardware problems.
 
Thank you for your quick reply.

Yes, it is stressful to wait an undetermined period of time until the machine freezes, hoping to get some information, when the only thing you can do once that happens is restart the machine, and then everything is back to normal with nothing in the system log.

However, I found that the system is not completely frozen. When I press the power button, it tries to send the shutdown signal and prints that to the console, but the shutdown does not succeed. This does not mean much to me, but I guess it is proof that the system still responds to something.

I will try a different OS, perhaps Debian, to see if things get any better, but I would really like to have this server on FreeBSD, as 5 of my other servers (squid children) run it. Interestingly, one of those squids runs on an identical machine, bought at the same time from the same company, and has 9 months of uptime without any problem.
 
Hi again, after some time.

Current situation: the server has been happily running the base system and 4 jails for more than a month now.

I thought it was not relevant to mention that the server was housed at our ISP. But when I went to pick the server up from them for further testing, I knew right away that was THE reason for the system hangs.
The server was not even placed in a rack; it was just put on top of a few other desktops. Cooling in their 'server room' was insufficient. The server was not properly grounded. Etc., etc.

So I can conclude that the reason for those hangs was neither hardware nor software, but improper handling.

Thanks for the help anyway, and sorry for bothering...
 
Finally I found what was causing the problem. In order to enable iLO, which shares a physical port with the ethernet port, I had enabled the ASF feature without RTFM (man bge):

Code:
hw.bge.allow_asf
             Allow the ASF feature for cooperating with IPMI.  Can cause sys-
             tem lockup problems on a small number of systems.  Disabled by
             default.

After removing the line from loader.conf and disabling iLO, it never froze again.
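
For anyone who wants to check their own box for the same setting, here is a rough sketch of how one might look for the tunable in a loader.conf-style file. The helper name check_asf and the exact line format it matches (hw.bge.allow_asf="1", with or without quotes) are my own assumptions for illustration, not something taken from the man page.

```shell
#!/bin/sh
# check_asf is a hypothetical helper name, not part of FreeBSD.
# It reports whether a loader.conf-style file sets hw.bge.allow_asf to 1.
# Assumption: the tunable is written as hw.bge.allow_asf="1" (quotes optional).
check_asf() {
    conf="${1:-/boot/loader.conf}"
    # Match only an uncommented line; a trailing comment is tolerated.
    if grep -Eq '^[[:space:]]*hw\.bge\.allow_asf="?1"?[[:space:]]*(#.*)?$' "$conf" 2>/dev/null; then
        echo "enabled"
    else
        echo "not enabled"
    fi
}
```

Calling check_asf with no argument reads /boot/loader.conf. On a running system, kenv hw.bge.allow_asf should also show the value the loader passed to the kernel, if any was set.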
 