I have a 7.1-RELEASE-p3 server running in a remote data center. Starting about 3 days ago, the server has been rebooting randomly (about once per day). Nothing about the problem shows up in /var/log/messages, only the normal boot-up messages. The `last` command does show that the server did in fact crash.
I'm trying to figure out what is causing this. I suspect a hardware problem, and I'd like to install a tool that lets me monitor the temperature and voltage of the processors and the rest of the system. I tried healthd, but after experimenting with it briefly I realized it wasn't detecting anything (it reported 0 temperature, 0 volts, etc.).
Can anyone help me figure out how to monitor this server's hardware? Here's some information about it:
Intel Xeon CPU 2.40GHz
PCI Devices:
- ATI Technologies Inc - Rage XL PCI
- Intel Corporation - 82540EM Gigabit Ethernet Controller
- Intel Corporation - 82801 Family (ICH2/3/4/4/5/5/6/7/8/9,63xxESB) Hub Interface to PCI Bridge
- Intel Corporation - 82801CA (ICH3) UltraATA/100 EIDE Controller
- Intel Corporation - 82801CA/CAM (ICH3-S/ICH3-M) LPC Interface
- Intel Corporation - 82801CA/CAM (ICH3-S/ICH3-M) SMBus Controller
- (2x) Intel Corporation - 82801CA/CAM (ICH3-S/ICH3-M) USB Controller
- Intel Corporation - E7500 System Controller (MCH, Hub Interface A) Error Reporter
- Intel Corporation - E7501 Host Controller
- ad0: WDC WD1600AAJB-00WRA0 58.01H58
I've been reading about various monitoring tools, such as lmmon, mbmon, healthd, and ipmitool. As far as I can tell, I'll need to add a few options to my kernel configuration and recompile the kernel to get /dev/smb support (or something along those lines).
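From what I've read, the kernel additions would look something like the fragment below. This is an untested sketch: I'm assuming the ichsmb(4) driver supports the 82801CA (ICH3) SMBus controller listed above, and MYKERNEL is only a placeholder name for a custom kernel configuration file.

```
# Hypothetical additions to a custom kernel config (placeholder name: MYKERNEL)
device		smbus		# SMBus framework, required by smb and ichsmb
device		ichsmb		# Intel ICH SMBus controller driver (assumed to match the 82801CA)
device		smb		# generic SMBus driver that provides the /dev/smb* device nodes
```

The kernel would then be rebuilt and installed the usual way from /usr/src (`make buildkernel KERNCONF=MYKERNEL`, then `make installkernel KERNCONF=MYKERNEL`). If the controller is detected after a reboot, a /dev/smb0 node should appear for tools like mbmon to use.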
Also, will I need to enable ACPI? It's currently disabled.
Thanks.