Watchdog not working

This is actually not really a FreeBSD question, since the OS is doing everything right.

I have a Jetway NF99FL motherboard, running a home server. And like most computers, it occasionally hangs, or in this particular case regularly gets a kernel panic at 1AM. For some reason, it doesn't reboot (which is good, I can see the stack trace).

So I thought: enable the watchdog timer, then the machine will at least reboot. Should work fine: the motherboard has a watchdog built into the Intel ICH chip, and there is a matching kernel module, ichwd, for it. No problem: enable the watchdog in the BIOS and set it for a 10-minute timeout (I'd rather wait a little longer than have false alarms). Then enable watchdogd in /etc/rc.conf.
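For reference, the setup described above would look roughly like the fragment below. This is a sketch, not the poster's actual files: `ichwd_load` and `watchdogd_enable` are the standard knobs, while the commented-out `watchdogd_flags` line and its 60-second value are only an assumed illustration of passing a timeout, which the hardware may cap.

```shell
# /boot/loader.conf: load the Intel ICH watchdog driver at boot
ichwd_load="YES"

# /etc/rc.conf: start watchdogd, which resets the watchdog timer periodically
watchdogd_enable="YES"

# Optional and an assumption here: ask for a specific timeout in seconds.
# The value is illustrative only; the driver may not support long timeouts.
# watchdogd_flags="-t 60"
```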

Only problem: doesn't work. When the kernel module loads, it complains:
Code:
ichwd0: ICH WDT present but disabled in BIOS or hardware
Which obviously then prevents watchdogd from running. And the error message is actually correct: the watchdog is enabled in the BIOS and no watchdog daemon is running to reset it, so the machine should have rebooted after the 10-minute timeout. Instead it has been staying up for hours, so the watchdog doesn't seem to really work.

Anyone have a sensible idea about what to do? I find it very unlikely that upgrading FreeBSD will make any difference, as the problem seems to be in hardware. The only thing I can think of is to buy a new motherboard, but that is way too much work and $$$ for such a tiny problem. If I seriously wanted to spend a day or two on watchdog support (and getting a new motherboard will be a day or two of work), I would wire something involving a relay, a 555 chip, and the parallel port.
 
No idea about the watchdog part, but do your 1AM kernel panics happen at the same time the periodic daily entry fires for your timezone? Some of the disk-intensive stuff may not play nice with your hardware issues. You may at least be able to narrow down why it panics by running all the daily and security scripts one by one.
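Running the scripts one by one could be sketched as below. The function name `run_one_by_one` is made up for illustration, and it assumes the scripts live in the stock /etc/periodic/daily and /etc/periodic/security directories and can be invoked directly. Each script is announced before it starts, so the last name printed before a panic points at the likely trigger.

```shell
#!/bin/sh
# Hypothetical helper: run every executable script in the given
# directories, one at a time, printing its name first.
run_one_by_one() {
    for dir in "$@"; do
        for script in "$dir"/*; do
            # Skip anything that is not an executable regular file
            # (this also covers an empty or missing directory).
            [ -f "$script" ] && [ -x "$script" ] || continue
            echo "=== running $script ==="
            sync    # flush pending writes so a redirected log survives a panic
            "$script"
            echo "=== $script exited with status $? ==="
        done
    done
}
```

As root, something like `run_one_by_one /etc/periodic/daily /etc/periodic/security 2>&1 | tee /var/tmp/periodic-test.log` would keep a record; the log path is just an example.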
 
Actually, I have a pretty good guess why it panics. The kernel trace says it is a double fault. It only happens within half an hour after starting a full ZFS scrub, so it is probably a ZFS bug. There is no other activity at that time.

In the meantime, I found two interesting watchdog monitors that only require userland programming via USB, and can be hidden inside the case: http://www.j-works.com/wdt205.php and http://www.berkprod.com/Product_Web_Pag ... chdog.aspx
 