dch
Developer
			
		My system started hanging a few weeks ago, and I suspect hardware problems. It's a hard hang: the whole system freezes but never reboots. This post is about finding a way to trigger a reboot if the system hangs, not the underlying problem itself!
My mainboard has both IPMI and a BIOS-enabled hardware watchdog feature, which seems to be set to around the 5-minute mark. I've not yet found how to inform the BIOS watchdog that the system is running, so I turned it off, as 5 minutes of uptime is not my thing.
There's ichwd(4) and watchdogd(8), which in theory should be sufficient:
Code:
# kldload ipmi
# kldload ichwd
# ls /dev/fido
/dev/fido
# watchdogd -d
watchdogd: mlockall failed: Cannot allocate memory
...

That's an ominous error message!
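For reference, the manual steps above could be made persistent with an /etc/rc.conf fragment along these lines. This is only a sketch: the 60-second timeout is an arbitrary value chosen for illustration, the `kld_list` line simply mirrors the two modules loaded by hand above (merge it with any existing `kld_list` rather than overwriting it), and none of this addresses the mlockall warning.

```shell
# /etc/rc.conf fragment (sketch, values are assumptions)

# Load the same watchdog-capable drivers that were loaded manually with kldload.
kld_list="ipmi ichwd"

# Start watchdogd(8) at boot so it keeps patting /dev/fido while the system is alive.
watchdogd_enable="YES"

# -t sets the watchdog timeout in seconds; 60 is an illustrative choice.
watchdogd_flags="-t 60"
```

With something like this in place, a hard hang should stop watchdogd from patting the timer, and the hardware would then be expected to reset the machine once the timeout expires.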
After loading these drivers, dmesg reports:
Code:
[23] ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
[23] ipmi0: KCS mode found at io 0xca2 on acpi
[23] ipmi0: IPMI device rev. 1, firmware rev. 3.45, version 2.0, device support mask 0xbf
[23] ipmi0: Number of channels 2
[23] ipmi0: Attached watchdog
[23] ipmi0: Establishing power cycle handler
[25] ipmi1 failed to probe on isa0
[45] ichwd0: <Intel Wellsburg watchdog timer> on isa0

As an alternative, the sysutils/freeipmi port has bmc-watchdog(8), which reports:
Code:
root@wintermute /u/h/dch# bmc-watchdog --get
Timer Use:                   SMS/OS
Timer:                       Stopped
Logging:                     Enabled
Timeout Action:              None
Pre-Timeout Interrupt:       None
Pre-Timeout Interval:        0 seconds
Timer Use BIOS FRB2 Flag:    Clear
Timer Use BIOS POST Flag:    Clear
Timer Use BIOS OS Load Flag: Clear
Timer Use BIOS SMS/OS Flag:  Clear
Timer Use BIOS OEM Flag:     Clear
Initial Countdown:           0 seconds
Current Countdown:           0 seconds

This appears to be completely unrelated to the drivers above, but might do the job if it communicates with the BMC?
I will try this next and update the thread as I go - the hang can be hours away, so it could take some time.
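While experimenting, it may help to check from a script whether the BMC timer is actually armed. The sketch below parses output in the format shown above; `bmc_field` and `watchdog_armed` are hypothetical helper names, and the only strings it relies on are the `Stopped` and `None` values that appear in the report above.

```shell
#!/bin/sh
# Sketch: helpers for reading `bmc-watchdog --get` style output from stdin.
# Both function names are made up for this example.

# Print the value of a single "Name:   value" field, e.g. bmc_field "Timer".
bmc_field() {
    awk -F': *' -v key="$1" '$1 == key { print $2; exit }'
}

# Succeed only if the report shows a timer that is not stopped
# and a timeout action other than "None".
watchdog_armed() {
    report=$(cat)
    timer=$(printf '%s\n' "$report" | bmc_field "Timer")
    action=$(printf '%s\n' "$report" | bmc_field "Timeout Action")
    [ -n "$timer" ] && [ "$timer" != "Stopped" ] &&
        [ -n "$action" ] && [ "$action" != "None" ]
}
```

It could be used as `bmc-watchdog --get | watchdog_armed || echo "BMC watchdog is not armed"`. For the output shown above (timer stopped, action None), the check fails, which suggests the BMC would not reset the machine in its current state.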
 
			     
 
		 
 
		