I have a support request that I have been updating on the pfSense forums, but I thought the BSD forum might have more expertise in enabling diagnostic tools to help isolate the cause of my crashing issue. I know these problems are difficult to debug, but perhaps someone has dealt with something similar and can offer a quick solution or a suggestion for further differential diagnosis.
I recently migrated my pfSense firewall from an ancient IBM x336 (hundreds of watts) to a Protectli FWB box (single-digit watts) and, aside from crashing, it is performant and cute and quiet, all good things. The crashing, however, not so much. It is running coreboot, which at this point seems like a poorly considered choice, as it adds an unwanted variable.
The symptoms are that the system becomes unresponsive via web, ssh, and console after some number of hours of operation. I believe it may be related to a pfBlockerNG update, as I had a week plus of uptime before an update to that package and haven't had more than 27 hours since, but that could be a chimera. I have run a scrub and checked for disk errors (none reported). Because the system is live and remote, it is difficult to test properly with sysutils/memtest86+, so I built sysutils/memtest and ran it on 4G (half the installed memory) and then 5G, with no errors reported. That is not conclusive, but it is indicative of a good and compatible DIMM.
When the system faults, VPN connections hang, unbound becomes unresponsive, nginx stops responding, and cron jobs don't seem to run. An ssh or console connection gets a "user" prompt, but the console doesn't throw a password prompt, and ssh will prompt but doesn't proceed after password entry. The system responds to ping requests normally, and routing and 1:1 NAT continue. Logging (apparently) stops at the moment of the hang. No crashes or system errors are logged; logs just stop updating until reboot.
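Since the logs simply stop, one thing that might help pin down the exact moment of the hang (and whether disk writes stall before everything else does) is a small heartbeat writer. The following is only a minimal sketch, not something pfSense provides: the file path, the interval, and the function names are all hypothetical, and it assumes a Python 3 interpreter is available on the box.

```python
import os
import time
from datetime import datetime, timezone

HEARTBEAT_PATH = "/var/log/heartbeat.log"  # hypothetical location
INTERVAL_S = 10                            # hypothetical interval


def write_heartbeat(path, now=None):
    """Append one ISO-8601 UTC timestamp and force it to disk with fsync."""
    now = now if now is not None else datetime.now(timezone.utc)
    line = now.isoformat(timespec="seconds") + "\n"
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode("ascii"))
        os.fsync(fd)  # blocks if the storage layer has stalled
    finally:
        os.close(fd)
    return line


def find_gaps(path, max_gap_s):
    """Return (before, after) pairs of consecutive heartbeats more than max_gap_s apart."""
    with open(path, encoding="ascii") as f:
        stamps = [datetime.fromisoformat(s.strip()) for s in f if s.strip()]
    return [
        (a, b)
        for a, b in zip(stamps, stamps[1:])
        if (b - a).total_seconds() > max_gap_s
    ]


def run(path=HEARTBEAT_PATH, interval_s=INTERVAL_S, beats=None):
    """Write heartbeats forever, or only `beats` times if a count is given."""
    written = 0
    while beats is None or written < beats:
        write_heartbeat(path)
        written += 1
        time.sleep(interval_s)
```

Calling `run()` in the background before the next hang and then `find_gaps(HEARTBEAT_PATH, 3 * INTERVAL_S)` after the reboot would show the last timestamp that reached disk. That could be compared against the last entry in the system logs and the time the VPN or DNS stopped responding.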
These little boxes don't come with IPMI-style interfaces, alas, though I do have remote console via an Avocent. The problem is that sending a Ctrl-Alt-Del to the hung device yields only:
Code:
init 1 - - timeout expired for /etc/rc.shutdown: Interrupted system call; going to single user mode
init 1 - - some processes would not die; ps axl advised
I'm not getting any core dumps (I've modified the sysctl options to be a bit more vocal if possible, changes that haven't been tested yet).
I'd be very grateful for any hints (aside from "never run critical infrastructure on consumer hardware") or additional diagnostic advice.
-David