[Solved] FreeBSD 12.1 halts completely and randomly

I'm using FreeBSD 12.1 on an old 2011 i7-E610-based lab PC to run a PBX. It has a Digium TDM400P telephone interface card, with DAHDI and Asterisk both installed from ports. The system is perfectly stable, everything works fine, and the card shows no obvious problems. This lasts anywhere from a few hours to a couple of days. After that, the system halts completely without logging anything anywhere. Video and network go dead, and there is nothing in /var/log/messages. Most recently I hooked up a serial TTY to the machine to capture anything that might have been printed to the console, and all I got was "syslogd: /dev/console: Interrupted system call". I couldn't find any relevant mention of that message anywhere on Google. After it appears, the system accepts no input. The Digium card is also disabled when this happens.

It is as if the CPU has suddenly stopped executing. I have no other details to provide, as I'm not entirely sure what I could even collect to gain better insight into what is going wrong. Any help or advice would be welcome, even if it's just telling me to flip a variable somewhere and wait for something more useful to show up in the logs.
 
Sounds like power supply problems or CPU overheating. Dust, humidity, or a bad power supply can all contribute to CPU overheating.
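One way to test the overheating theory before the next halt is to log temperatures to disk, forcing each line out with fsync so the last reading survives a hard lockup. This is a minimal Python sketch that assumes Python 3 is installed and the coretemp(4) driver is loaded (e.g. `kldload coretemp`), which exposes Intel CPU temperatures through the `dev.cpu.N.temperature` sysctl. The helper names and the log path are hypothetical. Note that it only reads the CPU's own sensor, so heat building up in other components (chipset, add-in cards) may not show up in it.

```python
import os
import re
import subprocess
import time

# Matches sysctl values such as "45.0C" (temperatures are printed in Celsius with a "C" suffix).
TEMP_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)C\s*$")


def parse_temperature(raw: str) -> float:
    """Convert a sysctl temperature string like '45.0C' to a float in Celsius."""
    match = TEMP_RE.match(raw)
    if match is None:
        raise ValueError(f"unexpected temperature format: {raw!r}")
    return float(match.group(1))


def read_cpu_temperature(cpu: int = 0) -> float:
    """Query dev.cpu.N.temperature via sysctl(8); assumes coretemp(4) is loaded."""
    raw = subprocess.run(
        ["sysctl", "-n", f"dev.cpu.{cpu}.temperature"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_temperature(raw)


def log_forever(path: str, interval_s: float = 30.0) -> None:
    """Append a timestamped reading every interval_s seconds, fsyncing each line."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"{stamp} cpu0={read_cpu_temperature():.1f}C\n")
            log.flush()             # push Python's buffer to the kernel
            os.fsync(log.fileno())  # force the kernel to write it to disk
            time.sleep(interval_s)
```

Calling something like `log_forever("/var/log/cputemp.log")` from a long-running session would leave a trail whose last few lines show how hot the CPU was just before the machine stopped. If those readings stay low, that points away from the CPU itself without ruling out other components.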

One thing that can draw a lot of power is a flash drive left constantly connected to the PC. Mine crashed, and then I got a boot error message about a connected flash drive that I had to look up. So I disconnected it, and the machine has run a lot better since.

Switching the hard drives from journaled soft updates to plain soft updates with tunefs(8) also helped. To change these settings for the root file system, you need to boot from the install CD.

Before that, I did a lot of other work on it: cleaning out dust, reseating the heatsink with new thermal gel, lubricating the fans, replacing the fans, carefully cleaning the PSU (its capacitors can deliver a dangerous shock), and finally buying a new PSU.
 
I thought about that, but it can't be dust, since it's a passively cooled machine. And while it could in theory be overheating, the machine can be restarted instantly and then run for hours more without any overheating. The PSU is also an external, self-contained power brick, and the machine runs on DC power with an acceptable input range of 9-36 VDC, so while anything can go bad, that wouldn't be my first guess. There is also no flash drive, and the crash doesn't seem to be anything the kernel handles.
 
It might be a power-saving function in the BIOS. Have you looked at those settings? Make sure they're turned off.
 
Aye, I do need to check that out, particularly since it's mobile hardware.

Actually, maybe I should turn off ACPI altogether. When I was running FreePBX's Linux distro (which broke in just about every other way imaginable), it acted as if the suspend button were always held down.
 

I was having the same issue with 11.3 on my Ryzen 1600 machine. A few weeks ago I changed a BIOS setting based on a few discussions I read online: I switched it from "Low Current Idle" (I think that was the name) to "Typical Current Idle". My understanding is that this turns off C6 states, whatever that means. So far, so good.

Note that there is another thread about ahcich timeouts, and I was having those too. So it turns out I was probably dealing with two separate issues.
 
I've since solved the issue and replaced the original hardware. The problem was that the Digium card produced far more heat than expected. In a machine with no active cooling at all, this caused components other than the CPU to overheat, which led to the system halting.
 