Solved Why did my system stop responding?

I had to power off a headless system yesterday because it wasn't even responding to ping.

Once I switch it back on, hopefully with sshd working, is there any way to tell why it stopped responding? I guess dmesg might tell me something, or even /var/log/messages. Is there anything else I should look at?
 
I always run some form of auditing on headless machines. I know it's after the event, but some time spent reading and understanding openbsm/auditd will help enormously if this sudden loss of connection happens again. (You could use auditd, but I think openbsm is more feature-rich and is being expanded whereas, I think, auditd is not. YMMV.)

To your issue: yes that's about all you've got. Trawl through /var/log/, basically.
 
The dmesg output from before the power-off is lost on reboot; it survives only if it was saved to files under /var/log/, so look there.

If the kernel crashed, you might have a crash dump which you could save and analyze (for example with a debugger). The problem with that: it is difficult, and often you only find another symptom (for example resource starvation) without learning the underlying cause.

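Suggestion: Start a little script that runs something like vmstat, iostat, ps aux or top every 5-10 seconds and appends the output to a log file. After the next hang, the last entries in that file show what the system was doing just before it stopped.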
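A minimal sketch of such a logging script, written in Python purely for illustration. The names snapshot and monitor, the default interval and the log path are hypothetical, and running the commands without flags is an assumption; adjust them to your system.

```python
import os
import subprocess
import time
from datetime import datetime

# Commands from the suggestion; running them without flags is an assumption.
# top is left out because it needs batch-mode flags that differ per system.
DEFAULT_COMMANDS = [["vmstat"], ["iostat"], ["ps", "aux"]]


def snapshot(log_path, commands=DEFAULT_COMMANDS, timeout=5):
    """Run each command once and append its output to log_path."""
    with open(log_path, "a") as log:
        log.write(f"=== {datetime.now().isoformat(timespec='seconds')} ===\n")
        for cmd in commands:
            log.write(f"--- {' '.join(cmd)} ---\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True,
                    errors="replace", timeout=timeout,
                )
                log.write(result.stdout)
                log.write(result.stderr)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or hanging command must not stop the logger.
                log.write(f"failed: {exc}\n")
        # Push the data to disk now: a hung machine never flushes its buffers.
        log.flush()
        os.fsync(log.fileno())


def monitor(log_path, interval=10, count=None, commands=DEFAULT_COMMANDS):
    """Take a snapshot every `interval` seconds; count=None means forever."""
    taken = 0
    while count is None or taken < count:
        snapshot(log_path, commands)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)

# Example (hypothetical path): monitor("/var/tmp/sysmon.log", interval=10)
```

The fsync after every snapshot costs a little I/O, but without it the most interesting entries (the ones just before the hang) may never reach the disk. The log grows without bound, so rotate or truncate it from time to time.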
 
Obviously, one cannot say whether your network is up (you gave us no ping stats or ifconfig output), whether the sshd configuration in /etc/ssh/sshd_config or /usr/local/etc/ssh/sshd_config might have been replaced, whether sshd_enable="YES" is still in your rc.conf, or whether a kernel module you need was not loaded with kldload.

I can tell you the old guru trick from the 1990s Linux days, when we used telnet.

Encapsulate whatever you do in a script which can restore your connection if the connection is shut off by the changes. It's simply what you have to do if you really cannot physically access the remote machine you're dealing with. Hopefully there are "more modern ways", IDK.
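One way to sketch that trick, in Python for illustration only. The function names, the retry counts and the idea of using a TCP connect as the "is my connection still alive" test are all assumptions, not anything prescribed in this thread. The script would have to run on the remote machine, detached from your login session, so that it keeps going when the connection drops.

```python
import socket
import time


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def apply_with_rollback(apply, check, rollback, attempts=3, delay=5.0):
    """Run apply(); call rollback() unless check() passes within `attempts` tries."""
    apply()
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    # The change broke connectivity (as far as check() can tell): undo it.
    rollback()
    return False

# Example (hypothetical): apply and rollback would swap config files and
# restart the service; check could be
#     lambda: can_reach("192.0.2.1", 22)
# where 192.0.2.1 stands in for a host that must stay reachable.
```

The check only proves that the machine can reach out; it does not prove that you can get back in, so pick a test that resembles your real access path as closely as possible.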

sshd is not your "most reliable way to connect", since there are many more things that can change and halt your connection. rshd and telnetd are "more reliable". Even so, you must have networking up, plus network equivalence and routes.

Upgrading your OS is not the "easiest way to retain control over resets" (ie, watchdog'ing)!

Perhaps you need a network card which has a netbios (boot ROM) chip (many older PCs were sold with cheap cards that had the remote-boot BIOS chip removed), or to rely on the "stability" of UEFI boot loaders to activate PXE (which is not quick to set up; last time I tried, I found after hours of work that I needed a proprietary .bin that was not available).

IN SHORT: if you have access to the machine, don't spend a ton of time. Just access the machine and fix it.
 
Basically, if ping doesn't work you are stuffed AFAICS. The system was working fine most of the day but then it just stopped. No idea why. I guess I'll try doing something like ralphbsz suggested...
 
And this is why some old-fashioned data centers had "portable heads": a little lab cart with a VGA monitor and a keyboard, for doing diagnostics and maintenance. Then we went to the era of cards that sat on the ISA bus and pretended to be VGA cards, but in reality connected to Ethernet and were remotely accessible. There was a cheap one called something like the "Network Weasel", which I used. IBM built a super-complex one, which internally had an ASIC and a PowerPC chip (but worked really well). These days, server-grade hardware has network-based access built right in. But most amateur setups don't even use that, since there is a significant barrier to entry.
 
The PC Weasel 2000! I so wanted one of those, but was never able to convince a boss to pay for one.
 
Yesterday that system stopped responding again, so I attached my 'super-complex' cheap 14" TV and discovered that 're0' was down. After a reboot, all was well again... for a few hours... but now the NIC, a Realtek RTL8111, seems to have given up the ghost altogether. This is the only system I have which doesn't have an Intel NIC, which is very odd for a Lenovo.

I guess the only way to rescue it is to insert a USB NIC.
 