Solved Server doesn't answer to anything

The title may be strange but here it is.

I've got a home server on server hardware (Asus KGPE-D16 mobo, 2x Opteron 6262HE CPU, 56GB RAM, RAIDZ with 3 HDDs). A week ago on Friday it stopped responding to anything but pings. When I checked the console, I found a normal login screen, but after typing "root" and pressing Enter, the cursor moved to a new line and just sat there. It didn't do anything else.

Now, a week later, I'm away from my server and will check on it later. Now it doesn't even respond to pings and I'm wondering what it may be.

SMART stats were OK the last time I checked (this week actually). Could it be memory? It's ECC memory but how do I check which module is the bad one?

The server doesn't have iKVM / iDRAC / IPMI / whatever you call it and of course I can't ssh in.

It's running FreeBSD 11.0-RELEASE/amd64 with latest patches from releng/11.0.
 
A week ago on Friday it stopped responding to anything but pings. When I checked the console, I found a normal login screen, but after typing "root" and pressing Enter, the cursor moved to a new line and just sat there. It didn't do anything else.
That sounds like something is stuck in a disk wait, and so anything needing to be read from / written to disk stalls. Try the top(1) utility as mentioned below.
Now, a week later, I'm away from my server and will check on it later. Now it doesn't even respond to pings and I'm wondering what it may be.
On the system console, can you Alt-Fn to switch between virtual consoles? If so, interrupts and at least some drivers are still operating.

You could leave something like # top -S running and see if it keeps going after the system becomes unresponsive. If it continues to work, look for processes in D state. If it has stopped updating, then whatever was on the screen when it died may help determine what was happening at the time of failure. You can use other utilities besides top(1) for this - another good choice is systat(1), probably in the form of # systat -vmstat.
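
If you would rather have a scrolling history than a screen that redraws itself, something along these lines might work as a rough sketch. The function names (filter_disk_wait, snapshot), the "watch" argument and the INTERVAL variable are made up for this example, and it assumes the stock FreeBSD ps(1), where a STAT value starting with D means an uninterruptible (usually disk) wait.

```shell
#!/bin/sh
# Sketch only: print timestamped lists of processes in disk wait (state D).
# All names below are hypothetical; they are not part of any FreeBSD tool.

# Keep the ps header line plus every row whose STAT column starts with "D".
filter_disk_wait() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Take one timestamped sample of the process table.
snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    ps -axo state,pid,wchan,command | filter_disk_wait
}

# Loop only when called as "script.sh watch", so the functions can also
# be sourced and used on their own.
if [ "${1:-}" = "watch" ]; then
    while :; do
        snapshot
        sleep "${INTERVAL:-10}"
    done
fi
```

Run it on a console (or in an ssh session from another machine) rather than redirecting it to a file on the pool, since a write to a stalled disk would hang as well. Be aware that each sample has to start ps and awk, so if the disks are already stuck the loop itself may stop; in that case the last sample on screen is the interesting one.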
 
Thanks for some ideas. I'm currently at work so I need to wait around 10hrs until I get back :(
 
99.99% RAM PROBLEM!! Check your memory with an external tool such as Hiren's BootCD.
Isn't it possible to do it from FreeBSD? I have lots of things running on my server, including my email :/

I see there are some memtest* ports: https://www.freshports.org/search.p...leted&start=1&casesensitivity=caseinsensitive

It would be great if it could run from userland.
Last week I also checked BIOS hardware logs - there were no entries and I've got ECC set to max level (scrubbing memory every 8hrs).
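
Since the BIOS log was empty, another place I could look once the box is back up is the kernel log. As far as I know, FreeBSD reports machine-check events (which is where corrected ECC errors would show up) on lines containing "MCA:". A minimal sketch for counting them, assuming the default /var/log/messages location; count_mca and LOG are names I made up for this:

```shell
#!/bin/sh
# Sketch only: count kernel log lines that mention a machine-check (MCA) report.
# count_mca and LOG are hypothetical names used for this example.

# count_mca FILE: print the number of lines in FILE containing "MCA:".
count_mca() {
    # grep -c prints the count but exits non-zero when it is 0, hence "|| true".
    grep -c 'MCA:' "$1" || true
}

# Assumed default log location on a stock FreeBSD install.
LOG="${LOG:-/var/log/messages}"
if [ -r "$LOG" ]; then
    echo "MCA lines in $LOG: $(count_mca "$LOG")"
fi
```

A count of zero wouldn't prove the RAM is fine, but a non-zero count would at least point at a bank to look into.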
 