
Server Shutting Down With No Logs

Discussion in 'System Hardware' started by user1, Dec 6, 2012.

  1. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    Hello, I was hoping someone could give me fresh ideas for troubleshooting this problem.

    At work, a server has been shutting down overnight consistently for the past couple of days. The error logs show nothing about the shutdown, which leads us to believe the cause is faulty hardware.

    We have tried swapping the power supply, resetting the BIOS, plugging into a new UPS, cleaning the CPU and re-applying fresh thermal grease, and cleaning the RAM and moving it to a new slot. We have also visually inspected all the capacitors etc. for any noticeable damage. None of these have resolved the issue.


    Any fresh ideas would be greatly appreciated.

    Thanks in advance.
     
  2. wblock@

    wblock@ Administrator Staff Member Administrator Moderator Developer

    Messages:
    11,712
    Thanks Received:
    2,271
    A description of the hardware and software would be useful.
     
  3. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    I don't have much information about the specific hardware (manufacturer names, etc.), but the server is running FreeBSD 8.2. It is a mail server, and before these problems started it had been running without issues for 380+ days.
     
  4. SirDice

    SirDice Moderator Staff Member Moderator

    Messages:
    17,628
    Thanks Received:
    2,385
    I'm just shooting in the dark here but 8.2 is End-of-Life and with a 380+ days uptime I'm guessing nobody installed any security patches.
     
  5. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    I do not manage the server, so I wouldn't be able to give you a correct answer. The administrator suspects a hardware issue. Do you think it could be some kind of security issue?

    Are there any other hardware issues than the ones listed that you can think of off the top of your head? We are open to all suggestions.
     
  6. SirDice

    SirDice Moderator Staff Member Moderator

    Messages:
    17,628
    Thanks Received:
    2,385
    The only hardware issue that would cause a sudden shutdown is overheating. Or your power company isn't supplying a 'clean' signal and the power supply simply shuts down.

    Oh, and I've had a case where a server inexplicably went down around the same time every day. This turned out to be the cleaning lady that unplugged the machine so she could use the socket for her vacuum. Seriously, this happened.

    But besides that, yes, an unmaintained and unpatched machine on the internet? That's just asking for it.
     
  7. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    Okay thank you for your input. I'll keep you posted as we work through it.
     
  8. wblock@

    wblock@ Administrator Staff Member Administrator Moderator Developer

    Messages:
    11,712
    Thanks Received:
    2,271
    Does it shut down the same time every day, like when a particular cron(8) job runs?
     
  9. wblock@

    wblock@ Administrator Staff Member Administrator Moderator Developer

    Messages:
    11,712
    Thanks Received:
    2,271
    Or bad memory causes a panic. Or a sudden increase in usage drives marginal components into the failure zone.
     
  10. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    We are unsure of the exact shutdown time. It happens overnight, and the servers are not monitored at night. When we come into the office and check the server, it is shut down. I suggested running a memory test (memtest) and checking the CPU temperatures (sysctl dev.cpu.0.temperature or sysctl -a | grep tempe). I will ask about the cron jobs, thank you for the suggestion.

    The external power was a concern a week or two ago and we mentioned it to the power company. The external power is monitored and doesn't seem abnormal. There are quite a few servers running and none of the others have had issues similar to this one.

    One of the other servers did have a bad HDD close to the time this server started having problems. Seems unrelated but wanted to note it.
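For the temperature check mentioned above, here is a minimal sketch of a logger that could run from cron so there is a record of the readings leading up to a shutdown. It assumes a sensor driver such as coretemp(4) or amdtemp(4) is loaded (otherwise dev.cpu.0.temperature does not exist) and that the sysctl prints a value like "45.0C". The function names and the 70-degree limit are made up for illustration.

```shell
#!/bin/sh
# Sketch only: dev.cpu.0.temperature is present only when a sensor driver
# such as coretemp(4) or amdtemp(4) is loaded (e.g. kldload coretemp).

# is_hot VALUE LIMIT: succeeds if VALUE (e.g. "71.0C") is above LIMIT.
is_hot() {
    awk -v t="${1%C}" -v limit="$2" 'BEGIN { exit !(t + 0 > limit + 0) }'
}

# log_cpu_temp [LIMIT]: print one timestamped reading, flagged when hot.
# LIMIT defaults to 70, an arbitrary placeholder, not a vendor figure.
log_cpu_temp() {
    temp=$(sysctl -n dev.cpu.0.temperature 2>/dev/null) || return 1
    flag=""
    if is_hot "$temp" "${1:-70}"; then flag=" HOT"; fi
    printf '%s cpu0=%s%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$temp" "$flag"
}
```

Calling log_cpu_temp once a minute and appending the output to a file gives a temperature history to read after the next unexplained shutdown.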
     
  11. wblock@

    wblock@ Administrator Staff Member Administrator Moderator Developer

    Messages:
    11,712
    Thanks Received:
    2,271
    Hard drives often fail in clusters.

    To find the time of reset, a cron job could be added that just mails an "I'm alive" message once an hour or more.
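A rough sketch of such an "I'm alive" job, assuming mail(1) can deliver to a mailbox you can still read after the crash (ideally on another machine). The script path, function names and admin@example.com are placeholders, not anything from this thread.

```shell
#!/bin/sh
# heartbeat.sh (hypothetical name). Example entry for root's crontab,
# hourly on the hour:
#   0 * * * * /bin/sh /root/heartbeat.sh send

# heartbeat_subject: build a subject line with host name and local time.
heartbeat_subject() {
    printf "I'm alive: %s at %s\n" "$(uname -n)" "$(date '+%Y-%m-%d %H:%M:%S')"
}

# send_heartbeat [ADDRESS]: mail one heartbeat; the address is a placeholder.
send_heartbeat() {
    echo "heartbeat" | mail -s "$(heartbeat_subject)" "${1:-admin@example.com}"
}

# Only send when explicitly asked, so the file can also be sourced safely.
if [ "${1:-}" = "send" ]; then
    send_heartbeat
fi
```

The timestamp of the last message received before the gap brackets the failure to within an hour; a shorter interval narrows it further.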
     
  12. SirDice

    SirDice Moderator Staff Member Moderator

    Messages:
    17,628
    Thanks Received:
    2,385
    Wouldn't that leave traces in /var/log/messages? At the very least a crash dump in /var/crash/.

    User1, also check the BIOS. There's usually a setting for when the power goes out and back on again. Most servers have the option for "off", "on" or "last state". If it's a power fluctuation and it turns off at least it should turn back on again when the power is good.
     
  13. wblock@

    wblock@ Administrator Staff Member Administrator Moderator Developer

    Messages:
    11,712
    Thanks Received:
    2,271
    Maybe, depends on the failure mode. Seems like I've also heard of CPU cache going bad.
     
  14. gkontos

    gkontos Active Member

    Messages:
    1,395
    Thanks Received:
    246
    The /var/log/messages will give you a lot of information like:
    • If this was a clean shutdown or not.
    • The time that this occurred.

    Also, during the night periodic scripts run which can stress faulty hardware.
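As a sketch of how to pull those two facts out of the log: the function below prints, for every boot recorded in a messages file, the line logged just before it and whether that line looks like a clean shutdown. It assumes the usual FreeBSD syslogd markers ("kernel boot file is" at startup, "exiting on signal" at shutdown); check a few lines of your own log first, since the exact wording is an assumption here. The function name is made up.

```shell
#!/bin/sh
# boot_context [FILE]: for each boot marker, show what was logged just before.
# Assumes syslogd writes "kernel boot file is ..." at startup and
# "exiting on signal N" on an orderly shutdown.
boot_context() {
    awk '
        /syslogd: kernel boot file is/ {
            if (NR == 1)
                kind = "unknown (log starts at boot)"
            else if (prev ~ /syslogd: exiting on signal/)
                kind = "clean"
            else
                kind = "no clean-shutdown record"
            printf "boot at %s %s %s; before it: %s\n", $1, $2, $3, kind
            if (NR > 1) printf "    last line: %s\n", prev
        }
        { prev = $0 }
    ' "${1:-/var/log/messages}"
}
```

When there is no clean-shutdown record, the timestamp on the "last line" is the latest moment the machine is known to have been running.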
     
  15. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    This is still an ongoing issue; the server was off this morning when we came in. The network administrator is going to check through the cron jobs, but he did not seem too concerned about them. I don't think that much is run on that server overnight.

    I'll keep everyone posted, hopefully we can figure out what the problem is soon.
     
  16. gkontos

    gkontos Active Member

    Messages:
    1,395
    Thanks Received:
    246
    Why do you think that a "Network Administrator" will be able to solve this problem for you?

    Do you think that this is related to a network issue?

    If the Network Administrator is not concerned about the periodic scripts then maybe you need to find a System Administrator.

    I am being very honest and blunt because your approach is really a recipe for disaster. Your topic suggests that your server, which is running an unpatched and EOL operating system, is shutting down overnight without any errors in the logs.
    You were asked to provide more information about this system but you can't, because you obviously don't know how to. So how can you be so sure that there is nothing in the logs that may give you a clue about where to start looking for the problem?
     
  17. tingo

    tingo Member

    Messages:
    988
    Thanks Received:
    97
    Don't forget last(1). It can also tell you why the machine was rebooted.
     
  18. user1

    user1 New Member

    Messages:
    17
    Thanks Received:
    1
    Allow me to clarify the situation to avoid any confusion. Where I work there is a network/system admin who is in charge of the entire network and all the servers. He is troubleshooting the server, and I am completely confident he will solve the issue, but I am looking to help him solve it faster.

    Also, in an earlier post I mentioned that I do not know the status of patches and would not be able to provide valid information on whether the server is patched or not.

    As for the logs, I am going off what the administrator told me. I'm sure he is competent enough to search the proper logs for errors.

    I am new to networking and working with servers, and I am hoping someone on here (I know there are very experienced administrators on this site) will be able to give me advice on troubleshooting this problem with the limited information that is available to me.
     
  19. gkontos

    gkontos Active Member

    Messages:
    1,395
    Thanks Received:
    246
    It is very difficult to find people with psychic abilities in a technical forum.
     
  20. Terry_Kennedy

    Terry_Kennedy Member

    Messages:
    574
    Thanks Received:
    94
    I'd suggest setting up a serial console and capturing that output with another PC. If the system prints something to the console and then reboots, you'll know what the problem is. If nothing is printed and the system reboots, you have a hardware problem.

    Neither the built-in VGA console nor a remote viewer for the console (via server hardware management) will help, as these don't record what has scrolled off the screen. You need to capture the console output on another system.

    Some system failures intentionally don't log things to the local disk (for example, if the disk drops offline there's no disk to log to), and crash dumps have been problematic for years (the mechanisms involved are not entirely SMP / thread / interrupt safe, so you often get a double panic and no useful crash data).
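A sketch of what that serial capture could look like, assuming both machines run FreeBSD, are joined by a null-modem cable, and the capturing PC's first serial port is /dev/cuau0 (older releases name it differently). The loader.conf settings are the commonly documented ones but should be checked against the handbook for your release. The function name and log file name are made up.

```shell
#!/bin/sh
# On the server, in /boot/loader.conf (assumed settings, verify for your release):
#   boot_multicons="YES"
#   console="comconsole,vidconsole"
#   comconsole_speed="9600"

# On the capturing PC: record everything the serial console prints.
# Uses FreeBSD script(1) syntax (script [-a] file command ...); util-linux
# script takes different arguments.
capture_console() {
    port="${1:-/dev/cuau0}"      # assumed device name
    log="${2:-console.log}"      # placeholder log file
    script -a "$log" cu -l "$port" -s 9600
}
```

Leave capture_console running overnight; whatever the kernel printed last before the box went down ends up at the tail of the log file.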
     
  21. phoenix

    phoenix Moderator Staff Member Moderator

    Messages:
    3,450
    Thanks Received:
    770
    Then add some monitoring! Seriously. If you don't know when it's shutting down, then you need to add some logging to find out. Even something as simple as the following in root's crontab:
    Code:
    * * * * * /bin/date >> /var/log/time.log


    Then you can open the file after booting and find out when it shut down.
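Because the file keeps growing after the reboot, the outage shows up as a gap somewhere in the middle. One way to find it, sketched below, is to log epoch seconds instead of the default date format (a change to the line above, not what was posted) and let awk report any jump larger than a threshold. The function name and the 120-second default are arbitrary.

```shell
#!/bin/sh
# Assumed crontab variant that logs epoch seconds (note the escaped %,
# which cron would otherwise treat as a newline):
#   * * * * * /bin/date +\%s >> /var/log/time.log

# find_gaps FILE [MAX_SECONDS]: report jumps between consecutive timestamps.
find_gaps() {
    awk -v max="${2:-120}" '
        /^[0-9]+$/ {
            if (prev != "" && $1 - prev > max)
                printf "gap: last seen %d, resumed %d (%d s down)\n", prev, $1, $1 - prev
            prev = $1
        }
    ' "$1"
}
```

On FreeBSD, date -r SECONDS turns a reported value back into a readable time.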
     
  22. Anonymous

    Anonymous Guest

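    I guess you meant date(1), as time(1) may not be as useful here. In addition, it is recommended to use the full path for everything in the crontab. Note that the line below is in the system-wide /etc/crontab format, which has an extra user field; a per-user crontab edited with crontab -e does not take that field: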

    Code:
    *       *       *       *       *       root    /bin/date >> /var/log/time.log
     
  23. gkontos

    gkontos Active Member

    Messages:
    1,395
    Thanks Received:
    246
    Mercy, mercy !!!

    A simple look at /var/log/messages will tell you EXACTLY when a server rebooted!!!
     
  24. Anonymous

    Anonymous Guest

    Hmm...

    That time is more or less known already: it is when the admin presses the power button in the morning, after finding the server off.

     
  25. gkontos

    gkontos Active Member

    Messages:
    1,395
    Thanks Received:
    246
    Even in that case /var/log/cron should give them an estimate.
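To illustrate: cron logs every job it starts, and if I remember the stock /etc/crontab correctly it runs atrun every five minutes, so a long silence in /var/log/cron brackets the outage fairly tightly. The sketch below prints such silences. It assumes syslog-style "Mon DD HH:MM:SS" timestamps, ignores leap years and year rollover, and uses a made-up function name and a 15-minute default threshold.

```shell
#!/bin/sh
# cron_gaps [FILE] [MAX_SECONDS]: print periods with no cron log entries.
# Simplification: seconds are counted within one non-leap year.
cron_gaps() {
    awk -v max="${2:-900}" '
        BEGIN {
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            split("0 31 59 90 120 151 181 212 243 273 304 334", off, " ")
            for (i = 1; i <= 12; i++) doy[m[i]] = off[i]
        }
        ($1 in doy) && split($3, t, ":") == 3 {
            secs = ((doy[$1] + $2 - 1) * 24 + t[1]) * 3600 + t[2] * 60 + t[3]
            stamp = $1 " " $2 " " $3
            if (seen && secs - prev > max)
                printf "quiet from %s until %s\n", prevstamp, stamp
            prev = secs; prevstamp = stamp; seen = 1
        }
    ' "${1:-/var/log/cron}"
}
```

The "quiet from" time is the last moment cron is known to have run, so the shutdown happened within one job interval after it.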