[Solved] Server Shutting Down With No Logs - The FreeBSD Forums
  #1  
December 6th, 2012, 15:38
user1 (Junior Member)
Server Shutting Down With No Logs

Hello, I was hoping someone could offer fresh ideas for troubleshooting this problem.

At work, a server has been shutting down overnight consistently for the past couple of days. The error logs show nothing about the shutdown, which leads us to believe the cause is faulty hardware.

We have tried swapping the power supply, resetting the BIOS, plugging into a new UPS, cleaning the CPU and applying fresh thermal grease, cleaning the RAM and moving it to a different slot, and visually inspecting all the capacitors etc. for any noticeable damage. None of these has resolved the issue.


Any fresh ideas would be greatly appreciated.

Thanks in advance.

  #2  
December 6th, 2012, 16:02
wblock@ (Moderator)

A description of the hardware and software would be useful.

  #3  
December 6th, 2012, 16:10
user1 (Junior Member)

I don't know much about the specific hardware (manufacturers' names etc.), but the server is running FreeBSD 8.2. It is a mail server, and before these problems started it had been running without issues for 380+ days.

  #4  
December 6th, 2012, 16:13
SirDice (Moderator)

I'm just shooting in the dark here but 8.2 is End-of-Life and with a 380+ days uptime I'm guessing nobody installed any security patches.
__________________
Senior UNIX Engineer at Unix Support Nederland
Experience is something you don't get until just after you need it.

  #5  
December 6th, 2012, 16:17
user1 (Junior Member)

Quote:
Originally Posted by SirDice View Post
I'm just shooting in the dark here but 8.2 is End-of-Life and with a 380+ days uptime I'm guessing nobody installed any security patches.
I do not manage the server, so I wouldn't be able to give you a correct answer. The administrator is thinking hardware issues. Do you think it could be some kind of security issue?

Are there any other hardware issues, besides the ones listed, that you can think of off the top of your head? We are open to all suggestions.

  #6  
December 6th, 2012, 16:41
SirDice (Moderator)

Quote:
Originally Posted by user1 View Post
The administrator is thinking hardware issues. Do you think it could be some kind of security issue?
The only hardware issue that would cause a sudden shutdown is overheating. Or your power company isn't supplying a 'clean' signal and the power supply simply shuts down.

Oh, and I've had a case where a server inexplicably went down around the same time every day. This turned out to be the cleaning lady that unplugged the machine so she could use the socket for her vacuum. Seriously, this happened.

But besides that, yes, an unmaintained and unpatched machine on the internet? That's just asking for it.

  #7  
December 6th, 2012, 16:42
user1 (Junior Member)

Okay thank you for your input. I'll keep you posted as we work through it.

  #8  
December 6th, 2012, 18:02
wblock@ (Moderator)

Does it shut down the same time every day, like when a particular cron(8) job runs?

  #9  
December 6th, 2012, 18:04
wblock@ (Moderator)

Quote:
Originally Posted by SirDice View Post
The only hardware issue that would cause a sudden shutdown is overheating. Or your power company isn't supplying a 'clean' signal and the power supply simply shuts down.
Or bad memory causes a panic. Or a sudden increase in usage drives marginal components into the failure zone.

  #10  
December 6th, 2012, 19:00
user1 (Junior Member)

Quote:
Originally Posted by wblock@ View Post
Does it shut down the same time every day, like when a particular cron(8) job runs?
We are unsure of the exact shutdown time. It happens overnight and the servers are not monitored at night. When we come into the office and check the server, it is shut down. I suggested running a memory test (memtest) and checking the CPU temperatures (sysctl dev.cpu.0.temperature or sysctl -a | grep tempe). I will ask about the cron jobs; thank you for the suggestion.
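
One way to act on the temperature idea is to record readings over time, so that the last values before a shutdown are still on disk in the morning. The following is only a minimal sketch: it assumes the coretemp(4) or amdtemp(4) driver is loaded so that dev.cpu.0.temperature exists, and the function name and log path are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print one timestamped CPU temperature reading.
# Assumes dev.cpu.0.temperature is available (coretemp(4)/amdtemp(4) loaded);
# prints "unavailable" instead of failing when the sysctl cannot be read.
log_cpu_temp() {
    temp=$(sysctl -n dev.cpu.0.temperature 2>/dev/null) || temp="unavailable"
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$temp"
}

# Example use (the path is an assumption): append one reading per call.
# log_cpu_temp >> /var/log/cputemp.log
```

Called once a minute from cron, this would leave a trail showing whether the temperature was climbing before the machine went off.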

The external power was a concern a week or two ago and we mentioned it to the power company. The external power is monitored and doesn't seem abnormal. There are quite a few servers running and none of the others have had issues similar to this one.

One of the other servers did have a bad HDD close to the time this server started having problems. Seems unrelated but wanted to note it.

  #11  
December 6th, 2012, 19:04
wblock@ (Moderator)

Hard drives often fail in clusters.

To find the time of reset, a cron job could be added that just mails an "I'm alive" message once an hour or more.
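
A rough sketch of such a heartbeat job follows. The script path, the recipient address, and the function names are placeholders rather than anything from this thread; since the affected machine is itself the mail server, the address should presumably be one whose mailbox lives on another host.

```sh
#!/bin/sh
# heartbeat.sh: hypothetical helper meant to be run hourly from cron, e.g.
#   0 * * * * /usr/local/sbin/heartbeat.sh send
# RECIPIENT is a placeholder address; override it in the environment.
RECIPIENT="${RECIPIENT:-admin@example.com}"

# Build a one-line message with the host name and the current time.
heartbeat_message() {
    printf "I'm alive: %s at %s\n" "$(uname -n)" "$(date)"
}

# Mail the message using mail(1).
send_heartbeat() {
    heartbeat_message | mail -s "heartbeat $(uname -n)" "$RECIPIENT"
}

# Only send when called with the argument "send", so that the file can be
# sourced or tested without mailing anything.
if [ "${1:-}" = "send" ]; then
    send_heartbeat
fi
```

The timestamp on the last message received then brackets the shutdown to within one cron interval.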

  #12  
December 7th, 2012, 09:33
SirDice (Moderator)

Quote:
Originally Posted by wblock@ View Post
Or bad memory causes a panic.
Wouldn't that leave traces in /var/log/messages? At the very least a crash dump in /var/crash/.
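
A quick way to look for those traces is sketched below. The function names are hypothetical and the search patterns are only a guess at typical panic wording. Note also that a crash dump is written only if a dump device is configured, for example with dumpdev="AUTO" in /etc/rc.conf.

```sh
#!/bin/sh
# Hypothetical helpers for checking whether a panic left any evidence.

# Print log lines that look like kernel panic output.
# $1: log file to search (defaults to /var/log/messages)
find_panic_traces() {
    grep -i -E 'panic|fatal trap' "${1:-/var/log/messages}"
}

# List saved crash dumps, if any.
# $1: crash directory (defaults to /var/crash)
list_crash_dumps() {
    ls -l "${1:-/var/crash}"
}
```

Rotated logs (messages.0.bz2 and so on) would need to be decompressed and searched as well if the first shutdown happened several days ago.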

User1, also check the BIOS. There's usually a setting for when the power goes out and back on again. Most servers have the option for "off", "on" or "last state". If it's a power fluctuation and it turns off at least it should turn back on again when the power is good.

  #13  
December 7th, 2012, 16:19
wblock@ (Moderator)

Quote:
Originally Posted by SirDice View Post
Wouldn't that leave traces in /var/log/messages? At the very least a crash dump in /var/crash/.
Maybe, depends on the failure mode. Seems like I've also heard of CPU cache going bad.

  #14  
December 7th, 2012, 16:35
gkontos (Senior Member)

/var/log/messages will give you a lot of information, such as:
  • Whether this was a clean shutdown or not.
  • The time at which it occurred.

Also, during the night periodic scripts run which can stress faulty hardware.
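
To illustrate the point about clean versus unclean shutdowns, here is a small sketch that looks at the entry immediately before each boot in the log. It assumes the markers that FreeBSD's syslogd usually writes ("syslogd: exiting on signal ..." on an orderly stop and "syslogd: kernel boot file is ..." at boot); treat the exact wording as an assumption and adjust the patterns to what the log actually contains. The function name is made up.

```sh
#!/bin/sh
# Hypothetical helper: for every boot found in a syslog file, print the line
# logged just before it and say whether that line was an orderly syslogd exit.
# $1: log file (defaults to /var/log/messages)
classify_boots() {
    awk '
        /syslogd: kernel boot file is/ {
            if (prev == "")
                print "unknown: no entries before first boot in this file"
            else if (prev ~ /syslogd: exiting on signal/)
                print "clean: " prev
            else
                print "UNCLEAN: " prev
        }
        { prev = $0 }
    ' "${1:-/var/log/messages}"
}
```

For an UNCLEAN boot, the timestamp on the printed line is the last sign of life before the machine went down, which also narrows the shutdown time.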
__________________
Powered by BareBSD

  #15  
December 7th, 2012, 19:14
user1 (Junior Member)

This is still an ongoing issue; the server was off this morning when we came in. The network administrator is going to check through the cron jobs, but he did not seem too concerned about them. I don't think that much is run on that server overnight.

I'll keep everyone posted; hopefully we can figure out what the problem is soon.

  #16  
December 8th, 2012, 14:16
gkontos (Senior Member)

Quote:
Originally Posted by user1 View Post
This is still an ongoing issue; the server was off this morning when we came in. The network administrator is going to check through the cron jobs, but he did not seem too concerned about them. I don't think that much is run on that server overnight.
Why do you think that a "Network Administrator" will be able to solve this problem for you?

Do you think that this is related to a network issue?

If the Network Administrator is not concerned about the periodic scripts then maybe you need to find a System Administrator.

I am being very honest and blunt because your approach is really a recipe for disaster. Your topic suggests that your server, which is running an unpatched and EOL operating system, is shutting down overnight without any errors in the logs.
You were asked to provide more information about this system but you can't because you obviously don't know how to. So, how can you be so sure that there is nothing in the logs that may give you a clue on where to start looking for the problem?

  #17  
December 8th, 2012, 18:49
tingo (Member)

Don't forget last(1). It can also tell you why the machine was rebooted.
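
A short sketch of how last(1) might be used here. "reboot" is the pseudo-user that last(1) documents for boot records; querying "shutdown" the same way is an assumption about how orderly shutdowns are recorded on this release. The wrapper name is made up.

```sh
#!/bin/sh
# Hypothetical wrapper: show the most recent boot and shutdown records.
# A boot with no matching shutdown record before it hints at an unclean stop.
# $1: number of records to show per section (default 10)
show_boot_history() {
    echo "== boots =="
    last reboot | head -n "${1:-10}"
    echo "== clean shutdowns =="
    last shutdown | head -n "${1:-10}"
}
```

In plain last output, sessions that were still open when the machine went down are typically marked as ending in "crash" or "shutdown", which is another hint about how it stopped.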
__________________
Torfinn

  #18  
December 8th, 2012, 20:49
user1 (Junior Member)

Quote:
Originally Posted by gkontos View Post
Why do you think that a "Network Administrator" will be able to solve this problem for you?

Do you think that this is related to a network issue?

If the Network Administrator is not concerned about the periodic scripts then maybe you need to find a System Administrator.

I am being very honest and blunt because your approach is really a recipe for disaster. Your topic suggests that your server, which is running an unpatched and EOL operating system, is shutting down overnight without any errors in the logs.
You were asked to provide more information about this system but you can't because you obviously don't know how to. So, how can you be so sure that there is nothing in the logs that may give you a clue on where to start looking for the problem?
Allow me to clarify the situation to avoid any confusion. Where I work there is a network/system admin who is in charge of the entire network and all the servers. He is troubleshooting the server, and I am completely confident he will solve the issue, but I am looking to help him solve it faster.

Also, as I mentioned in an earlier post, I do not know the status of patches and would not be able to provide valid information on whether the server is patched or not.

As for the logs, I am going by what the administrator told me. I'm sure he is competent enough to search the proper logs for errors.

I am new to networking and working with servers, and I am hoping someone on here (I know there are very experienced administrators on this site) will be able to give me advice on troubleshooting this problem with the limited information that is available to me.

  #19  
December 8th, 2012, 21:33
gkontos (Senior Member)

Quote:
Originally Posted by user1 View Post
I am new to networking and working with servers, and I am hoping someone on here (I know there are very experienced administrators on this site) will be able to give me advice on troubleshooting this problem with the limited information that is available to me.
It is very difficult to find people with psychic abilities in a technical forum.

  #20  
December 10th, 2012, 04:12
Terry_Kennedy (Member)

I'd suggest setting up a serial console and capturing that output with another PC. If the system prints something to the console and then reboots, you'll know what the problem is. If nothing is printed and the system reboots, you have a hardware problem.

Neither the built-in VGA console nor a remote viewer for the console (via server hardware management) will help, as these don't record what has scrolled off the screen. You need to capture the console output on another system.

Some system failures intentionally don't log things to the local disk (for example, if the disk drops offline there's no disk to log to), and crash dumps have been problematic for years (the mechanisms involved are not entirely SMP / thread / interrupt safe, so you often get a double panic and no useful crash data).
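
As a sketch of what that setup could look like: the loader.conf variables below are the ones the FreeBSD Handbook lists for a combined serial and video console, while the device name (/dev/cuau0), the 9600 baud speed, the log file name, and the function names are assumptions for illustration. The two machines would be connected with a null-modem cable.

```sh
#!/bin/sh
# Sketch only: device name and speed are assumptions (first serial port, 9600 baud).

# Print loader.conf settings that send console output to both serial and video.
# Review the output before adding it to /boot/loader.conf on the server.
print_serial_console_conf() {
    cat <<'EOF'
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="9600"
console="comconsole,vidconsole"
EOF
}

# On the capturing machine (assumed to run FreeBSD as well): record everything
# received on the serial line into a file using script(1) and cu(1).
# $1: output file, $2: serial device, $3: speed
capture_console() {
    script -a "${1:-console.log}" cu -l "${2:-/dev/cuau0}" -s "${3:-9600}"
}
```

Whatever the server prints in its last moments then ends up in console.log on the second machine, even if the server's own disk never sees it.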

  #21  
December 10th, 2012, 19:19
phoenix (Moderator)

Quote:
Originally Posted by user1 View Post
We are unsure of the exact shutdown time. It happens overnight and the servers are not monitored at night.
Then add some monitoring! Seriously. If you don't know when it's shutting down, then you need to add some logging to find out. Even something as simple as the following in root's crontab:
Code:
* * * * * /bin/date >> /var/log/time.log
Then you can open the file after booting, and find out when it shut down.
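
Because cron keeps appending once the machine is back up, the interesting line is the last one before a gap, not the last line of the file. A possible way to find it is sketched below, assuming the job is changed to log epoch seconds instead of the default date format. The function name and the 120-second threshold are illustrative choices.

```sh
#!/bin/sh
# Assumed crontab entry (a variant of the one above that logs epoch seconds;
# "%" has to be escaped as "\%" inside a crontab):
#   * * * * * /bin/date +\%s >> /var/log/time.log

# Hypothetical helper: report gaps between consecutive timestamps.
# $1: file with one epoch timestamp per line
# $2: gap threshold in seconds (default 120, i.e. at least one missed run)
find_gaps() {
    awk -v max="${2:-120}" '
        NR > 1 && $1 - prev > max {
            print "gap: last alive " prev ", back at " $1
        }
        { prev = $1 }
    ' "$1"
}
```

On FreeBSD, date -r followed by the number of seconds turns a reported timestamp back into a readable date.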
__________________
Freddie

Help for FreeBSD: Handbook, FAQ, man pages, mailing lists.

Last edited by phoenix; December 10th, 2012 at 20:33. Reason: D'oh! Swap time for date.

  #22  
December 10th, 2012, 20:00
rolfheinrich (Member)

Quote:
Originally Posted by phoenix View Post
...
Code:
* * * * * time >> /var/log/time.log
Then you can open the file after booting, and find out when it shut down.
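I guess you meant date(1), as time(1) may not be exactly as useful in this respect. In addition, it is recommended to use the full path for everything in the crontab. For example, in the system-wide /etc/crontab, which takes an extra user field (omit that field in root's own crontab):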

Code:
*       *       *       *       *       root    /bin/date >> /var/log/time.log

  #23  
December 10th, 2012, 20:26
gkontos (Senior Member)

Mercy, mercy !!!

A simple look at /var/log/messages will tell you EXACTLY when a server rebooted!!!

  #24  
December 10th, 2012, 20:29
rolfheinrich (Member)

Quote:
Originally Posted by gkontos View Post
Mercy, mercy !!!

A simple look at /var/log/messages will tell you EXACTLY when a server rebooted!!!
Hmm...

This time is more or less known already, i.e. it is when the admin presses the power button in the morning, after finding the server off.

Quote:
Originally Posted by user1 View Post
This is still an ongoing issue, the server was off this morning when we came in...

  #25  
December 10th, 2012, 21:30
gkontos (Senior Member)

Even in that case /var/log/cron should give them an estimate.
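
A sketch of how that estimate could be pulled out of the log. It relies on the stock FreeBSD /etc/crontab running atrun every five minutes, so that /var/log/cron normally gains an entry at least that often; if that job has been removed, the estimate gets coarser. The function name and the example timestamp prefix are made up.

```sh
#!/bin/sh
# Hypothetical helper: print the last log line written before the first line
# that starts with a given timestamp prefix (the morning power-on).
# $1: timestamp prefix as it appears in the log, e.g. "Dec  7 08"
# $2: log file (defaults to /var/log/cron)
last_entry_before() {
    awk -v p="$1" '
        index($0, p) == 1 { if (prev != "") print prev; exit }
        { prev = $0 }
    ' "${2:-/var/log/cron}"
}

# Example (prefix is an assumption): last_entry_before "Dec  7 08"
```

The timestamp of the printed line is then the latest time at which cron was known to be running before the shutdown.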