Solved Sporadic automatic reboots - "rebooted by root"

On our router, I see sporadic automatic reboots for no apparent reason.

Code:
root@xxxx:~ # grep rebooted /var/log/messages
Feb 17 16:28:48 is129 reboot[21416]: rebooted by root
Feb 17 17:10:45 is129 reboot[2192]: rebooted by root
Feb 17 17:16:45 is129 reboot[1974]: rebooted by root
Feb 17 17:22:44 is129 reboot[1973]: rebooted by root
Feb 17 17:32:44 is129 reboot[1988]: rebooted by root
Feb 17 17:38:45 is129 reboot[1966]: rebooted by root
Feb 22 12:29:36 is129 reboot[43433]: rebooted by root
Mar 18 11:55:39 is129 reboot[86980]: rebooted by root
Mar 18 12:12:17 is129 reboot[2038]: rebooted by root
Mar 18 12:18:18 is129 reboot[1961]: rebooted by root
Mar 18 12:46:18 is129 reboot[2067]: rebooted by root
Mar 18 12:54:18 is129 reboot[1963]: rebooted by root

I never noticed it before; I only noticed now because it happened during office hours.

I cannot see any reason for this behaviour. No cron job, no problems in the messages file, no notification by mail, nothing.

What might be the reason for this?
 
What are the messages leading up to the reboot message? Who's logged in at that time? Check last(1).
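A minimal sketch of how to pull that context out of the log, assuming the reboot entries always contain the string "rebooted by" as in the grep output above. The function name reboot_context and the default of 10 lines are made up for illustration.

```shell
#!/bin/sh
# reboot_context FILE [N]: print the N lines (default 10) that precede
# each "rebooted by" entry in FILE, followed by the entry itself.
# The function name and the default are illustrative, not from the thread.
reboot_context() {
    grep -B "${2:-10}" 'rebooted by' "$1"
}

# Example calls on the router (commented out so the file can be sourced safely):
# reboot_context /var/log/messages 20
# last -n 30        # recent logins and boot/shutdown records
```

When several reboots are close together, grep separates the groups of context lines with a "--" line, so each reboot can be inspected on its own.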
 
It seems to be a good idea to replace reboot with a script. But maybe someone has an idea why it happens in the first place.

This is a router and there are no users beyond root.

Normally, just a login directly on the console will be active.
Code:
root       pts/0    aa.bb.cc.dd            Fri Mar 18 14:02   still logged in
root       ttyv0                           Fri Mar 18 13:38   still logged in
boot time                                  Fri Mar 18 12:55
shutdown time                              Fri Mar 18 12:54
boot time                                  Fri Mar 18 12:48
shutdown time                              Fri Mar 18 12:46
boot time                                  Fri Mar 18 12:20
shutdown time                              Fri Mar 18 12:18
boot time                                  Fri Mar 18 12:13
shutdown time                              Fri Mar 18 12:12
boot time                                  Fri Mar 18 11:55
shutdown time                              Fri Mar 18 11:55
root       pts/0    aa.bb.cc.dd            Thu Mar 17 10:24 - 11:29  (01:04)
root       pts/0    aa.bb.cc.dd            Tue Mar 15 04:29 - 06:51  (02:21)
 
I think based on the information provided, it's very hard to make an educated guess as to why.
What is in /var/log/messages right before one of the rebooted messages? There may be a clue there.
Is there any other software running like IDS, filtering that is looking at packets and maybe causing the reboot?
Is it connected to a UPS that may be causing the reboot?

Have you tried changing the root password to see if the problem goes away? If it does, that would imply a human being doing it.
 
Code:
Mar 11 13:39:33 is129 ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 74 days ago
Mar 12 13:39:33 is129 ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 75 days ago
Mar 13 13:39:33 isxxx ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 76 days ago
Mar 14 13:39:34 isxxx ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 77 days ago
Mar 15 03:39:45 isxxx kernel: arp: aa.bb.cc.dd moved from ... to .... on igb0
Mar 15 13:39:34 isxxx ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 78 days ago
Mar 16 13:39:34 isxxx ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 79 days ago
Mar 17 10:29:06 isxxx root[78832]: /etc/rc.d/ipfilter: ERROR: Load of rules into alternate set failed; aborting reload
Mar 17 13:39:35 isxxx ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 80 days ago
Mar 17 19:04:35 isxxx ipmon[1693]: 19:04:35.715498 igb1 @0:15 b 45.95.55.55,53 -> aa.bb.cc.dd,40330 PR tcp len 20 40 -FUPE IN bad
Mar 18 11:55:39 isxxx reboot[86980]: rebooted by root
Mar 18 11:55:39 isxxx syslogd: exiting on signal 15
Mar 18 11:55:02 isxxx syslogd: kernel boot file is /boot/kernel/kernel
...

So, not a single reason visible. I use ipfilter for managing the access to our net. Not connected to a UPS currently.

The access to the host is by ssh keys only. And no, there are no "disgruntled employees" near or far.

Ok, I think I will try logging some context information.
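A rough sketch of what such context logging could look like, assuming reboot is temporarily replaced by a shell script that calls a function like the one below. The function name log_caller and the log path in the example call are assumptions, not something already on the system.

```shell
#!/bin/sh
# Sketch of a stand-in for reboot that records who called it.
# All names and the log path below are assumptions for illustration.

# log_caller FILE: append a timestamp, the ssh connection info (if any)
# and the chain of parent processes to FILE.
log_caller() {
    {
        echo "=== reboot requested: $(date)"
        # sshd sets SSH_CONNECTION (client ip/port, server ip/port) for its sessions
        echo "SSH_CONNECTION=${SSH_CONNECTION:-unset}"
        pid=$$
        # Walk up the process tree until init (pid 1) or until ps fails
        while [ "${pid:-0}" -gt 1 ] 2>/dev/null; do
            ps -o pid= -o ppid= -o user= -o command= -p "$pid" 2>/dev/null || break
            pid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
        done
    } >> "$1"
}

# In the wrapper itself one would call, for example:
# log_caller /var/log/reboot-trace.log
```

Because the wrapper only logs and does not reboot, the calling process stays visible long enough to be inspected by hand as well.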

BTW, the Feb 15 entries were probably me testing something. I deleted them from the initial post.

Maybe it's the hardware that is shutting the computer down? Some temperature problem? I don't think so, but it just occurred to me that it doesn't have to be the software.
 
SSH key access is good; if multiple people are allowed in, double-check that none of the keys has been compromised.
Temperature? Perhaps, but I think it would shut down, not reboot. If you look at the deltas on 18 March, they look kind of random.
But here is a quick temperature check, assuming this router is actually running FreeBSD:
sysctl -a | grep temperature

You may need to load coretemp.ko with kldload as root.
A quick little script I've had around for years that dumps the values of the sysctls:
Code:
#!/bin/sh
# Print the temperature sysctl of every CPU core.

numcores=$(sysctl -n hw.ncpu)
core=0

# Cores are numbered 0 .. numcores-1
while [ "$core" -lt "$numcores" ]
do
    echo "Core #$core: $(sysctl "dev.cpu.$core.temperature")"
    core=$((core + 1))
done
 
Temperature? Perhaps, but I think it would shut down, not reboot.
It would do a very hard power down if that was the case. As in, immediate power off, not even logging anything. High temperature cut-off is done by the hardware, not the OS.

Code:
Feb 17 16:28:48 is129 reboot[21416]: rebooted by root
Feb 17 17:10:45 is129 reboot[2192]: rebooted by root
Feb 17 17:16:45 is129 reboot[1974]: rebooted by root
Feb 17 17:22:44 is129 reboot[1973]: rebooted by root
Feb 17 17:32:44 is129 reboot[1988]: rebooted by root
Feb 17 17:38:45 is129 reboot[1966]: rebooted by root
Feb 22 12:29:36 is129 reboot[43433]: rebooted by root
Mar 18 11:55:39 is129 reboot[86980]: rebooted by root
Mar 18 12:12:17 is129 reboot[2038]: rebooted by root
Mar 18 12:18:18 is129 reboot[1961]: rebooted by root
Mar 18 12:46:18 is129 reboot[2067]: rebooted by root
Mar 18 12:54:18 is129 reboot[1963]: rebooted by root
Most of them seem to happen 6 minutes apart, which suggests a cron job or something similar. Have you checked /var/log/cron?
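To check the spacing without doing the arithmetic by hand, here is a small sketch that prints the gap in seconds between consecutive reboot entries on the same day. It assumes the standard syslog timestamp layout shown above (month, day, HH:MM:SS in the first three fields); the function name reboot_deltas is made up.

```shell
#!/bin/sh
# reboot_deltas: read syslog lines on stdin and, for every "rebooted by"
# entry that follows another one on the same day, print
#   <month> <day> <time> <seconds since previous reboot entry>
# Gaps across midnight are skipped, since syslog timestamps carry no year
# and plain awk has no date arithmetic.
reboot_deltas() {
    awk '/rebooted by/ {
        split($3, t, ":")
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        day = $1 " " $2
        if (day == prev_day) print day, $3, secs - prev_secs
        prev_day = day
        prev_secs = secs
    }'
}

# Example call:
# reboot_deltas < /var/log/messages
```

For the Feb 17 entries quoted above this gives gaps of 2517, 360, 359, 600 and 361 seconds.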
 
is129 ntpd[2677]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): expired 74 days ago
You should fix this: either disable leap seconds or fix the error.
Noisy logs make critical single lines harder to spot.

kernel: arp: aa.bb.cc.dd moved from ... to .... on igb0
Like this one. Why was it moved? ARP is at play.

Have you studied the output of service -e? Strip the machine down to the bare essentials.
Code:
### Un-needed firewall services ###
cron_enable="NO"
virecover_enable="NO"
mixer_enable="NO"
moused_ums0_enable="NO"
moused_ums1_enable="NO"
ip6addrctl_enable="NO"
ipv6_activate_all_interfaces="NO"
update_motd="NO"
savecore_enable="NO"
sendmail_enable="NO"
sendmail_submit_enable="NO"
sendmail_msp_queue_enable="NO"
sendmail_outbound_enable="NO"
resolv_enable="NO"


What about syslog and rotation?
 
Mar 17 10:29:06 isxxx root[78832]: /etc/rc.d/ipfilter: ERROR: Load of rules into alternate set failed; aborting reload
Firewall issues too. What's up with that?

This coincides with ipmon. Maybe you should slim things down a bit to troubleshoot this.
What does ipfilter.log show? Everything OK there?
ipmon can fill up logs fast, which is why I asked whether you had checked disk space.
 
Plenty of space available, 6 GB used out of 200 GB.

kernel: arp ...: I had to move the mail server to a different machine.

Firewall issue: I'm running ipfilter and there was a typo when changing the config file. So it couldn't load successfully.

ipmon / ipfilter: well, ipmon -Ds is running, but nothing is logged to syslog. Don't quite remember, what I used it for, probably wanted to log some ipfilter packets, but they appear in /var/log/messages and not in syslog. So I probably don't need it anyway.

ipfilter itself works fine without problems.

leapseconds: yes, have to have a look at it.

service -e looks fine to me, but when I see your resolv_enable="NO" and others from you, there seems to be some potential for cleanup.

Code:
Mar 18 16:58:24 is129 root[3352]: /usr/sbin/service: WARNING: $ is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3493]: /usr/sbin/service: WARNING: $tpmd_enable is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3498]: /usr/sbin/service: WARNING: $tcsd_enable is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3631]: /usr/sbin/service: WARNING: $dbus_enable is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3636]: /usr/sbin/service: WARNING: $avahi_daemon_enable is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3641]: /usr/sbin/service: WARNING: $cupsd_enable is not set properly - see rc.conf(5).
Mar 18 16:58:24 is129 root[3646]: /usr/sbin/service: WARNING: $avahi_dnsconfd_enable is not set properly - see rc.conf(5).

$ is not set properly - how to set THAT ONE properly? A quick look into the man page didn't reveal anything. $="NO"?

syslog / rotation: yes, works fine. I rotate /var/log/messages many more times, as there was some other problem earlier, maybe the problem with the realtek network cards.

And no: nothing wireless and nothing in /var/log/cron.
 
$ is not set properly - how to set THAT ONE properly? A quick look into the man page didn't reveal anything. $="NO"?
Check /etc/rc.conf, you may have a missing quote somewhere or some other syntax issue. Also check the files in /etc/rc.conf.d/ as there could be configuration files kept in there nowadays.
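Since rc.conf and the files under /etc/rc.conf.d/ are read as sh code, one quick way to rule out a missing quote is to let sh parse them without executing anything. A minimal sketch; the function name check_rc_syntax is made up, and a clean result only means the syntax is fine, not that every variable has a sensible value.

```shell
#!/bin/sh
# check_rc_syntax FILE...: parse each file with "sh -n" (syntax check only,
# nothing is executed) and report the ones that fail to parse.
check_rc_syntax() {
    rc=0
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip missing files and unmatched globs
        if ! sh -n "$f" 2>/dev/null; then
            echo "syntax error in $f"
            rc=1
        fi
    done
    return "$rc"
}

# Example call:
# check_rc_syntax /etc/rc.conf /etc/rc.conf.local /etc/rc.conf.d/*
```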
 
Check /etc/rc.conf, you may have a missing quote somewhere or some other syntax issue. Also check the files in /etc/rc.conf.d/ as there could be configuration files kept in there nowadays.
/etc/rc.conf is fine.

When emptying /etc/rc.conf completely (0 bytes), the error message remains. But when deleting /etc/rc.conf, the error message disappears.

Moving all the other files that service -e opens out of the way does not remove the error either.

So others "should" have this error message as well.

Code:
root@isxxx:/etc # freebsd-version -ku
12.2-RELEASE-p6
12.2-RELEASE-p6
root@isxxx:/etc #
(Time for an update, I see...)
 
Maybe I'm missing something, but have you checked /var/log, particularly debug or messages?
If there's nothing there, then run sysrc dumpdev=auto but ensure there's enough swap. You could also run dumpon of course.
If there is still no information from the kernel then this is hardware, such as RAM, GPU or CPU.
 
This is really weird. Try unplugging the keyboard from the box to rule out a malfunction sending Ctrl+Alt+Del.

I know how it sounds
 
Ah, interesting idea. Keyboard is connected via KVM, so maybe the KVM did something unusual.
I changed the sysctl variables.

dumpdev was on auto already. And yes, I checked "everything" and didn't see anything. So yes, probably hardware.
 
Is it an option to change
Code:
*.err;kern.warning;auth.notice;mail.crit                /dev/console
in the file /etc/syslog.conf to point to a log file, instead of the console? I guess you are not reading the console on this router.
 
So, the problem recurred after about five weeks of problem-free operation.

Again, a reboot occurred about every 6 minutes, this time for almost two days (holidays).

So I checked what's going on (replaced reboot), and I saw the following process tree:

Code:
user   pid  ppid     time command
root  2394    1  0:00.00 /usr/sbin/sshd
root  2399 2394  0:00.01 sshd: root@notty (sshd)
root  2401 2399  0:00.01 csh -c /sbin/reboot
root  2403 2401  0:00.00 /bin/sh /sbin/reboot
root  2404 2403  0:00.00 /bin/sh /sbin/reboot

Reading from bottom to top, reboot runs as pid 2404 with parent pid 2403, and the chain goes up to pid 2394, which is sshd?

As I'm not on the console, I'm active via sshd, so sshd is necessary, but why should sshd start a reboot regularly? Looking at last, no one is logged into the router beyond myself.

Is there some ssh configuration that could cause the machine to reboot?
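For what it is worth, "sshd: root@notty" followed by "csh -c /sbin/reboot" is the shape one would expect when a remote host runs a single command over ssh without a terminal (something like ssh root@router /sbin/reboot), and such sessions normally do not show up in last. Assuming sshd logs through syslog to /var/log/auth.log (the FreeBSD default), a sketch like the following lists the accepted logins so that their timestamps and source addresses can be matched against the reboot times. The function name accepted_logins is made up.

```shell
#!/bin/sh
# accepted_logins FILE: print the sshd "Accepted ..." lines from an auth log.
# Each such line names the authentication method, the user and the
# source address of a successful login.
accepted_logins() {
    grep 'sshd.*: Accepted ' "$1"
}

# Example call:
# accepted_logins /var/log/auth.log
```

An entry that appears a few seconds before each "rebooted by root" line would point to the host (and key) that issues the reboot.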
 