Solved Clock rapidly getting out of sync

NewGuy · Nov 2, 2015

This past week I have noticed the clocks on my FreeBSD servers rapidly getting out of sync, by up to about 24 hours over the course of a few days. It's not happening on all of my FreeBSD servers, only about half of them. The rest are sticking pretty close to the correct time, which is weird because they are all running identical configurations.

I found today that if I restart the openntpd daemon, the misaligned clocks temporarily sync up, but then quickly fall out of line again within half an hour. (The misbehaving clocks fell behind by about 3 minutes within half an hour.)
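
For a sense of scale, the figures above can be turned into a drift rate. This is a minimal sketch in Python that uses only the numbers quoted in the post (about 3 minutes lost in about half an hour); the function name drift_ppm is made up for illustration.

```python
# Drift estimate from the figures in the post: the clock fell behind by
# about 3 minutes over roughly 30 minutes of real time.

def drift_ppm(seconds_lost: float, interval_seconds: float) -> float:
    """Return clock drift in parts per million over the given interval."""
    return seconds_lost / interval_seconds * 1_000_000


lost = 3 * 60        # seconds the clock fell behind
interval = 30 * 60   # seconds of real time that elapsed
ppm = drift_ppm(lost, interval)
print(f"{ppm:.0f} ppm, about {ppm / 1_000_000:.0%} slow")
```

That is a drift of roughly 100,000 ppm, meaning the clock runs about 10% slow. Ordinary hardware clock drift is usually measured in tens of ppm, so a figure this large suggests the clock itself is being starved or stalled rather than merely mis-tuned.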

Any thoughts on how I can address this issue, apart from restarting the openntpd daemon in a cron job?

rotor · Nov 3, 2015

When you restart openntpd, does ntpctl -s all show it syncing after a few minutes? Or does the daemon never sync?

UnixRocks · Nov 3, 2015

Are these virtual machines or physical hardware?

NewGuy · Nov 3, 2015

The output I get from ntpctl -s all is

Code:

4/4 peers valid, clock unsynced, clock offset is 6174666.987ms

peer
   wt tl st  next  poll          offset       delay      jitter
37.59.60.67 from pool pool.ntp.org
    1 10  2   13s   34s     25251.397ms     8.308ms     0.071ms
109.69.184.210 from pool pool.ntp.org
    1 10  2    8s   33s     25430.392ms    15.021ms     0.215ms
5.196.160.139 from pool pool.ntp.org
    1 10  2   24s   33s     32980.172ms     8.432ms     0.288ms
178.32.130.232 from pool pool.ntp.org
    1 10  2   31s   32s     34014.381ms    11.447ms     0.479ms

I tried running the command a few times over a couple of minutes. The clock offset value was getting slightly smaller over time on one server, while the offset was growing on another. So it looks like the clocks are trying to sync, but one is succeeding while the other is failing and getting increasingly out of sync.
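
To compare offsets between runs without eyeballing them, the ntpctl -s all text can be parsed. The following is a rough sketch, assuming the output layout shown in this thread (a summary line, then an unindented peer line followed by an indented statistics row whose last three fields are offset, delay and jitter). The function name parse_ntpctl is hypothetical, and the sample is copied from the first two peers above.

```python
import re

# Sample copied from the output posted in this thread (first two peers only).
SAMPLE = """\
4/4 peers valid, clock unsynced, clock offset is 6174666.987ms

peer
  wt tl st  next  poll  offset  delay  jitter
37.59.60.67 from pool pool.ntp.org
  1 10  2  13s  34s  25251.397ms  8.308ms  0.071ms
109.69.184.210 from pool pool.ntp.org
  1 10  2  8s  33s  25430.392ms  15.021ms  0.215ms
"""


def parse_ntpctl(text):
    """Return (summary_offset_ms, [(peer, offset_ms), ...]).

    Assumes the layout seen in this thread; other ntpctl versions may differ.
    """
    m = re.search(r"clock offset is (-?\d+(?:\.\d+)?)ms", text)
    summary_ms = float(m.group(1)) if m else None

    peers = []
    name = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        indented = line[0].isspace()
        # Statistics row: indented, and the last three fields end in "ms".
        if indented and len(fields) >= 3 and all(f.endswith("ms") for f in fields[-3:]):
            if name is not None:
                peers.append((name, float(fields[-3][:-2])))
            name = None
        # Peer line: unindented, and neither the summary nor the "peer" heading.
        elif not indented and "clock offset" not in line and fields[0] != "peer":
            name = fields[0]
    return summary_ms, peers


summary_ms, peers = parse_ntpctl(SAMPLE)
print(f"summary offset: {summary_ms / 60000:.1f} minutes")
for name, offset_ms in peers:
    print(f"{name}: {offset_ms / 1000:.1f} s")
```

Run against the posted output, this puts the summary offset at roughly 103 minutes while the individual peers report offsets of about 25 seconds. Logging both values every minute or so makes it easier to see which machines are converging and which are drifting away.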

Edit: to answer UnixRocks, these are all virtual machines. What I find odd is they've all been on time for months now and it's only been this past week some of them have fallen out of sync. Others are still on the right time and have the same configuration.

rotor · Nov 3, 2015

Being virtual machines, you may be affected by forces beyond your control (good catch, UnixRocks). Having said that, give this a try.

1. Stop the openntpd daemon.
2. Run ntpdate -sv pool.ntp.org to get the system clock close to the correct time.
3. Start the openntpd daemon.

Then watch ntpctl -sa again to see if the clock wanders off again. You may also want to take a peek at /var/log/messages to see the correction ntpdate applied.

Also, make sure you don't have another ntp daemon running.
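
The stop, step, start sequence above could be wrapped in a small script so it runs the same way on every machine. This is only a sketch: it assumes the daemon is controlled through service(8) under the rc name "openntpd", which may not match every install, and the names RC_NAME, resync_commands and resync are made up here. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: the daemon is managed via service(8) under this rc name.
# Adjust RC_NAME if your install uses a different one.
RC_NAME = "openntpd"


def resync_commands(server="pool.ntp.org"):
    """Return the three commands from the steps above, in order."""
    return [
        ["service", RC_NAME, "stop"],
        ["ntpdate", "-sv", server],
        ["service", RC_NAME, "start"],
    ]


def run(cmd, dry_run):
    """Print the command, and execute it unless this is a dry run."""
    print("+", " ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def resync(dry_run=True):
    stop, step, start = resync_commands()
    run(stop, dry_run)
    try:
        run(step, dry_run)
    finally:
        # Restart the daemon even if ntpdate fails, so the machine is not
        # left without any time synchronisation.
        run(start, dry_run)


if __name__ == "__main__":
    resync(dry_run=True)
```

Call resync(dry_run=False) as root to actually execute the commands. The try/finally is a design choice, not something from the original advice: it keeps a failed ntpdate from leaving the daemon stopped.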

NewGuy · Nov 3, 2015

Thanks, rotor. I did confirm no other sync daemons are running and I will try your suggestions and see what happens.

kpa · Nov 3, 2015

Take a look at the sysctl kern.eventtimer output on each system and see if there are differences in what is selected as kern.eventtimer.timer.
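
With more than a handful of machines, the comparison can be scripted once the sysctl kern.eventtimer output has been collected from each host. This is a minimal sketch; the host names and the sample values below are invented for illustration and are not taken from the servers in this thread.

```python
def selected_timer(sysctl_text):
    """Return the value of kern.eventtimer.timer from sysctl output, or None."""
    for line in sysctl_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "kern.eventtimer.timer":
            return value.strip()
    return None


def group_by_timer(outputs):
    """Map each selected timer to the list of hosts using it."""
    groups = {}
    for host, text in outputs.items():
        groups.setdefault(selected_timer(text), []).append(host)
    return groups


# Hypothetical captured output; host names and values are made up.
OUTPUTS = {
    "vm-a": (
        "kern.eventtimer.choice: LAPIC(600) HPET(550) i8254(100) RTC(0)\n"
        "kern.eventtimer.timer: LAPIC\n"
        "kern.eventtimer.periodic: 0\n"
    ),
    "vm-b": (
        "kern.eventtimer.choice: HPET(550) LAPIC(100) i8254(100) RTC(0)\n"
        "kern.eventtimer.timer: HPET\n"
        "kern.eventtimer.periodic: 0\n"
    ),
}

for timer, hosts in group_by_timer(OUTPUTS).items():
    print(f"{timer}: {', '.join(hosts)}")
```

If every host ends up in the same group, the event timer selection is not what distinguishes the drifting machines from the healthy ones.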

UnixRocks · Nov 3, 2015

As rotor said, this may be beyond your control, as these are virtual machines. We have had problems with clocks losing sync in our VMware environment when the host is heavily loaded. If you are running VMware, see if you can get VMware Tools installed and set up time synchronization via VMware. If you are in some other virtualized environment, you will need to see what that environment provides.

NewGuy · Nov 3, 2015

kpa, I checked and the timer is the same across all the machines.

Following rotor's advice, I stopped the daemon and did a sync using ntpdate -sv pool.ntp.org. Then restarted the daemon. What happened afterward was interesting. ntpctl(8) showed the clock started off synced, but after about a minute there was a 12 second gap. For a few seconds the gap got slowly smaller, as though the daemon was trying to sync, but then the gap jumped to 18 seconds. Once again, the gap got smaller for a while, then jumped up again and so on.

So it looks like the daemon is working properly, but something is causing the clock to fall out of sync faster than the daemon can keep up, and only on some of the machines.
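
The pattern described above (the gap shrinking slowly, then suddenly growing) can be picked out of a series of logged offsets. This is a rough sketch with made-up sample data shaped like that description; only the 12 and 18 second values come from the post, and the function name find_jumps and the 2 second threshold are arbitrary choices.

```python
def find_jumps(samples, threshold=2.0):
    """Return (time, growth) for each step where the offset magnitude grew
    by more than `threshold` seconds between consecutive samples.

    samples: list of (time_seconds, offset_seconds) tuples in time order.
    """
    jumps = []
    for (_, prev), (t, cur) in zip(samples, samples[1:]):
        growth = abs(cur) - abs(prev)
        if growth > threshold:
            jumps.append((t, growth))
    return jumps


# Hypothetical samples: slow correction interrupted by sudden jumps.
SAMPLES = [
    (0, 0.0),
    (60, 12.0),
    (70, 11.5),
    (80, 11.0),
    (90, 18.0),
    (100, 17.5),
    (110, 17.0),
    (120, 24.0),
]

for t, growth in find_jumps(SAMPLES):
    print(f"t={t}s: offset grew by {growth:.1f}s")
```

Jumps of similar size at regular intervals would point toward the guest clock repeatedly losing time in bursts, while a steadily growing offset with no jumps would look more like plain drift. Either reading is a guess until it is compared against what the hypervisor is doing.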

The host for these virtual machines is KVM, so I will look into possible solutions. Thanks, UnixRocks, for pointing me in that direction. It seems as though this is a VM issue, not a FreeBSD issue. Thanks, everyone, for helping me troubleshoot this.

NewGuy · Nov 5, 2015

Update: I talked with the VPS host provider. I think they made some adjustments on the host and got me to reboot the instances that were falling out of sync. The problem is now resolved, though I'm not sure what caused it in the first place.

Gašper Žnidaršič Bečaj · Dec 9, 2015

I have a very similar problem, but I am using PCBSD and my machine is a desktop computer, not a virtual machine.

From the /var/log/messages log:

Code:

Dec  7 14:24:58 plex ntpd[11502]: 2 out of 4 peers valid
Dec  7 14:24:58 plex ntpd[11502]: bad peer 0.si.pool.ntp.org (89.212.75.6)
Dec  7 14:24:58 plex ntpd[11502]: bad peer 1.si.pool.ntp.org (84.255.235.43)

Could that be the reason? And how can I solve this?

NewGuy · Dec 9, 2015

What happens when you run ntpctl -s all?
Also, does the clock go to the proper time if you stop and restart the network time service?

Gašper Žnidaršič Bečaj · Dec 10, 2015

The clock goes to the proper time if I restart. This is the output of the command (I haven't restarted yet this time, so that I can debug):

Code:

4/4 peers valid, clock unsynced, clock offset is 175727691.937ms

peer
   wt tl st  next  poll          offset       delay      jitter
193.2.4.2 0.si.pool.ntp.org
    1 10  2   30s   32s     -1228.867ms     6.243ms     0.396ms
193.2.78.2 1.si.pool.ntp.org
    1 10  2    6s   34s     -1226.833ms     7.081ms     0.405ms
93.103.22.152 2.si.pool.ntp.org
    1 10  2    4s   33s     -1630.145ms     6.348ms     4.675ms
109.127.214.126 3.si.pool.ntp.org
    1 10  2   16s   34s     -1224.217ms     6.615ms     0.199ms