Issue with NTP sync

Hi there,

I've been having a problem with ntpd synchronisation to the time server: it will sync occasionally, but the peer appears to be rejected most of the time.

I'd been following the guide here: http://log.or.cz/?p=80, as the author seems to be experiencing a very similar issue.

The trouble is, the blog author doesn't really explain his corrective action in a way that's clear to me, and he appears to be using a Linux distro with a different command set.

Can anyone comment on his corrective steps and how they improve the situation? Plus can anyone advise how to implement similar corrective action in FreeBSD?

I've collected the output of ntpq and sysctl -a kern.timecounter below.

Many thanks

Jimmy

Code:
bash-3.2# ntpd --version
ntpd: ntpd 4.2.1p241-RC-a Tue Feb 11 23:22:18 PST 2014 (999)

bash-3.2# ntpq
ntpq> peers
  remote  refid  st t when poll reach  delay  offset  jitter
==============================================================================
172.16.2.11  .LOCL.  1 u  39  64  377  0.807  13891.0  6.230

ntpq> peers
  remote  refid  st t when poll reach  delay  offset  jitter
==============================================================================
172.16.2.11  .LOCL.  1 u  39  64  377  0.807  13891.0  6.230
ntpq> as

ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 65444  9014  yes  yes  none  reject  reachable  1
ntpq> rv 65444
assID=65444 status=9014 reach, conf, 1 event, event_reach,
srcadr=172.16.2.11, srcport=123, dstadr=172.16.2.40, dstport=123,
leap=00, stratum=1, precision=-6, rootdelay=0.000,
rootdispersion=10162.796, refid=LOCL, reach=377, unreach=0, hmode=3,
pmode=4, hpoll=6, ppoll=6, flash=400 peer_dist, keyid=0, ttl=0,
offset=13891.081, delay=0.807, dispersion=18.204, jitter=5.919,
reftime=d9f00599.2a4fca42  Fri, Nov 13 2015  6:45:45.165,
org=d9f03745.b23f67f4  Fri, Nov 13 2015 10:17:41.696,
rec=d9f03737.ce09bb38  Fri, Nov 13 2015 10:17:27.804,
xmt=d9f03737.cdcbe229  Fri, Nov 13 2015 10:17:27.803,
filtdelay=  0.94  0.81  0.84  0.84  0.89  0.87  0.83  0.97,
filtoffset= 13891.9 13891.0 13901.3 13899.5 13893.1 13885.6 13895.4 13894.7,
filtdisp=  15.63  16.57  17.55  18.51  19.47  20.43  21.42  22.36


bash-3.2# sysctl -a kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) i8254(0) dummy(-1000000)
kern.timecounter.hardware: i8254
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 40737
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 3704256623
kern.timecounter.tc.TSC.frequency: 2333428993
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
bash-3.2#
 
If you run an NTP service for your network, let the daemon run for a while. It just takes some time for the service to get in sync and become available for clients to query.

Also make sure your current time doesn't deviate too much from the server's; if the time difference is too large, ntpd will simply refuse to sync.
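
On FreeBSD you can let ntpd step the clock once at startup if the initial offset is large. A rough sketch, assuming the stock rc scripts (ntpd_sync_on_start simply adds -g to the daemon's flags):

Code:
# /etc/rc.conf
ntpd_enable="YES"
ntpd_sync_on_start="YES"    # start ntpd with -g so a large initial offset gets stepped once

# or, alternatively, step the clock manually once before starting the daemon:
service ntpd stop
ntpdate 172.16.2.11
service ntpd start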
 
Code:
ntpq> peers
  remote  refid  st t when poll reach  delay  offset  jitter
==============================================================================
172.16.2.11  .LOCL.  1 u  39  64  377  0.807  13891.0  6.230

ntpq> peers
  remote  refid  st t when poll reach  delay  offset  jitter
==============================================================================
172.16.2.11  .LOCL.  1 u  39  64  377  0.807  13891.0  6.230

So you are trying to synchronize to a machine with only a local clock? That is claiming it is stratum 1? NTP does its best to disqualify bogus servers and peers so that they don't impact NTP's time. Try some real clocks from, e.g., pool.ntp.org and see if that helps.
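
In ntp.conf that might look something like this (these pool hostnames are just an example; use whatever sources suit your network):

Code:
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst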
 
Hi Uniballer, yes that's right; stratum 1 appears to be the default behaviour for a Windows Domain Controller. I can make enquiries about allowing this particular server to communicate with external NTP servers. But since the clock on the Windows DC appears to be accurate, is this really going to make much of a difference?
 
Cough... unless the DC has a very high-precision external hardware clock, it shouldn't lie about its stratum.
 
It depends on how the supposed stratum 1 server behaves. If the NTP code concludes that it is a "false ticker", or otherwise unusable, then all bets are off.
 
Hi,

Contents of the ntp.conf below.

I've been reading http://www.ntp.org/ntpfaq/NTP-s-trbl-general.htm, and under Section 6 it details how to adjust the kernel clock tick manually to bring the clock frequency closer to its nominal rate. However, it closes by saying that this correction should normally be handled automatically, since the system can only tolerate an error of up to 500 ppm. There is a formula based on the drift value for recalculating the tick to assist synchronisation.

I've also been reading http://www.ntp.org/ntpfaq/NTP-s-sw-clocks-quality.htm, which is pretty fascinating. It states that the quartz clock error increases by about 1 ppm for every degree Celsius the system temperature rises, so under load your PPM error becomes higher, and a system that's powered off therefore keeps a more accurate clock than one that's powered on.

It also states that a standard clock has a frequency deviation of several PPM, and that a deviation of 0.001% (10 PPM) is equivalent to about 1 s of drift/offset a day.
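
Working through that arithmetic as a sanity check (the tick figures assume a Linux-style HZ=100 kernel with tick = 10000 us, as on the blog; FreeBSD has no tickadj-style tick, the timecounter frequency shown in the sysctl output above plays that role):

Code:
# 10 PPM expressed as drift per day:
#   86400 s/day * 10 / 1000000 = 0.864 s/day  (roughly 1 s/day)
#
# FAQ-style tick pre-compensation, assuming tick = 10000 us:
#   each unit of tick is worth 1 us / 10000 us = 100 PPM,
#   so a drift value of 600 PPM would mean a tick change of 600 / 100 = 6 units
#   (in whichever direction cancels the drift), leaving the residual error
#   well inside the 500 PPM that ntpd can correct on its own.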

Code:
# Generated: Fri Nov  6 16:12:07 2015
#### Enable local clock orphan mode
tos orphan 12

#### Drift file
driftfile /var/db/ntp.drift

#### External Servers
# Entry for server <<OBFUSCATED>>
server 172.16.2.11 version 3

#### Internal Cluster entries

# end of file
 
Syncing the Windows DC with an external NTP source appears to have resolved it. The only discernible difference is that the Windows server is now reporting as stratum 3; the time difference between Windows and the external NTP pool was only 6 s, so I don't understand why it's working now:

Code:
ntpq> pe
  remote  refid  st t when poll reach  delay  offset  jitter
==============================================================================
*172.16.2.11  94.125.132.7  3 u  631 1024  377  0.862  8.739  7.360
ntpq> as

ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 65444  9624  yes  yes  none  sys.peer  reachable  2
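
For reference, pointing a Windows DC at an external NTP source is typically done with w32tm; a rough sketch (run on the DC itself, and the pool hostnames are just placeholders):

Code:
w32tm /config /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org" /syncfromflags:manual /reliable:yes /update
net stop w32time && net start w32time
w32tm /query /status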

Although stratum 1 servers are expected to be highly accurate time references, there's nothing I can see in the documentation which states that they are required to be.

The only requirement I can see is a latency between client and server of no more than 128 ms, which on a LAN shouldn't be an issue.
 
I've since discovered that a Cisco device which was not synchronising externally was also announcing itself as stratum 1, generating the same issue.
 
Although stratum 1 servers are expected to be highly accurate time references, there's nothing I can see in the documentation which states that they are required to be.
The problem was probably related to measurable clock stability, or the use of .LOCL. as a reference clock, rather than the claim that your DC was a stratum 1 clock. Not everything that the code does is necessarily spelled out explicitly in the documentation.

My biggest concern with the use of the DC as an externally synchronized clock source is that the reference NTP implementation has had several security problems within the last couple of years, and I don't know how you could tell if the code running on the DC is susceptible and/or has been fixed. I run a pair of BSD boxen that synchronize to different public NTP sources to provide NTP service to all of my other internal systems. That way I feel that I can easily find out about known problems and access fixes for them.
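
If you want to set up something similar, a minimal ntp.conf for a FreeBSD box that follows public sources and serves the rest of the LAN might look roughly like this (the pool hostnames and the 172.16.2.0/24 network are assumptions, substitute your own):

Code:
server 0.freebsd.pool.ntp.org iburst
server 1.freebsd.pool.ntp.org iburst
server 2.freebsd.pool.ntp.org iburst

driftfile /var/db/ntp.drift

# by default, answer time queries only
restrict default kod nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict -6 ::1

# let the internal network query this server
restrict 172.16.2.0 mask 255.255.255.0 nomodify notrap nopeer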
 
I had a similar problem and struggled for several hours trying to fix the issue. I tried out OpenNTPD and the problem was instantly fixed without any configuration.
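
For anyone who wants to try the same thing, a minimal OpenNTPD setup on FreeBSD is roughly this (the package name, rc variable, and config path are from the port as I recall, and the pool hostname is just an example):

Code:
pkg install openntpd

# /etc/rc.conf
openntpd_enable="YES"

# /usr/local/etc/ntpd.conf
servers pool.ntp.org

service openntpd start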

That is because OpenNTPD apparently only implements the Simple Network Time Protocol, and does not detect false tickers, etc. It is probably fine for running as a client clock on a system that does not need high timestamp accuracy, but might not be so good for distributing time to a network.
 