ntpd quickly increases poll interval to max in 3 packet exchange only

birgupta · Mar 23, 2010

Hi All,
I am having little trouble with my ntp 4.2.0 in Linux.

I have having a problem with my ntpd. I have configured only local server
127.127.1.0 as my default server and later configure any server using ntpdc
program.
Here is my configuration :

Code:

restrict default nomodify
restrict 127.0.0.1
authenticate no

server  127.127.1.0 # local clock
fudge   127.127.1.0 stratum 10

broadcastclient

Now after my ntpd syncs up with local clock, it creates drift file and store
the drift as 0.000

Now I configure a valid ntp stratum 1 server using ntpdc addserver command.
What I observe is, it starts the poll from 64 and after 3 packet exchange only
it goes to 1024. As a result it is taking lot of time to sync up. Even after
syncup I see it goes back and forth between local and the server and does clock
reset evry 30 min or so.

This situation is improved with adding burst flag while configuring the server.
Now, sometime after reach value goes to 3 or 7 , it selects the server for sync
( earlier it was not even doing that ) but again it reaches to sometime 128 or
256 or 1024. In these cases I see pstats shows that it has exchanged 9 packets.
But since the poll interval is increased, it takes longer time to reset the
clock to correct the offset and change the refid and stratum value. This is
inconsistent behavior.
If I restart the ntpd and then confiture the server ( while it is not yet
synced to local) it works fine.

Is this the right behavior or a bug. The end result is that it takes unusually
long time to sync.

Also, doing multiple reset of clock every hour seems pointing to a bug in ntpd
or probably the hardware clock is slipping continuously. Usually I would expect
after one time clock reset ( step ) the server should be doing slew. But I see
the offset reported even after first or second reset is more than 250 mSec.
Below is some captures:
with burst setup:

Code:

* - selected for sync, + -  peer mode(active), 
- - peer mode(passive), = - polled in client mode 
    remote          local        st  poll  reach   delay    offset     disp
---------------------------------------------------------------------------
=127.127.1.0     127.0.0.1       10   64  377   0.00000  0.000000 0.03078
*10.81.254.131   10.194.60.72     1  128    3   0.07545 -0.063021 0.01147

ntpdc> peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1       10   64  377 0.00000  0.000000 0.03036
*10.81.254.131   10.194.60.72     1 1024    3 0.07544 -0.280085 0.00662
ntpdc> pstats 10.81.254.131
remote host:          10.81.254.131
local interface:      10.194.60.72
time last received:   278s
time until next send: 732s
reachability change:  356s
packets sent:         9
packets received:     9
bad authentication:   0
bogus origin:         0
duplicate:            0
bad dispersion:       0
bad reference time:   0
candidate order:      6

Please let me know if any one has any clue.

birgupta · Mar 23, 2010

NTP reset clock using step , the peers poll is not reset to min_poll

I have the following configuration:

Code:

restrict default nomodify
restrict 127.0.0.1
authenticate no

server  127.127.1.0 # local clock
fudge   127.127.1.0 stratum 10

broadcastclient

I am using ntpdc to add servers on the fly after ntpd is started. As a result,
if I add a server after a while when local clock is synchronized and drift file
is created ( recording 0.000 most of the time ), When I confiure a new server,
after 2 poll ( min 64 sec ) the poll interval ( as seen in ntpq or ntpdc )
reaches to 1024 sec ( max). If I use burst flag, it sends 9 packets within 2
poll interval( 64*2 ), so it selects the server ( indicated by * flag ), but it
takes very long time to actally record frequency and correct the offset.
But after correcting the offset by stepping up the local clock, everything
becomes 0 but poll interval remains same at 1024. As a result, next time again
local clock is selected and the first packet for the remote server goes only
after 1024 sec. So effectively it takes again 2048 sec before the remote server
is selected. According to the clear procedure documented in RFC 1305, it should
clear states of all peer and set the host poll to min poll so that all servers
can be restarted selection state machine a fresh. I see the code as well, it
does set the hpoll = min_poll, but sets ppoll = max_poll. Not sure why ? Here
is the code sippet :

Code:

/*
 * peer_clear - clear peer filter registers.  See Section 3.4.8 of the spec.
 */
void
peer_clear( 
    struct peer *peer,      /* peer structure */
    char    *ident          /* tally lights */
    )
{
    int i;

    peer_crypto_clear(peer);

    if (peer == sys_peer)
        sys_peer = NULL;

    /*
     * Wipe the association clean and initialize the nonzero values.
     */
    memset(CLEAR_TO_ZERO(peer), 0, LEN_CLEAR_TO_ZERO);
    peer->estbdelay = sys_bdelay;
    peer->ppoll = peer->maxpoll;
    peer->hpoll = peer->minpoll;
    peer->disp = MAXDISPERSE;
    peer->jitter = LOGTOD(sys_precision); 
    for (i = 0; i < NTP_SHIFT; i++) {
        peer->filter_order[i] = i; 
        peer->filter_disp[i] = MAXDISPERSE;
    }

According to this code, it should reset the poll to min_poll, but it does not.
Here is the capture from my test:

Code:

ntpdc> pstats 10.81.254.131
remote host:          10.81.254.131
local interface:      10.194.60.72
time last received:   995s
time until next send: 14s
reachability change:  2036s
packets sent:         26 <<<<< Note here it is 26 packets
packets received:     26
bad authentication:   0
bogus origin:         0
duplicate:            0
bad dispersion:       0
bad reference time:   0
candidate order:      6

ntpdc> peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1       10   64  377 0.00000  0.000000 0.03078
*10.81.254.131   10.194.60.72     1 1024    3 0.07549 -1.026857 0.01190
ntpdc> showpeer 10.81.254.131
remote 10.81.254.131, local 10.194.60.72
hmode client, pmode mode#255, stratum 1, precision -20
leap 00, refid [GPS], rootdistance 0.00000, rootdispersion 0.00027
ppoll 10, hpoll 10, keyid 0, version 4, association 28789
reach 003, unreach 0, flash 0x0000, boffset 0.00400, ttl/mode 0
timer 92s, flags system_peer, config, bclient, burst
reference time:      cf519d1a.bd707ea8  Mon, Mar 22 2010  7:39:06.739
originate timestamp: cf519d1e.00ae70a3  Mon, Mar 22 2010  7:39:10.002
receive timestamp:   cf519d1f.12c39799  Mon, Mar 22 2010  7:39:11.073
transmit timestamp:  cf519d1e.ff66d7a5  Mon, Mar 22 2010  7:39:10.997
filter delay:  0.07555  0.07558  0.07553  0.07558 
               0.07553  0.07549  0.07555  0.07985 
filter offset: -1.03285 -1.03186 -1.03082 -1.02985
               -1.02785 -1.02685 -1.02584 -1.02272
filter order:  5        2        4        6

Capture after the next packet is sent and clock is reset using step:

Code:

ntpdc> pstats 10.81.254.131
remote host:          10.81.254.131
local interface:      10.194.60.72
time last received:   16s
time until next send: 1011s
reachability change:  2080s
packets sent:         34 <<< Here on next poll it sent 8 instead of 1, because
of burst configuration.So, it resets the clock in the next poll itself.
packets received:     34
bad authentication:   0
bogus origin:         0
duplicate:            0
bad dispersion:       0
bad reference time:   0
candidate order:      0
ntpdc> peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1       10   64    0 0.00000  0.000000 0.00000
=10.81.254.131   10.194.60.72    16 1024    0 0.00000  0.000000 0.00000
ntpdc> showpeer 10.81.254.131
remote 10.81.254.131, local 10.194.60.72
hmode client, pmode unspec, stratum 16, precision -20
leap 11, refid [83.84.69.80], rootdistance 0.00000, rootdispersion 0.00000
ppoll 10, hpoll 10, keyid 0, version 4, association 28789
reach 000, unreach 0, flash 0x0000, boffset 0.00400, ttl/mode 0
timer 92s, flags config, bclient, burst
reference time:      cf51a119.93d6e22b  Mon, Mar 22 2010  7:56:09.577
originate timestamp: 00000000.00000000  Thu, Feb  7 2036  6:28:16.000
receive timestamp:   00000000.00000000  Thu, Feb  7 2036  6:28:16.000
transmit timestamp:  00000000.00000000  Thu, Feb  7 2036  6:28:16.000
filter delay:  0.00000  0.00000  0.00000  0.00000 
               0.00000  0.00000  0.00000  0.00000 
filter offset: 0.000000 0.000000 0.000000 0.000000
               0.000000 0.000000 0.000000 0.000000
filter order:  7        6        5        4       
               3        2        1        0       
offset 0.000000, delay 0.00000, error bound 0.00000, filter error 2.0000

Now in this condition, it will wait for another 1024 sec again before selecting
the remote server.

This keeps happening at least couple of times before offset comes down below
128 msec where the ntp will slew the clock.
This makes to feel that the clock is not stable and it takes very long time to
finally stablize where sysinfo would show refid as the server's IP address.

Let me know if this is a bug and should I apply some patch.Comment 1 birendra 2010-03-22 01:25:27 PDT
Also, I want to understand, why it needs to do multiple reset of clock. After
first reset of the clock, local clock is now sync to the remote, so why it is
finding again so much offset ? Is it pointing to some local hardclock clock
problem ?
See the below log:

Code:

root@Node1:~# tail -f /var/log/messages | grep ntpd
2010-03-22T07:04:45+00:00 Node1 ntpd[4706]: time reset +17.014753 s
2010-03-22T07:09:01+00:00 Node1 ntpd[4706]: synchronized to LOCAL(0), stratum
10
2010-03-22T07:39:02+00:00 Node1 ntpd[4706]: synchronized to 10.81.254.131,
stratum 1
2010-03-22T07:56:14+00:00 Node1 ntpd[4706]: time reset -1.545074 s
2010-03-22T08:00:36+00:00 Node1 ntpd[4706]: synchronized to LOCAL(0), stratum
10

and currently the offset is again more than 500 mesc:

Code:

ntpdc> peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
*LOCAL(0)        127.0.0.1       10   64  377 0.00000  0.000000 0.03041
=10.81.254.131   10.194.60.72     1 1024    1 0.07962  0.512339 2.81735

This meens, it is going to step once again after 25-30 min. The question is why
it is finding so much of offset again and again. 
I don't think there is any problem in the network. the server 10.81.254.131 in
an intranet server. The ping performance for this server from my host is
following:
root@Node1:~# ping 10.81.254.131
PING 10.81.254.131 (10.81.254.131) 56(84) bytes of data.
64 bytes from 10.81.254.131: icmp_seq=1 ttl=54 time=79.6 ms
64 bytes from 10.81.254.131: icmp_seq=2 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=3 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=4 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=5 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=6 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=7 ttl=54 time=75.5 ms
64 bytes from 10.81.254.131: icmp_seq=8 ttl=54 time=75.5 ms
64 bytes from 10.81.254.131: icmp_seq=9 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=10 ttl=54 time=75.3 ms
64 bytes from 10.81.254.131: icmp_seq=11 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=12 ttl=54 time=75.4 ms
64 bytes from 10.81.254.131: icmp_seq=13 ttl=54 time=75.5 ms

Any comments on this is highly appreciated.

gilinko · Mar 23, 2010

Some hint's.

* Don't double post
* Use the code tags
* As per you other thread, FreeBSD != Linux so you need to ask the questions for your ntpd in your distributions forum/mailinglist/irc etc.

DutchDaemon · Mar 23, 2010

birgupta said:
Hi All,
I am having little trouble with my ntp 4.2.0 in Linux.

Why post this on a FreeBSD forum? We'll happily patch the Linux code .. widely

ntpd quickly increases poll interval to max in 3 packet exchange only

birgupta

birgupta

gilinko

DutchDaemon

Administrator