AMD system timer issues

iongchun · Jan 20, 2009

Hi friends!

I have just moved my old FreeBSD 6.4 installation to a new system
by installing the old hard disks in it, and I find that the new
system has serious timer issues.

When it boots a kernel without SMP (the old system was UP), the
timer is very slow, system response is poor, and the clock
interrupt rate is about 6xx. I then built a new kernel with the
SMP option. With it the timer seems correct, and the clock
interrupt rate for each CPU rises to 19xx. But when I run ntpd,
it always stays at stratum 16, with precision at -19. I ran
openntpd instead, and it always stays out of sync by 1 to 2
seconds.

Has anyone else had a similar problem?

My system:
M/B: Asus M3A78-EM
CPU: AMD Athlon 5050e (EFamily: 0 EModel: 6 Family: 15 Model: 107 Stepping: 2)
NIC(on-board): re0 (needs Pyun YongHyeon's patched driver to work)
NIC(PCI): rl0
HDD: two Seagate IDE disks

dmesg.boot is attached.

Thanks,
iongchun

ale · Jan 20, 2009

I don't know if it's related to your problem, but did you notice this?

Code:

ad0: DMA limited to UDMA33, device found non-ATA66 cable

This can hurt performance badly.

iongchun · Jan 20, 2009

Yes, but..

ad0: DMA limited to UDMA33, device found non-ATA66 cable
ad0: 114473MB <Seagate ST3120814A 3.AAD> at ata0-master UDMA33
ad1: 152627MB <Seagate ST3160812A 3.AAJ> at ata0-slave UDMA100

I don't know why two disks on the same cable report different modes.

And I wonder why the motherboard vendor supplies this cable with the board.

ale · Jan 20, 2009

Some ideas to try...
Depending on your BIOS, check whether you can detect the HDs manually and turn off autodetection.
Check the jumpers on the HDs (if present) and use cable-select logic (not forcing master/slave).
If possible, try swapping the positions of the HDs on the cable and look at the dmesg output.
Try a new cable.
Check the pins on the HDs and the motherboard.

iongchun · Jan 21, 2009

I set the jumpers to cable-select on both disks and use auto-detection in the BIOS. Both disks seem to be detected correctly by the BIOS.
I will try another cable, swap the positions, and check the pins later.
Thanks.

iongchun · Feb 7, 2009

Hi,

Just back from holidays.
I replaced the cable, and now the DMA modes on both disks are detected correctly:
ad0: 114473MB <Seagate ST3120814A 3.AAD> at ata0-master UDMA100
ad1: 152627MB <Seagate ST3160812A 3.AAJ> at ata0-slave UDMA100

But ntpd still doesn't work well:
iongchun@angada$ ntpdc -c sysinfo
system peer: 0.0.0.0
system peer mode: unspec
leap indicator: 11
stratum: 16
precision: -19
root distance: 0.00000 s
root dispersion: 0.00021 s
reference ID: [73.78.73.84]
reference time: 00000000.00000000 Thu, Feb 7 2036 14:28:16.000
system flags: auth monitor ntp kernel stats
jitter: 0.000000 s
stability: 0.000 ppm
broadcastdelay: 0.003998 s
authdelay: 0.000001 s

iongchun@angada$ sysctl kern.timecounter
kern.timecounter.stepwarnings: 0
kern.timecounter.nbinuptime: 212392257
kern.timecounter.nnanouptime: 238629
kern.timecounter.nmicrouptime: 299043
kern.timecounter.nbintime: 3837362426
kern.timecounter.nnanotime: 2025617900
kern.timecounter.nmicrotime: 1812551183
kern.timecounter.ngetbinuptime: 3503127
kern.timecounter.ngetnanouptime: 3586205
kern.timecounter.ngetmicrouptime: 25624451
kern.timecounter.ngetbintime: 0
kern.timecounter.ngetnanotime: 0
kern.timecounter.ngetmicrotime: 22042146
kern.timecounter.nsetclock: 4
kern.timecounter.hardware: HPET
kern.timecounter.choice: TSC(-100) HPET(900) ACPI-safe(850) i8254(0) dummy(-1000000)
kern.timecounter.tick: 1
kern.timecounter.smp_tsc: 0
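
For readers unfamiliar with the kern.timecounter.choice line, here is a minimal sketch of how to read it: each entry is a counter name followed by a quality value in parentheses, and the counter in use above (HPET) is the one with the highest quality. The function name parse_timecounter_choice is made up for this illustration.

```python
import re

def parse_timecounter_choice(choice):
    """Map each timecounter name to its quality value (hypothetical helper)."""
    pairs = re.findall(r"(\S+)\((-?\d+)\)", choice)
    return {name: int(quality) for name, quality in pairs}

# The string is copied from the sysctl output above.
choice = "TSC(-100) HPET(900) ACPI-safe(850) i8254(0) dummy(-1000000)"
counters = parse_timecounter_choice(choice)

# Rank counters from highest to lowest quality.
ranked = sorted(counters, key=counters.get, reverse=True)
print(ranked)
```

The top-ranked entry matches kern.timecounter.hardware. One experiment that was not tried in this thread would be to set kern.timecounter.hardware to the next entry in the ranking and see whether the drift changes.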

trev · Feb 7, 2009

system peer: centurion.xxxxxxxxxxxxx.xxx
system peer mode: client
leap indicator: 00
stratum: 3
precision: -19
root distance: 0.04567 s
root dispersion: 0.10408 s
reference ID: [192.168.1.2]
reference time: cd37e57b.18b4ac2c Sat, Feb 7 2009 21:48:27.096
system flags: auth monitor ntp kernel stats
jitter: 0.003342 s
stability: 0.000 ppm
broadcastdelay: 0.003998 s
authdelay: 0.000000 s

I connect to my ISP's stratum 3 time server.

As for the precision, here's what RFC1305 (spec for NTP v3) says:

Code:

Precision (sys.precision, peer.precision, pkt.precision):
This is a signed integer indicating the precision of the various
clocks, in seconds to the nearest power of two. The value must 
be rounded to the next larger power of two; for instance, a 50-Hz
(20 ms) or 60-Hz (16.67 ms) power-frequency clock would be 
assigned the value -5 (31.25 ms), while a 1000-Hz (1 ms) 
crystal-controlled clock would be assigned the value -9 (1.95 ms).

so I don't think -19 is anything to worry about
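
As a rough check of the RFC wording, the sketch below converts a clock resolution to the precision exponent by rounding up to the next power of two, and converts an exponent back to seconds. The function names are made up for this illustration, and it only mirrors the RFC text, not necessarily how ntpd measures precision internally.

```python
import math

def precision_exponent(resolution_seconds):
    """Smallest power-of-two exponent whose value is >= the resolution."""
    return math.ceil(math.log2(resolution_seconds))

def precision_seconds(exponent):
    """Seconds represented by a precision exponent."""
    return 2.0 ** exponent

# The three clocks used as examples in the RFC text.
for label, resolution in [("50 Hz", 1 / 50), ("60 Hz", 1 / 60), ("1000 Hz", 1 / 1000)]:
    exponent = precision_exponent(resolution)
    print(f"{label}: {exponent} ({precision_seconds(exponent) * 1e3:.2f} ms)")

# The value reported in this thread.
print(f"-19: {precision_seconds(-19) * 1e6:.3f} usec")
```

A precision of -19 therefore corresponds to roughly 1.9 microseconds, which is a fine-grained clock reading rather than a sign of trouble.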

What I would worry about is the fact that your leap indicator is showing:

Code:

11, alarm condition (clock not synchronized)
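
For reference, the leap indicator is a two-bit field with four values in RFC 1305. A small lookup, with a made-up helper name describe_leap, shows how the two outputs in this thread differ:

```python
# Two-bit leap indicator values as listed in RFC 1305.
LEAP_INDICATOR = {
    0b00: "no warning",
    0b01: "last minute has 61 seconds",
    0b10: "last minute has 59 seconds",
    0b11: "alarm condition (clock not synchronized)",
}

def describe_leap(bits):
    """Translate the two-digit binary string printed by ntpdc (hypothetical helper)."""
    return LEAP_INDICATOR[int(bits, 2)]

print(describe_leap("11"))  # value in iongchun's sysinfo output
print(describe_leap("00"))  # value in trev's sysinfo output
```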

iongchun · Feb 10, 2009

Hmm.. and in my ntp.log, it shows "kernel time sync status 2040".

iongchun@angada$ sudo /etc/rc.d/ntpd stop
Stopping ntpd.
iongchun@angada$ sudo /etc/rc.d/ntpdate start
Setting date via ntp.
10 Feb 09:12:01 ntpdate[96476]: step time server 220.130.158.52 offset 4.361164 sec
iongchun@angada$ sudo /etc/rc.d/ntpd start
Starting ntpd.

In my ntp.log:
10 Feb 09:12:09 ntpd[96491]: logging to file /var/log/ntp.log
10 Feb 09:12:09 ntpd[96491]: precision = 1.955 usec
10 Feb 09:12:09 ntpd[96491]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
10 Feb 09:12:09 ntpd[96491]: Listening on interface #1 wildcard, ::#123 Disabled
...
...
10 Feb 09:12:09 ntpd[96491]: Listening on routing socket on fd #32 for interface updates
10 Feb 09:12:09 ntpd[96491]: kernel time sync status 2040
10 Feb 09:12:09 ntpd[96491]: frequency initialized 0.000 PPM from /var/db/ntpd.d
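
A sketch for decoding the "kernel time sync status 2040" line, under two assumptions that should be checked against your own system: that ntpd prints the status word in hexadecimal, and that the bit values below match the STA_* definitions in FreeBSD's sys/timex.h. The helper name decode_status is made up.

```python
# Assumed STA_* bit values from sys/timex.h; verify against your headers.
STA_FLAGS = {
    0x0001: "STA_PLL",
    0x0002: "STA_PPSFREQ",
    0x0004: "STA_PPSTIME",
    0x0008: "STA_FLL",
    0x0010: "STA_INS",
    0x0020: "STA_DEL",
    0x0040: "STA_UNSYNC",
    0x0080: "STA_FREQHOLD",
    0x0100: "STA_PPSSIGNAL",
    0x0200: "STA_PPSJITTER",
    0x0400: "STA_PPSWANDER",
    0x0800: "STA_PPSERROR",
    0x1000: "STA_CLOCKERR",
    0x2000: "STA_NANO",
    0x4000: "STA_MODE",
    0x8000: "STA_CLK",
}

def decode_status(hex_text):
    """Return the names of the flags set in a hexadecimal status word."""
    value = int(hex_text, 16)
    return [name for bit, name in STA_FLAGS.items() if value & bit]

print(decode_status("2040"))
```

If those assumptions hold, 2040 is the unsynchronized flag plus the nanosecond-resolution flag, which is what one would expect right after ntpd starts and would not by itself indicate a fault.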

iongchun@angada$ ntpq -n -c peer
remote refid st t when poll reach delay offset jitter
==============================================================================
220.130.158.51 129.6.15.29 2 u 3 64 1 49.464 25.325 0.002
220.130.158.71 129.6.15.29 2 u 2 64 1 49.931 34.098 0.002
220.130.158.52 129.6.15.29 2 u 1 64 1 50.298 37.447 0.002
220.130.158.72 .INIT. 16 u - 64 0 0.000 0.000 0.002
220.130.158.54 .INIT. 16 u - 64 0 0.000 0.000 0.002

Then several minutes later:
yongjhen@angada$ ntpq -n -c peer
remote refid st t when poll reach delay offset jitter
==============================================================================
220.130.158.51 129.6.15.29 2 u 57 64 77 49.426 854.455 549.354
220.130.158.71 129.6.15.29 2 u 56 64 77 50.895 864.615 549.461
220.130.158.52 129.6.15.29 2 u 59 64 73 50.487 858.402 587.185
220.130.158.72 129.6.15.29 2 u 58 64 77 51.680 860.632 544.609
220.130.158.54 129.6.15.29 2 u 58 64 77 49.760 860.098 541.103
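
A note on reading the reach column: ntpq prints it in octal, and it is an 8-bit shift register in which each bit records whether one of the last eight polls got a reply. The sketch below, with made-up helper names, expands the values seen above.

```python
def reach_history(reach_octal):
    """Expand the octal reach value into its 8 poll bits, newest poll last."""
    return format(int(reach_octal, 8), "08b")

def successful_polls(reach_octal):
    """Count how many of the last eight polls were answered."""
    return reach_history(reach_octal).count("1")

# Values taken from the two ntpq listings above.
for reach in ["1", "77", "73"]:
    print(reach, reach_history(reach), successful_polls(reach))
```

A reach of 77 means the last six polls were all answered, so the servers are reachable. The offsets growing from tens of milliseconds to about 860 ms point at the local clock rather than at the network.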

trev · Feb 10, 2009

iongchun said:
Hmm.. and in my ntp.log, it shows "kernel time sync status 2040".

iongchun@angada$ sudo /etc/rc.d/ntpd stop
Stopping ntpd.
iongchun@angada$ sudo /etc/rc.d/ntpdate start
Setting date via ntp.
10 Feb 09:12:01 ntpdate[96476]: step time server 220.130.158.52 offset 4.361164 sec

Being more than 4.36 seconds out with ntpd running tends to confirm that your clock is not being synced by ntpd or, if it is, not frequently enough.

iongchun · Feb 16, 2009

I tried running OpenNTPD in debug mode:

iongchun@angada$ sudo /usr/local/sbin/ntpd -s -d
listening on ::
ntp engine ready
reply from 220.130.158.51: offset 1.413046 delay 0.054696, next query 5s
reply from 220.130.158.51: offset 1.432007 delay 0.073732, next query 9s
reply from 220.130.158.51: offset 1.439859 delay 0.049750, next query 6s
peer 220.130.158.51 now valid
reply from 220.130.158.51: offset 1.454042 delay 0.056760, next query 8s
reply from 220.130.158.51: offset 1.467608 delay 0.053761, next query 6s
reply from 220.130.158.51: offset 1.479876 delay 0.053769, next query 9s
reply from 220.130.158.51: offset 1.498829 delay 0.049478, next query 30s
reply from 220.130.158.51: offset 1.560007 delay 0.050969, next query 34s
adjusting local clock by 1.498829s
reply from 220.130.158.51: offset 1.479896 delay 0.051335, next query 31s
reply from 220.130.158.51: offset 1.417985 delay 0.073071, next query 32s

It seems adjtime(2) has no (or little) effect on my system.
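
To put a number on the drift, here is a rough estimate from the debug output above. It assumes each query was sent exactly at the "next query" interval printed on the previous line, which is only approximate, and it uses a simple first-to-last slope rather than a proper fit. The helper name slope_ppm is made up.

```python
def slope_ppm(samples):
    """Offset change per second between first and last sample, in ppm."""
    (t0, o0), (t1, o1) = samples[0], samples[-1]
    return (o1 - o0) / (t1 - t0) * 1e6

# (nominal seconds since the first reply, reported offset in seconds),
# with timestamps accumulated from the "next query" intervals.
before_adjust = [
    (0, 1.413046), (5, 1.432007), (14, 1.439859), (20, 1.454042),
    (28, 1.467608), (34, 1.479876), (43, 1.498829), (73, 1.560007),
]
# Samples from the last reply before "adjusting local clock" onward.
after_adjust = [(73, 1.560007), (107, 1.479896), (138, 1.417985)]

print(f"before adjtime: {slope_ppm(before_adjust):.0f} ppm")
print(f"after adjtime:  {slope_ppm(after_adjust):.0f} ppm")
```

Under those assumptions the offset grows by about 2 ms per second before the adjustment and shrinks at a similar rate afterwards, so the adjustment appears to be applied but is working against a very large drift. For comparison, ntpd's frequency correction is limited to 500 ppm, which may be why it never reaches sync here.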
