Re: [RndTbl] strange NTP problem on one of 3 peers

25 Apr 2012


On 12-04-25 11:56 AM, Gilles Detillieux wrote:
> ...
> On 04/25/2012 11:03 AM, Gilbert E. Detillieux wrote:
> > ...
> > Are you sure ntpd is running on all systems? Try running the following
> > command, on each of your systems:
> >
> > /usr/sbin/ntpq -p
> >
> > This will tell you not only whether ntpd is running, but also where each
> > one is getting its clock settings from, what the drift is, etc.
> >
> > Note that if the initial clock setting is too far out of whack, ntpd may
> > not even start properly. It's usually useful to run ntpdate first, to at
> > least start off with a close-to-synchronized clock. For some reason,
> > RHEL systems don't do that by default even when you enable ntpd.
>
> I ran that on all 3 systems, and it shows ntpd is indeed running on all.
>
> SL 5 does seem to run ntpdate first, before starting ntpd, to get the
> clock sync'ed up beforehand, as long as you have systems defined in
> /etc/ntp/step-tickers or you put a -x in OPTIONS in /etc/sysconfig/ntpd.
>
> But I wonder if there are some NTP servers on the net that are out of
> whack. When I run ntpq, the system that has the drift (cliff) shows
> different results than the other two:
>
> On cliff:
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
>  caustique.anox. 209.51.161.238   2 u    5   64   73   35.302   32.315 4564.06
>  tb.mircx.com    64.90.182.55     2 u    2   64   77   46.530   72.637 4517.32
>  cliff.scrc.uman .INIT.          16 u    -   64    0    0.000    0.000   0.000
>  larry.scrc.uman 208.80.96.70     3 u   16   64   76    0.001  6019.84 3675.39
>  dave2.scrc.uman 209.167.68.100   3 u    4   64   42    0.001  6140.53 3858.45
> *LOCAL(0)        .LOCL.          10 l    2   64   77    0.000    0.000   0.001
>
> On larry:
>      remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +zeus.yocum.org  131.188.3.220    2 u  148  256  377   35.330   -2.723   3.170
> *ellen.linuxgene 142.3.100.2      2 u  214  256  377   31.665   -1.240   2.296
>  cliff.scrc.uman .INIT.          16 u   17   64    0   27.461  -87882.   0.000
>  larry.scrc.uman .INIT.          16 u    - 1024    0    0.000    0.000   0.000
> +dave2.scrc.uman 209.167.68.100   3 u  204  256  377    0.400    2.594   0.890
>  LOCAL(0)        .LOCL.          10 l   44   64  377    0.000    0.000   0.001
>
> dave2's results are similar to larry's.
>
> A few other things I thought I should point out: All 3 systems have
> ports 123/tcp and 123/udp open in iptables. The clock on cliff seems to
> drift whether or not ntpd is running, though that could be because the
> calculated drift compensation is out of whack. The /var/lib/ntp/drift
> file on cliff hasn't been modified since 1:58 this morning, before the
> reboot, while it has been on the other 2 systems. All 3 systems have an
> identical configuration, using 0.pool.ntp.org as the step-ticker, and
> 0.pool.ntp.org and 1.pool.ntp.org as stratum 1 servers.
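For readers less used to the ntpq -p columns quoted above, here is a small Python sketch (not something from the thread) that parses one peer row and flags the symptoms visible on cliff. It assumes the usual ten whitespace-separated columns with the tally code in the first character, that reach is printed in octal as an 8-bit record of the last eight polls (377 means all eight were answered), and that offset and jitter are in milliseconds. The names parse_peer_line and problems, and the 128 ms cutoff, are hypothetical choices for illustration only. Also note that right after a reboot, reach can sit below 377 simply because fewer than eight polls have happened yet.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Peer:
    tally: str        # first-column selection code: '*' system peer, '+' candidate, ' ' unused
    remote: str
    refid: str
    stratum: int
    reach: int        # 8-bit register of the last 8 polls (ntpq prints it in octal)
    offset_ms: float
    jitter_ms: float

    @property
    def recent_replies(self) -> int:
        """How many of the last 8 polls were answered (set bits in reach)."""
        return bin(self.reach).count("1")


def parse_peer_line(line: str) -> Peer:
    """Parse one data row of ntpq -p output (not the header or the '=' rule)."""
    tally = line[0]
    remote, refid, st, _t, _when, _poll, reach, _delay, offset, jitter = line[1:].split()
    return Peer(tally, remote, refid, int(st), int(reach, 8), float(offset), float(jitter))


def problems(peer: Peer, max_ms: float = 128.0) -> list[str]:
    """List symptoms worth a look; max_ms is an arbitrary, hypothetical cutoff."""
    issues = []
    if peer.reach == 0:
        issues.append("no replies in last 8 polls")
    elif peer.reach != 0o377:
        issues.append(f"{peer.recent_replies}/8 recent polls answered")
    if abs(peer.offset_ms) > max_ms:
        issues.append(f"offset {peer.offset_ms:.1f} ms")
    if peer.jitter_ms > max_ms:
        issues.append(f"jitter {peer.jitter_ms:.1f} ms")
    return issues


if __name__ == "__main__":
    # Two rows copied from the tables above: larry as seen from cliff, and zeus as seen from larry.
    rows = [
        " larry.scrc.uman 208.80.96.70     3 u   16   64   76    0.001  6019.84 3675.39",
        "+zeus.yocum.org  131.188.3.220    2 u  148  256  377   35.330   -2.723   3.170",
    ]
    for row in rows:
        peer = parse_peer_line(row)
        print(peer.remote, problems(peer) or "ok")
```

Run on those two rows, cliff's view of larry is flagged for its multi-second offset and jitter, while larry's view of zeus.yocum.org comes back clean.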
I'm somewhat hesitant to reply to this thread since, compared to you
guys, I'm an amateur. However, the folks at the CLL lab in Winnipeg and I
have noticed similar behaviour in standalone machines not connected to
an NTP server. Clearly this isn't the same problem you're experiencing,
but it looks close. In the standalone machines, the cause is that the
internal battery used to maintain the settings is running low on power.
I phrased it this way because the same behaviour occasionally appears
in Macs as well as PCs. The solution is to replace the battery, although
we can put off replacing it if we connect the machine to an NTP server.
As the charge goes down, the results are similar to what you are seeing,
and eventually it gets to the point where the battery can't maintain
ANY settings.

Considering we deal with OLD (but mostly useful) machines at the CLL, I
am inclined to look at hardware rather than software as the major source
of problems.

I doubt this will be useful to you, but it's best to check out all the
possibilities, starting with the simple stuff.

				Later
				Mike

Mike Pfaiffer