On 12-04-25 11:56 AM, Gilles Detillieux wrote:
On 04/25/2012 11:03 AM, Gilbert E. Detillieux wrote:
Are you sure ntpd is running on all systems? Try running the following command, on each of your systems:
/usr/sbin/ntpq -p
This will tell you not only whether ntpd is running, but also where each one is getting its clock settings from, what the drift is, etc.
Note that if the initial clock setting is too far out of whack, ntpd may not even start properly. It's usually useful to run ntpdate first, to at least start off with a close-to-synchronized clock. For some reason, RHEL systems don't do that by default even when you enable ntpd.
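On a system whose init script does not step the clock itself, the "ntpdate first, then ntpd" order could be sketched roughly as follows. This is only a sketch under assumptions: sync_then_start is a made-up helper name, 0.pool.ntp.org is just an example server, and it has to run as root. ntpd is stopped first because ntpdate normally cannot bind the NTP port while ntpd holds it.

```shell
# sync_then_start [SERVER]: hypothetical helper, not part of any package.
# Steps the clock once with ntpdate, then starts ntpd only if that worked.
sync_then_start() {
    server="${1:-0.pool.ntp.org}"   # example server; substitute your own

    # ntpdate wants the NTP port to itself, so stop ntpd first
    service ntpd stop

    # set the clock from the server, then hand over to ntpd
    ntpdate "$server" && service ntpd start
}
```

Afterwards, /usr/sbin/ntpq -p should show the peers being polled again.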
I ran that on all 3 systems, and it shows ntpd is indeed running on all. SL 5 does seem to run ntpdate first, before starting ntpd, to get the clock sync'ed up beforehand, as long as you have systems defined in /etc/ntp/step-tickers or you put a -x in OPTIONS in /etc/sysconfig/ntpd.
But I wonder if there are some NTP servers on the net that are out of whack. When I run ntpq, the system that has the drift (cliff) shows different results than the other two:
On cliff:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 caustique.anox. 209.51.161.238   2 u    5   64    73  35.302   32.315 4564.06
 tb.mircx.com    64.90.182.55     2 u    2   64    77  46.530   72.637 4517.32
 cliff.scrc.uman .INIT.          16 u    -   64     0   0.000    0.000   0.000
 larry.scrc.uman 208.80.96.70     3 u   16   64    76   0.001  6019.84 3675.39
 dave2.scrc.uman 209.167.68.100   3 u    4   64    42   0.001  6140.53 3858.45
*LOCAL(0)        .LOCL.          10 l    2   64    77   0.000    0.000   0.001
On larry:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+zeus.yocum.org  131.188.3.220    2 u  148  256   377  35.330   -2.723   3.170
*ellen.linuxgene 142.3.100.2      2 u  214  256   377  31.665   -1.240   2.296
 cliff.scrc.uman .INIT.          16 u   17   64     0  27.461  -87882.   0.000
 larry.scrc.uman .INIT.          16 u    - 1024     0   0.000    0.000   0.000
+dave2.scrc.uman 209.167.68.100   3 u  204  256   377   0.400    2.594   0.890
 LOCAL(0)        .LOCL.          10 l   44   64   377   0.000    0.000   0.001
dave2's results are similar to larry's.
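To compare listings like these across several hosts without reading every column by eye, a small filter along the following lines might help. It is a sketch only: flag_peers, MAX_OFFSET_MS and MAX_JITTER_MS are made-up names, the 128 ms offset limit simply mirrors ntpd's default step threshold, and the 100 ms jitter limit is an arbitrary assumption. It relies on the column order shown in the listings (reach, delay, offset, jitter as the last four fields).

```shell
# flag_peers: read ntpq -p output on stdin, print peers that look unhealthy.
# MAX_OFFSET_MS and MAX_JITTER_MS are hypothetical knobs for this sketch.
flag_peers() {
    awk -v max_offset="${MAX_OFFSET_MS:-128}" \
        -v max_jitter="${MAX_JITTER_MS:-100}" '
        $1 == "remote" || /^=+$/ || NF < 10 { next }   # header, rule, short lines
        {
            reach  = $(NF - 3)          # octal reachability register
            offset = $(NF - 1) + 0      # milliseconds
            jitter = $NF + 0            # milliseconds
            if (offset < 0) offset = -offset

            why = ""
            if (reach + 0 == 0)           why = "unreachable"
            else if (offset > max_offset) why = "offset " $(NF - 1) " ms"
            else if (jitter > max_jitter) why = "jitter " $NF " ms"

            if (why != "") print $1 ": " why
        }'
}
```

Typical use would be /usr/sbin/ntpq -p | flag_peers on each system. The peer name is printed as ntpq shows it, including any leading tally character such as * or +.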
A few other things I thought I should point out:
- All 3 systems have ports 123/tcp and 123/udp open in iptables.
- The clock on cliff seems to drift whether or not ntpd is running, though that could be because the calculated drift compensation is out of whack.
- The /var/lib/ntp/drift file on cliff hasn't been modified since 1:58 this morning, before the reboot, while it has been on the other 2 systems.
- All 3 systems have an identical configuration, using 0.pool.ntp.org as the step-ticker, and 0.pool.ntp.org and 1.pool.ntp.org as stratum 1 servers.
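The stale drift file is easy to check for on each system. A minimal sketch, assuming GNU coreutils (stat -c and date +%s, as on SL/RHEL); drift_age_minutes is a made-up helper name. A running ntpd normally rewrites the drift file about once an hour, so an age of many hours suggests it is not disciplining the clock.

```shell
# drift_age_minutes [FILE]: minutes since FILE was last modified.
# Hypothetical helper; defaults to the drift file path mentioned above.
drift_age_minutes() {
    file="${1:-/var/lib/ntp/drift}"
    now=$(date +%s)                        # current time, seconds since epoch
    mtime=$(stat -c %Y "$file") || return 1   # last modification, same units
    echo $(( (now - mtime) / 60 ))
}
```

For example, drift_age_minutes with no argument prints the age of /var/lib/ntp/drift, and returns a non-zero status if the file does not exist.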
I'm somewhat hesitant to reply to this thread since, compared to you guys, I'm an amateur. However, the folks at the CLL lab in Winnipeg and I have noticed similar behaviour in standalone machines not connected to an ntp server. Clearly this isn't the same problem you're experiencing, but it looks close. In the standalone machines, the cause is that the internal battery used to maintain the settings is running low on power. I phrased it this way because the same behaviour occasionally appears in Macs as well as PCs. The solution is to replace the battery. However, we can put off replacing it if we connect the machine to an ntp server. As the charge goes down, the results are similar to what you are seeing, and eventually it gets to the point where the battery can't maintain ANY settings.
Considering we deal with OLD (but mostly useful) machines at the CLL, I am inclined to look at hardware rather than software as the major source of problems.
I doubt this will be useful to you, but it is best to check out all possibilities, starting with the simple stuff.
Later,
Mike