I have a weird clock drift problem that started today on one of my Linux systems. I was wondering if someone on the list has some NTP experience and could help me solve this puzzle.
I have a group of 3 systems operating as peers, and they've kept time well for years. Yesterday I upgraded them from Scientific Linux 5.7 to 5.8 (a RHEL 5.8 clone, like CentOS 5.8) and rebooted them into the latest SL 5.8 kernel, 2.6.18-308.4.1.el5. I rebooted 2 of them yesterday evening; for the last one, I scheduled an at job to reboot it at 2:30 am. (It's our mail server, so I didn't want to reboot it earlier.) This morning, I noticed that this last system's clock was 4-5 minutes behind the others. I stopped ntpd, reset the clock to the correct time, and restarted ntpd. I've done this twice already this morning, and each time the clock slowly starts falling behind again.
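To put a rough number on the drift, here is a minimal sketch that converts an observed offset into parts per million and compares it with the 500 ppm frequency correction limit that ntpd works within. The function names are made up for illustration, and the elapsed time in the example is only a placeholder, not something I measured.

```python
# ntpd can only discipline a clock whose frequency error is within 500 ppm.
NTPD_MAX_FREQ_PPM = 500.0


def drift_ppm(offset_seconds, elapsed_seconds):
    """Return the average drift rate in parts per million.

    offset_seconds: how far the clock is off after elapsed_seconds.
    elapsed_seconds: real time that passed since the clock was last correct.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return offset_seconds * 1_000_000 / elapsed_seconds


def exceeds_ntpd_limit(offset_seconds, elapsed_seconds):
    """True if the drift is larger than ntpd can correct by frequency steering."""
    return abs(drift_ppm(offset_seconds, elapsed_seconds)) > NTPD_MAX_FREQ_PPM


# Hypothetical example: 4.5 minutes lost over an ASSUMED 6 hours.
example_ppm = drift_ppm(-270, 6 * 3600)
```

If the real figure is anywhere near that order of magnitude, ntpd would not be able to hold the clock by slewing alone, which would fit with the clock falling behind again after each reset.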
The syslog entries from ntpd in /var/log/messages on the 2 other systems show fairly frequent occurrences of "synchronized to <IP>, stratum <n>", where n is usually 2 or 3. But for the mail server with the drifting clock, the only ntp sync logged this week was at 21:03:03 yesterday. The last ones before that were April 10 & April 4, i.e. very irregularly. The oldest log entries I have in /var/log/messages.4 show more regular syncs (at least 1-2 a day) up to March 31. So it's possible this problem existed for a while and had nothing to do with the updates yesterday, but this is the first time the drift got so bad it drew attention to itself (some file modification times got out of sync between this server and another system).
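In case anyone wants to compare sync frequency across hosts the same way, this is a rough sketch of how the "synchronized to" entries can be tallied per day. It assumes the usual syslog layout ("Mon DD HH:MM:SS host ntpd[pid]: message"); the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line layout, e.g.:
#   "Apr 10 21:03:03 host ntpd[1234]: synchronized to <IP>, stratum 2"
SYNC_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"ntpd\[\d+\]:\s+synchronized to \S+, stratum \d+"
)


def count_syncs_per_day(lines):
    """Return a Counter mapping 'Mon D' to the number of ntpd sync messages."""
    counts = Counter()
    for line in lines:
        match = SYNC_RE.match(line)
        if match:
            # Normalize the day so "Apr  4" and "Apr 4" share one key.
            key = "{} {}".format(match.group("month"), int(match.group("day")))
            counts[key] += 1
    return counts
```

It can be fed an open log file directly, for example `count_syncs_per_day(open("/var/log/messages"))`, and run once per rotated log to see where the regular syncs stop.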
I'd appreciate any ideas on how to tackle this problem.
Gilles