Best common practices
From AdminWiki
Revision as of 17:29, 24 May 2006
This page gives you a rundown of the absolute minimum every server should have.
Time issues
Synchronization
There is absolutely no excuse for not having a correctly synchronized clock. An unsynchronized clock will bite you when you have to compare logfiles from multiple servers, and it causes problems when you need to deliver accurate logs (police investigation, etc.).
The problem has gotten worse in recent years (at least that's my impression) because processors got faster and timekeeping mechanisms sloppier. What the operating system basically does [1] when booting is fetch the current time and date from the real-time clock (RTC), then take a rough guess at how many CPU cycles (or ticks of another time source, e.g. HPET) make up approximately one second, and then use this value for as long as the OS runs. Excessive IRQ usage, CPU cycle modulation (power saving) and other factors may add to the inaccuracy.
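To get a feeling for what a slightly wrong "cycles per second" guess means in practice, here is a small back-of-the-envelope sketch. The 50 ppm rate error is a made-up figure for illustration only, not a measurement of any particular hardware, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def drift_seconds(error_ppm: float, days: float) -> float:
    """Accumulated clock error in seconds for a constant rate error.

    error_ppm is the rate error in parts per million (assumed constant,
    which real hardware is not; temperature and load change it).
    """
    return error_ppm * 1e-6 * SECONDS_PER_DAY * days


# 50 ppm is an assumed example value, not a measured one.
print(f"{drift_seconds(50, 1):.2f} s after one day")
print(f"{drift_seconds(50, 180) / 60:.1f} min after 180 days")
```

Even a small constant rate error adds up to minutes over a long uptime, which is already enough to make logfiles from different servers hard to correlate.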
What an NTP daemon basically does is compare the system time with an external time source (usually an NTP server), estimate how far off the OS is, and then discipline the system time. It also tracks the inaccuracy of the system clock so that it can keep the clock in sync even when the NTP server is unreachable for longer periods.
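The "estimating how far off the OS is" step can be sketched with the standard NTP on-wire calculation, which uses four timestamps from one request/response exchange. This is only the arithmetic for a single sample; a real daemon filters many samples from several servers and also corrects the clock frequency. The function name and the example timestamps are made up for illustration.

```python
def ntp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: client clock when the request was sent
    t2: server clock when the request arrived
    t3: server clock when the reply was sent
    t4: client clock when the reply arrived

    A positive offset means the client clock is behind the server.
    The estimate assumes the network delay is the same in both directions.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Invented timestamps (seconds): the client is roughly 5.4 s behind.
offset, delay = ntp_offset_and_delay(t1=10.0, t2=15.5, t3=15.6, t4=10.3)
print(f"offset {offset:+.3f} s, round-trip delay {delay:.3f} s")
```

The offset tells the daemon how far to steer the clock, and the delay gives it a bound on how much that estimate can be trusted.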
Other pitfalls
Another major issue is a wrong time in the RTC. You have to ensure that your system time is correct before your operating system switches to multi-user mode.
Common scenario:
- Your RTC is set to the local timezone.
- Your server has an uptime >= 180 days, meaning that it probably has passed a DST[2] boundary.
- Your server crashes.
At this point, if you haven't taken any precautions, you're fucked.
- Best case: wrong logfile-entries and a few incorrect mtimes on files.
- Worst case: Important business data (accounting, transactions, etc.) have the wrong timestamps.
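The scenario above can be played through in a few lines. This is a sketch of one reading of it: the RTC was last written in winter, holds local time, and is interpreted with the summer offset after the crash. The fixed offsets of +1 h and +2 h (Central European winter and summer time) and the example timestamp are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration: Central European winter/summer time.
WINTER = timezone(timedelta(hours=1))
SUMMER = timezone(timedelta(hours=2))


def rtc_wall_value(true_utc: datetime, zone_when_rtc_was_set: timezone) -> datetime:
    """Naive local time the RTC shows; it keeps the offset it was set with."""
    return true_utc.astimezone(zone_when_rtc_was_set).replace(tzinfo=None)


def system_time_after_boot(rtc_naive: datetime, zone_at_boot: timezone) -> datetime:
    """What the OS believes the UTC time is after reading the RTC at boot."""
    return rtc_naive.replace(tzinfo=zone_at_boot).astimezone(timezone.utc)


true_utc = datetime(2006, 5, 24, 15, 29, tzinfo=timezone.utc)
rtc = rtc_wall_value(true_utc, WINTER)       # RTC never adjusted for DST
booted = system_time_after_boot(rtc, SUMMER)  # OS assumes summer time now
error = booted - true_utc
print("system clock error after crash and reboot:", error)
```

With these assumptions the system comes back up a full hour off, and everything written before the clock gets corrected carries that error.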
There are two solutions to this problem:
- Put ntpdate in your startup scripts, after your network has initialized and before ntp-server starts. Test it!
  - This has the drawback that when the network or your NTP server of choice is down, you'll still run into trouble.
- Have hwclock write the system time to the RTC every now and then.
  - This is still dangerous, since there's a window in which your server will boot with the wrong time in the RTC, but it reduces the risk noticeably.
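Both precautions boil down to running one command each, `ntpdate <server>` at boot and `hwclock --systohc` periodically. How you hook them in depends on your distribution's startup scripts and cron setup, so the following is only a minimal sketch of wrappers that report failure instead of silently ignoring it. The function names, the timeout, and the default server name are assumptions; both commands need root privileges.

```python
import subprocess


def set_clock_at_boot(server: str = "pool.ntp.org", run=subprocess.run) -> bool:
    """Step the system clock once with ntpdate. Returns True on success.

    The server name is a placeholder; use the NTP server you actually trust.
    `run` is injectable so the function can be exercised without root.
    """
    try:
        result = run(["ntpdate", server], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ntpdate missing, or network/server not answering
    return result.returncode == 0


def save_system_time_to_rtc(run=subprocess.run) -> bool:
    """Write the current system time to the RTC. Returns True on success."""
    try:
        result = run(["hwclock", "--systohc"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A startup script would call `set_clock_at_boot()` once and log loudly if it returns False, while a periodic job would call `save_system_time_to_rtc()`. Writing the RTC only makes sense while the system clock is known to be synchronized, otherwise you just store the wrong time.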
The rest
- backup
- monitoring
- sane logging
- handling of (security) updates