Best common practices

From AdminWiki

Revision as of 17:25, 24 May 2006

This should give you a rundown on the absolute minimum every server should have.

Time issues

Synchronization

There is absolutely no excuse for not having a correctly synchronized clock. This will bite you when you have to compare logfiles from multiple servers, and it will cause problems when you need to deliver accurate logs (police investigation, etc.).

The problem has gotten worse in recent years (at least that's my impression) because processors got faster and time-keeping mechanisms sloppier. What the operating system basically does [1] when booting up is fetch the current time and date from the real-time clock (RTC), take a rough guess at how many CPU cycles (or ticks of another time source, e.g. HPET) make up one second, and then use this value for as long as the OS runs. Any error in that guess accumulates over the uptime. Excessive IRQ usage, CPU cycle modulation (power saving) and other factors can add to the inaccuracy.
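
To get a feeling for how a small calibration error adds up, here is a minimal sketch. The function name and the frequencies are made up for illustration; real kernels calibrate and correct their time source in more elaborate ways.

```python
def drift_seconds(true_hz, assumed_hz, elapsed_s):
    """Seconds the OS clock runs ahead (+) or behind (-) after elapsed_s real seconds.

    true_hz:    actual tick rate of the time source
    assumed_hz: tick rate the OS guessed at boot (hypothetical value)
    """
    ticks = true_hz * elapsed_s        # ticks that really happened
    os_elapsed = ticks / assumed_hz    # what the OS believes has elapsed
    return os_elapsed - elapsed_s


# Hypothetical 2 GHz source, guessed 50 ppm too low, over one day.
drift = drift_seconds(2_000_000_000, 1_999_900_000, 86_400)
print(f"drift after one day: {drift:.2f} s")
```

An error of a few dozen parts per million already amounts to several seconds per day, which is more than enough to confuse logfile comparisons across servers.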

What an NTP daemon does is compare the system time with an external time source (usually an NTP server), estimate how far off the OS is, and then discipline the system time accordingly.
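
The "estimating how far off" step can be sketched with the standard NTP offset and delay calculation from four timestamps. This is a simplified illustration, not what any particular daemon does internally: a real daemon filters many such samples and adjusts the clock gradually. The function name and the sample values are assumptions.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one request/response.

    t0: client clock when the request was sent
    t1: server clock when the request arrived
    t2: server clock when the reply was sent
    t3: client clock when the reply arrived
    All values must be in the same unit (milliseconds here).
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2   # how far the client is behind the server
    delay = (t3 - t0) - (t2 - t1)          # network round trip, minus server processing
    return offset, delay


# Made-up sample: client is 500 ms behind, each network leg takes 100 ms.
offset, delay = ntp_offset_and_delay(1000, 1600, 1700, 1300)
print(offset, delay)
```

The calculation assumes the network delay is roughly the same in both directions; an asymmetric path shows up as an error in the offset.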

Other pitfalls

Another major issue is a wrong time in the RTC. You have to ensure that your system time is correct before your operating system switches to multi-user mode.

Common scenario:

  • Your RTC is set to the local timezone.
  • Your server has an uptime >= 180 days, meaning that it probably has passed a DST[2] boundary.
  • Your server crashes.
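
The effect of this scenario can be reproduced with a few lines of date arithmetic. This is only a sketch under assumptions: the zone offsets (CET/CEST), the date and the helper names are picked for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed example zone: UTC+1 in winter, UTC+2 in summer.
CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))


def rtc_reading(true_utc, rtc_offset):
    """Digits shown by a local-time RTC that was last set using rtc_offset."""
    return true_utc.astimezone(rtc_offset).replace(tzinfo=None)


def boot_time_utc(rtc_naive, assumed_offset):
    """UTC time the OS derives when it interprets the RTC digits with assumed_offset."""
    return rtc_naive.replace(tzinfo=assumed_offset).astimezone(timezone.utc)


true_utc = datetime(2006, 5, 24, 15, 25, tzinfo=timezone.utc)
rtc = rtc_reading(true_utc, CET)       # RTC was set in winter and never rewritten
derived = boot_time_utc(rtc, CEST)     # after the crash the OS applies summer time
error = derived - true_utc
print(error)
```

The RTC itself kept ticking correctly; the error comes purely from interpreting its digits with a different offset than the one used when it was last written.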


At this point, if you haven't taken any precautions, you're fucked.

  • Best case: wrong logfile-entries and a few incorrect mtimes on files.
  • Worst case: Important business data (accounting, transactions, etc.) have the wrong timestamps.


There are two solutions to this problem:

  • Put ntpdate in your startup scripts after your network has initialized and before ntp-server starts. Test it!
This has the drawback that when the network or your NTP server of choice is down, you will still run into trouble.
  • Have hwclock write the system time to the RTC every now and then.
This is still dangerous, since there is a window in which your server will boot with the wrong time in the RTC, but it reduces the risk noticeably.
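
Both ideas can be combined in a small helper that a startup script or cron job could call. Treat this as a sketch under assumptions: the function name is made up, `pool.ntp.org` is just a placeholder for your NTP server of choice, and the commands need root privileges. `hwclock --systohc` is the hwclock option that copies the system time to the RTC.

```python
import subprocess

# Placeholder server; replace with the NTP server you actually use.
NTPDATE_CMD = ["ntpdate", "pool.ntp.org"]
# Copy the (now hopefully correct) system time into the RTC.
HWCLOCK_CMD = ["hwclock", "--systohc"]


def step_clock_then_save(run=subprocess.run):
    """Step the system clock once via ntpdate; on success, store it in the RTC.

    `run` is injectable so the logic can be tested without touching the clock.
    Returns True only if both commands exit with status 0.
    """
    if run(NTPDATE_CMD).returncode != 0:
        # Network or NTP server down: do not overwrite the RTC with an unverified time.
        return False
    return run(HWCLOCK_CMD).returncode == 0
```

Writing the RTC only after a successful sync is a design choice of this sketch; it avoids persisting a time that was never checked against an external source.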

The rest

  • backup
  • monitoring
  • sane logging
  • handling of (security) updates