Best common practices

From AdminWiki

Latest revision as of 00:32, 27 May 2006

This should give you a rundown of the absolute minimum every server should have.

Time issues

Keep in sync

There is absolutely no excuse for not having a correctly synchronized clock. An unsynchronized clock will bite you when you have to compare logfiles from multiple servers, and it will cause problems when you need to deliver accurate logs (police investigation, etc.).

The problem has got worse in the last few years (at least that's my impression) because processors got faster and/or timekeeping mechanisms sloppier. What the operating system basically does[1] when booting up is fetch the current time and date from the RTC, take a wild guess at how many CPU cycles[2] make up approximately one second, and then use this guesstimate for as long as the OS runs. Unfortunately, the guess is almost never accurate. Excessive IRQ usage, CPU cycle modulation (power saving) and other factors can increase the inaccuracy further.

What an NTP daemon basically does is compare the system time with an external time source (usually an NTP server), estimate how far off the OS is, and then discipline the system time. It also tracks the inaccuracy of the system clock so that it can keep the clock in sync even if the NTP server is unreachable for longer periods.
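To make the "estimating how far off the OS is" step a bit more concrete, here is a minimal sketch of the standard NTP offset and round-trip delay calculation from four timestamps. The function name and the sample numbers are made up for illustration; a real daemon does a lot more (filtering, multiple servers, slewing instead of stepping).

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request was sent
    t1: server clock when the request was received
    t2: server clock when the reply was sent
    t3: client clock when the reply was received
    All values must use the same unit (e.g. milliseconds).
    """
    # How far the client clock is behind (+) or ahead of (-) the server,
    # assuming the network delay is roughly the same in both directions.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Time spent on the wire, excluding the server's processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative exchange in milliseconds: the client is 5 s behind the server,
# each network leg takes 100 ms and the server needs 50 ms to answer.
offset, delay = ntp_offset_and_delay(100_000, 105_100, 105_150, 100_250)
print(offset, delay)
```

The offset estimate is only as good as the assumption that both network legs take equally long, which is one reason why a nearby, reliable NTP server is preferable.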

Never trust the RTC

Another major issue is a wrong time in the RTC. You have to ensure that your system time is correct before your operating system switches to multi-user mode.

Common scenario in DST-countries:

  • Your RTC is set to the local timezone.
  • Your server has an uptime >= 180 days, meaning that it probably has passed a DST[3] boundary.
  • Your server crashes.


At this point, if you haven't taken any precautions, you're fucked as soon as the server is online again.

  • Best case: wrong logfile-entries and a few incorrect mtimes on files.
  • Worst case: Important business data (accounting, transactions, etc.) have the wrong timestamps. Good luck correcting these by hand.
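The scenario above can be simulated in a few lines. This is only a sketch under stated assumptions: the RTC is modelled as a wall clock that was last written under one UTC offset and is read back at boot under another, and the two fixed offsets (UTC+1 and UTC+2, as in Central Europe) are just an example.

```python
from datetime import datetime, timedelta, timezone

# Example offsets only: winter time and summer time of a DST zone.
WINTER = timezone(timedelta(hours=1))
SUMMER = timezone(timedelta(hours=2))


def system_time_after_boot(true_utc, rtc_zone, boot_zone):
    """Return the UTC time the OS believes it is after reading the RTC.

    rtc_zone:  offset that was in effect when the RTC was last written
    boot_zone: offset the OS assumes when it reads the RTC at boot
    """
    # The RTC knows nothing about time zones; it just shows a wall clock.
    rtc_wall_clock = true_utc.astimezone(rtc_zone).replace(tzinfo=None)
    # At boot the OS interprets that wall clock using the current offset.
    return rtc_wall_clock.replace(tzinfo=boot_zone).astimezone(timezone.utc)


crash = datetime(2006, 7, 1, 12, 0, tzinfo=timezone.utc)

# RTC written in winter, server rebooted in summer: one hour off.
print(system_time_after_boot(crash, WINTER, SUMMER) - crash)

# RTC kept in UTC and read as UTC: no error.
print(system_time_after_boot(crash, timezone.utc, timezone.utc) - crash)
```

The one-hour error only disappears once something (ntpdate, an admin) corrects the system time, and everything written before that point carries the wrong timestamp.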


There are a few solutions to this problem:

  • Put ntpdate in your startup scripts after your network has initialized and before ntp-server starts. Test it!
    This has the drawback that you'll still run into trouble when the network or your NTP server of choice is down.
  • Have hwclock write the system time to the RTC every now and then.
    This is still dangerous, since there's a window in which your server will boot with the wrong time in the RTC, but it reduces the risk noticeably.
  • Set the hardware clock to UTC.
    Untested. If anybody successfully uses this in a DST zone, please contact me.
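The first two solutions can be combined: step the clock with ntpdate and, only if that worked, write the result to the RTC with hwclock. The following is a rough sketch, not a drop-in startup script. It assumes ntpdate and hwclock are installed and that the script runs as root; the function name and the runner parameter (there to allow a dry run) are invented for this example.

```python
import subprocess


def boot_time_sync(server, runner=subprocess.run):
    """Step the system clock from `server`, then save it to the RTC.

    Returns True only if both commands succeeded.
    """
    try:
        # Set the system time once from the given NTP server.
        runner(["ntpdate", server], check=True)
    except (OSError, subprocess.CalledProcessError):
        # Network or NTP server down: do not touch the RTC, because the
        # system time may still be wrong.
        return False
    try:
        # Copy the (now correct) system time to the hardware clock.
        runner(["hwclock", "--systohc"], check=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True
```

Calling the same hwclock command from a periodic job covers the "every now and then" part. A False return value should be logged loudly, since it means the server may be running on whatever the RTC happened to contain.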

Installed software

This is the absolute minimum of software every server should have installed.

  • A working compiler and linker toolchain + headers.
  • A syscall-level diagnostic tool like strace/truss/etc.
  • A usable web-browser. links, lynx, elinks, etc.
  • A usable ftp-client. ncftp or lftp.
  • A multi-purpose download agent. wget or curl.
  • A sane text editor.
  • tcpdump
  • lsof


Failure to meet these criteria will catch up with you someday, when you least expect it.

In many cases you will also need xauth, so ssh/X11 forwarding can work. You don't need it now, but you will need it at some point.
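A quick way to audit a server against the list above is to look for the binaries in $PATH. The sketch below is an assumption-laden example: the mapping of categories to executable names is my own guess at typical names (in particular for the compiler and the "sane" editor), and it does not check whether headers are installed.

```python
import shutil

# Category -> executables that would satisfy it (names are assumptions).
REQUIRED_TOOLS = {
    "compiler": ("cc", "gcc"),
    "syscall tracer": ("strace", "truss"),
    "web browser": ("links", "lynx", "elinks"),
    "ftp client": ("ncftp", "lftp"),
    "download agent": ("wget", "curl"),
    "text editor": ("vim", "emacs"),
    "tcpdump": ("tcpdump",),
    "lsof": ("lsof",),
}


def missing_tools(required=REQUIRED_TOOLS, which=shutil.which):
    """Return the sorted categories for which no executable was found."""
    return sorted(
        name
        for name, candidates in required.items()
        if not any(which(candidate) for candidate in candidates)
    )


if __name__ == "__main__":
    for name in missing_tools():
        print("missing:", name)
```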

Environment

Set a clean environment:

  • $EDITOR: vim/emacs
  • $LANG: en_US.UTF-8, C or make sure it is unset.
  • $PAGER: less
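The three settings above are easy to verify automatically. Here is a small sketch that checks an environment mapping against them; the function name and the wording of the messages are invented for this example, and full paths such as /usr/bin/vim are accepted on the assumption that only the program name matters.

```python
import os


def environment_problems(env):
    """Return a list of complaints about EDITOR, LANG and PAGER in `env`."""
    problems = []
    # Accept both "vim" and "/usr/bin/vim".
    if os.path.basename(env.get("EDITOR", "")) not in ("vim", "emacs"):
        problems.append("EDITOR should be vim or emacs")
    # LANG may be unset, but if it is set it must be one of the two values.
    lang = env.get("LANG")
    if lang is not None and lang not in ("en_US.UTF-8", "C"):
        problems.append("LANG should be en_US.UTF-8, C or unset")
    if os.path.basename(env.get("PAGER", "")) != "less":
        problems.append("PAGER should be less")
    return problems


if __name__ == "__main__":
    for problem in environment_problems(os.environ):
        print(problem)
```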

Footnotes

  1. I'm not completely sure about that. If I'm horribly wrong here, please tell me so ;)
  2. (or any other time source, e.g. HPET)
  3. Daylight saving time


The rest

  • backup
  • monitoring
  • sane logging
  • handling of (security) updates