Monday, October 3, 2016

Work Rant: Engineers and lack of usability (NTP time syncing)

This is a 'Dear Engineers' rant and it springs from a deep and dark place in my soul where I nurse my anger towards systems and technologies that could have been made so much easier to work with, if engineers spent a little bit of time with Usability and GUI design staff...

Let's imagine you have a few Domain Controllers (DCs). Some are virtual, some are physical and one is the Primary Domain Controller (PDC).

All the DCs (except one) get their time from the PDC and, as we all know, having time in sync across our domain is VERY important. The PDC, however, gets its time from another DC (the one that does not get it from the PDC) and this DC gets it from an atomic clock somewhere (*.pool.ntp.org). We will call this DC the DCntp.

The DCntp goes down. No one cares, as it's old and sad and needs to be buried anyway. No one remembers that the PDC gets its time from DCntp, so no one spends their weekend worrying about bringing it back up.

Suddenly it's Monday. People start to show up at work and quickly realize that something is wrong. Authentication with external systems does not work. Clocks on PCs and desk phones are off by a lot. Chaos and havoc ensue.

Problem: The PDC could not get its time from DCntp and defaulted to the local CMOS hardware clock. The PDC is virtual and does not have local CMOS hardware, only an untrustworthy emulated CMOS which is horrible at keeping time. The PDC's time starts to slide and every machine that syncs from it inherits the problem. Locally that is not an issue, because everything drifts together, but in a globalized world with Cloud services and SaaS solutions it is!

Engineers - If the PDC notices that it repeatedly can't get time from its NTP servers, why would you want it to default to a local CMOS when it's virtual?

Start logging failure errors in the log. Send mails to everyone in the Domain Admin group. Give pop-up messages when someone logs into the desktop. Default to a *.pool.ntp.org server.
RING SOME DAMN WARNING KLAXONS!

Better yet - Have a Best Practice analyzer run automatically every few weeks to deliver reports on issues; 'Hey Admin - Did you know that the PDC is not the primary NTP server for this domain? Maybe you should look into that?' or 'Hey Admin - Did you know that if I can't find an NTP server that I'll default to the local hardware which is virtual and that you are going to have a bad time?'
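
For what it's worth, the check I'm asking for is not hard to sketch. The Python below is only a rough illustration, not anything Microsoft ships: it assumes that `w32tm /query /source` prints something like "Local CMOS Clock" or "Free-running System Clock" when the Windows Time service has fallen back to the machine's own clock, and the function names are made up for this post.

```python
import subprocess

# Source names that w32tm is assumed to report when the Windows Time service
# has fallen back to the machine's own clock instead of an NTP peer.
# This list is an assumption for illustration and may not be exhaustive.
FALLBACK_SOURCES = ("local cmos clock", "free-running system clock")


def is_fallback_source(source: str) -> bool:
    """Return True if the reported time source looks like the local clock."""
    normalized = source.strip().lower()
    return any(normalized.startswith(name) for name in FALLBACK_SOURCES)


def current_time_source() -> str:
    """Ask the Windows Time service where it currently gets its time."""
    result = subprocess.run(
        ["w32tm", "/query", "/source"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def check_time_source(source: str) -> str:
    """Turn a reported time source into a one-line status message."""
    cleaned = source.strip()
    if is_fallback_source(cleaned):
        return f"WARNING: time source is '{cleaned}', no NTP peer in use"
    return f"OK: time source is '{cleaned}'"
```

Run `print(check_time_source(current_time_source()))` on the PDC from a scheduled task and mail the output to the admins whenever it starts with WARNING. That alone would have saved my Monday.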

It's 2016 - Let's act that way!

Also, these two links saved my day:
Configure DC to synchronize time with external NTP server
How to configure an authoritative time server in Windows Server