Devops Log: Stratum 3 boundary service
• Mark Eschbach
We’ve got a number of Stratum 2 NTP servers intended to serve all our pods. Since my original problem was getting NTP traffic to egress across the public internet, I’ll be setting up two boundary NTP servers at Stratum 3 to proxy the traffic. This will also prevent one pod from adversely affecting the others when I misconfigure it.
The template for the NTP services themselves is fairly straightforward. We only want to honor requests from the local VPC while ignoring all public requests. Additionally, we should only allow administrative commands from the local machine, as opposed to the rest of the VPC.
driftfile /var/lib/ntp/drift
# don't answer queries from strangers
restrict default ignore
restrict -6 default ignore
# Allow all operations from localhost
restrict 127.0.0.1
restrict ::1
# Service to the local network
restrict ${ipv4_net} mask ${ipv4_mask} nomodify notrap nopeer noquery
# Use metadata host as a time source
server 169.254.169.123 iburst # AWS metadata time
restrict 169.254.169.123 nomodify notrap nopeer noquery
# Use official Stratum 2 time servers for our AZs
server us-west-2b.stratum-2.invalid iburst
restrict us-west-2b.stratum-2.invalid nomodify notrap nopeer noquery
server us-west-2c.stratum-2.invalid iburst
restrict us-west-2c.stratum-2.invalid nomodify notrap nopeer noquery
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface listen lo
# CVE-2013-5211 fix
disable monitor
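The template leaves ${ipv4_net} and ${ipv4_mask} to be filled in per VPC. The post does not say which tool renders it, so the following is only a minimal sketch of that step, assuming Python's string.Template (which happens to use the same ${name} syntax) and a VPC described by a CIDR block. The function name render_restrict and the 10.20.0.0/16 range are made up for illustration.

```python
import ipaddress
from string import Template

# Only the line that carries placeholders; the rest of the template is static.
RESTRICT_LINE = Template(
    "restrict ${ipv4_net} mask ${ipv4_mask} nomodify notrap nopeer noquery"
)


def render_restrict(vpc_cidr):
    """Fill in the network address and netmask for an IPv4 CIDR block."""
    network = ipaddress.ip_network(vpc_cidr)
    if network.version != 4:
        raise ValueError("expected an IPv4 CIDR block")
    return RESTRICT_LINE.substitute(
        ipv4_net=str(network.network_address),
        ipv4_mask=str(network.netmask),
    )


if __name__ == "__main__":
    # 10.20.0.0/16 is a hypothetical VPC range, not one from the original setup.
    print(render_restrict("10.20.0.0/16"))
```

ntpd's restrict line wants a dotted netmask rather than a prefix length, which is why the sketch converts the CIDR instead of passing it through.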
Each NTP server takes about 5-10 minutes to come up and become confident about its time. Confidence is established by hearing back from upstream servers at a higher stratum (that is, a lower stratum number). One may monitor the progress via ntpq -pn on the host, or by providing the host as the last argument. The Reach field in the table is an octal bitmask representing the sample periods in which the local ntpd expected to hear back from a server. Each bit represents one sample, and the register is shifted left for each new sample. The service remembers the last 8 samples, meaning a fully reachable server will show as 377.
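The octal Reach value is easier to read once it is expanded into its eight bits. A small sketch of that decoding follows; the helper names reach_bits and answered are my own, not part of ntpq.

```python
def reach_bits(reach):
    """Expand ntpq's octal reach value into 8 bits, oldest sample first."""
    value = int(reach, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach is an 8-bit register")
    # The leftmost character is the oldest poll, the rightmost the newest.
    return format(value, "08b")


def answered(reach):
    """Count how many of the last eight polls got a reply."""
    return reach_bits(reach).count("1")


if __name__ == "__main__":
    for sample in ("1", "3", "376", "377"):
        print(sample, reach_bits(sample), answered(sample))
```

Reading it this way, 377 expands to 11111111 (all eight polls answered), while 376 expands to 11111110, meaning only the most recent poll was missed.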
Discovery
Next up is how the other nodes in the cluster will receive their updates. There were two ways to approach this problem. The first would be modifying the DHCP options so that on lease renewal each node would use the given NTP servers. I’m not entirely convinced EC2 Linux is configured to honor those options, as I saw no plumbing to rewrite /etc/ntp.conf anywhere. I opted instead to allow the NTP hosts to modify a pod’s private DNS to provide specific records.
On boot the server connects to the Route 53 service and registers itself under a well-known name according to the availability zone it’s running in, such as ntp.az-e.pod. The only catch with this approach is that ntpd only queries for the address of the service on boot. If there is no AAAA or A record then the server entry is discarded. This is a benefit, as we can place all AZ entries in a single file and missing services will be ignored. It’s also a curse, since when the NTP servers are restarted all nodes will need their NTP services cycled to track the changes.
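The self-registration step could look roughly like the sketch below. It is not the script used here. It assumes the record name is derived from the last letter of the AZ name (us-west-2b becomes ntp.az-b.pod), that an A record with a short TTL is wanted, and that the caller passes in a boto3 Route 53 client and a hosted zone ID. All function names and the TTL are hypothetical.

```python
def az_record_name(availability_zone, pod_domain="pod"):
    """Map an AZ such as 'us-west-2b' to a name such as 'ntp.az-b.pod'."""
    letter = availability_zone[-1:]
    if not letter.isalpha():
        raise ValueError("unexpected availability zone: %r" % availability_zone)
    return "ntp.az-%s.%s" % (letter, pod_domain)


def build_change_batch(availability_zone, private_ip, ttl=60):
    """Build an UPSERT of the A record in Route 53's ChangeBatch shape."""
    return {
        "Comment": "NTP boundary server self-registration",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": az_record_name(availability_zone),
                    "Type": "A",
                    "TTL": ttl,  # assumed value; short so replacements show up quickly
                    "ResourceRecords": [{"Value": private_ip}],
                },
            }
        ],
    }


def register(route53_client, hosted_zone_id, availability_zone, private_ip):
    """Submit the change; route53_client is expected to be boto3.client('route53')."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_change_batch(availability_zone, private_ip),
    )
```

UPSERT is used so that a replacement server overwrites the record left by its predecessor. The clients still need their ntpd cycled to pick up the new address, as noted above.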
Stratum 4 Configuration
A Stratum 4 tier with the above auto-configuration would look like the following. I chose to use the AWS link-local time source at 169.254.169.123 (the Amazon Time Sync Service) in addition to the pod sources so I can confirm the two agree closely. Since many of our hosts perform AWS operations, knowing the time drift relative to AWS’s clocks is helpful.
server 169.254.169.123 iburst # AWS metadata time
restrict 169.254.169.123 nomodify notrap nopeer noquery
server ntp.az-a.pod iburst
server ntp.az-b.pod iburst
server ntp.az-c.pod iburst
server ntp.az-d.pod iburst
server ntp.az-e.pod iburst
server ntp.az-f.pod iburst
server ntp.az-g.pod iburst
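Since every AZ gets an entry whether or not a server exists there, the list above is mechanical enough to generate. A minimal sketch, assuming the AZ letters a through g shown above. The function name stratum4_config is made up.

```python
def stratum4_config(az_letters="abcdefg", pod_domain="pod"):
    """Render the Stratum 4 server list, one entry per possible AZ."""
    lines = [
        "server 169.254.169.123 iburst # AWS metadata time",
        "restrict 169.254.169.123 nomodify notrap nopeer noquery",
    ]
    # ntpd drops entries whose names do not resolve, so listing every AZ is safe.
    for letter in az_letters:
        lines.append("server ntp.az-%s.%s iburst" % (letter, pod_domain))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(stratum4_config(), end="")
```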
To restart ntpd and check on it you can use the following command: sudo service ntpd restart && sleep 10 && ntpq -pn. The sleep helps ensure the initial timing has been received. The iburst option means 5 queries spaced 2 seconds apart are sent to the server to bring the node online quickly, so anything below 5 seconds is really too short to get useful stable state information. If a server has not been heard from, I’ve heard reports it will show up as .INIT. in the refid field; however, my experience shows it will show as .STEP. for a long while first.
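To spot servers that have not answered yet without eyeballing the table, the ntpq -pn output can be filtered. The sketch below assumes the usual ten-column peers layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with a one-character tally code in front of each remote. The function name pending_peers is my own, and the sample rows are invented for illustration rather than captured from a real host.

```python
def pending_peers(ntpq_output):
    """Return (remote, refid, reach) for peers that have not answered yet."""
    pending = []
    for line in ntpq_output.splitlines():
        # Skip blank lines, the header row, and the ===== separator.
        if not line.strip() or line.startswith("=") or line.lstrip().startswith("remote"):
            continue
        # The first character is the tally code ('*', '+', '-', ' ', ...); drop it.
        fields = line[1:].split()
        if len(fields) < 10:
            continue
        remote, refid, reach = fields[0], fields[1], fields[6]
        if refid in (".INIT.", ".STEP.") or int(reach, 8) == 0:
            pending.append((remote, refid, reach))
    return pending


# Invented sample in the shape of `ntpq -pn` output, for illustration only.
SAMPLE = (
    "     remote           refid      st t when poll reach   delay   offset  jitter\n"
    "==============================================================================\n"
    "*169.254.169.123 10.66.1.73       3 u   12   64  377    0.345   -0.021   0.015\n"
    " 10.20.1.5       .STEP.          16 u    -   64    0    0.000    0.000   0.000\n"
)

if __name__ == "__main__":
    for remote, refid, reach in pending_peers(SAMPLE):
        print(remote, refid, reach)
```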