Mark Eschbach

Software Developer && System Analyst

CoreOS

CoreOS is a Linux distribution intended for cluster computing to run distributed applications.

Installing from an ISO

The CoreOS ISO (at least at the time of writing) isn't ready to cluster out of the box. In order to install CoreOS we need to configure the node with a cloud-config file, which uses YAML grammar. In order to log into the box, you must provide at least one public SSH key under #cloud-config.ssh_authorized_keys[]; you may provide multiple keys.

Xref

Cloud Config

The cloud-config file describes how each node within the cluster is configured. Effectively this provides all the elements for the node to join the cluster and for you to log in in case of problems. Some cloud providers allow booting with a given cloud-config file, making life gravy.

Minimum requirements

The goal of the configuration below is a minimum viable configuration for CoreOS. It probably doesn't come close, but it works for now!

#cloud-config.coreos.etcd2.discovery
The address of the discovery token used to bootstrap the cluster.
#cloud-config.hostname
Strictly speaking this is not required; however, it is generally helpful to know which node you are on.
#cloud-config.ssh_authorized_keys
Seeds the public keys allowed to log in over SSH.
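Putting these together, a minimal cloud-config might look like the sketch below. The discovery token, hostname, and key are placeholders; generate your own token at https://discovery.etcd.io/new.

```yaml
#cloud-config

hostname: node-01
coreos:
  etcd2:
    # Placeholder: substitute the token from https://discovery.etcd.io/new
    discovery: https://discovery.etcd.io/<token>
ssh_authorized_keys:
  # Placeholder key; paste the contents of your public key file here.
  - ssh-rsa AAAA... user@example
```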

Xref

Fleet

Fleet is a method of managing systemd units across a cluster. It is a very versatile mechanism for executing units across the cluster or only on desired nodes.
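Scheduling onto desired nodes is done with an [X-Fleet] section in the unit file. The sketch below is a hypothetical example; the role=worker metadata is an assumption and would be set per-node in cloud-config under coreos.fleet.metadata.

```ini
# example.service - hypothetical fleet unit sketch
[Unit]
Description=Example long-running service

[Service]
ExecStart=/usr/bin/docker run --rm busybox /bin/sh -c "while true; do sleep 60; done"

[X-Fleet]
# Only schedule this unit on machines tagged role=worker.
MachineMetadata=role=worker
```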

Reloading

There is no simple command for reloading unit files as of yet. The following will reload a unit file. Please note this will stop all related units.

export UNIT_FILE="example.service"
fleetctl destroy $UNIT_FILE && fleetctl submit $UNIT_FILE && fleetctl start $UNIT_FILE

Listing status of units within the cluster

The following will list the status of the units within the cluster.

fleetctl list-units

NOTE: Units can be in the state 'activating' when they have failed and are attempting to restart.

Etcd2 Verification

Listing nodes in the cluster

etcd 2 series

etcdctl cluster-health

etcd 0.x series

etcdctl ls /_etcd/machines

Notes

  • As of 2015-08-18 there is no sane way to promote an etcd2 proxy into a member for leader elections.
  • According to https://gist.github.com/skorfmann/10243181 you can issue a DELETE verb to https://discovery.etcd.io/${TOKEN}/_state to delete the cluster state and reset.

Flannel

Flannel is a network fabric for containers: it gives each host its own subnet, allowing an application in a container on one host to reach containers running on another.

Setup

When choosing a network prefix, be aware each host will get a netmask of /24. This means if you select a /16 you have 8 bits of subnet space: roughly 2^8 (256) possible machines hosting your Docker applications.
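The subnet arithmetic can be sketched in shell; the /16 network and /24 per-host netmask are the values discussed above:

```shell
#!/bin/sh
# Number of per-host /24 subnets available inside the configured network.
network_prefix=16   # e.g. 172.31.0.0/16
host_prefix=24      # flannel hands each host a /24 by default
subnets=$(( 1 << (host_prefix - network_prefix) ))
echo "$subnets host subnets available"   # 256
```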

The preferred mechanism is to publish the configuration via the cloud-config file. However, if you are testing or setting up by hand you can use etcd2 by executing the following command. This will propagate through the network and establish the fabric.

etcdctl set /coreos.com/network/config '{ "Network": "172.31.0.0/16" }'
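For the cloud-config route, one sketch is a drop-in that seeds etcd before flannel starts; this assumes the stock flanneld.service unit shipped with CoreOS:

```yaml
#cloud-config
coreos:
  units:
    - name: flanneld.service
      command: start
      drop-ins:
        - name: 50-network-config.conf
          content: |
            [Service]
            # Seed the fabric configuration before flanneld comes up.
            ExecStartPre=/usr/bin/etcdctl set /coreos.com/network/config '{ "Network": "172.31.0.0/16" }'
```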

Xref

  • https://en.wikipedia.org/wiki/Private_network - Private IPv4 address spaces.
  • https://coreos.com/flannel/docs/latest/flannel-config.html - CoreOS's documentation on the subject.

Network Fabric Health

I really need to figure this one out.

Fleet

Fleet is a distributed systemd unit orchestration system.

Setup

This really only depends on etcd working; Fleet should just start automatically after that.

Xref

Verifying health

fleetctl list-machines

New Relic Monitoring

Grab the example Fleet service file from: https://github.com/newrelic-platform/docker_server_agent/blob/master/docker.service. Ensure you replace the license key, then salt and pepper to taste. Submit and start the job with fleetctl.

If you are a fan of using /etc/environment, then you could easily adapt the file to draw the key from the environment. Care should probably be taken to ensure your key isn't reused to monitor applications unintentionally.
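A sketch of that adaptation, assuming NEW_RELIC_LICENSE_KEY is defined in /etc/environment; the container image and name here are assumptions, so check them against the upstream docker.service:

```ini
# Fragment only: splice into the [Service] section of the upstream unit.
[Service]
EnvironmentFile=/etc/environment
# Pass the key through to the container instead of hard-coding it.
ExecStart=/usr/bin/docker run --rm --name newrelic-agent \
  -e NEW_RELIC_LICENSE_KEY=${NEW_RELIC_LICENSE_KEY} \
  newrelic/nrsysmond
```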

Xref

  • https://discuss.newrelic.com/t/using-server-monitoring-on-coreos/24852 - New Relic discussion board for setting up the service.

Securing

SSH on alternative port

By default CoreOS uses an asymmetric key to authenticate administrator(s), so brute force attacks are unlikely to succeed. However the scans are annoying and could be expensive in terms of CPU and network resources.

cloud-config fragment

coreos:
  units:
  - name: sshd.socket
    command: restart
    content: |
      [Socket]
      ListenStream=2222
      Accept=yes

Firewall rules

CoreOS utilizes iptables. As a general rule you should probably use iptables-apply(8); however, I was unable to because the command was missing.

Below is an example of firewall rules. Please note the example of dropping a scanning or attacking host. I believe you may need to remove the comments before loading the rules.

*filter
:INPUT ACCEPT [368:102354] # numbers are packets/bytes respectively and seem to be ignored?
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [92952:20764374]

#
# Accepts
#
-A INPUT -i lo -j ACCEPT
-A INPUT -i eth1 -j ACCEPT

#
# Example dropping of a malicious host; need to unblock at some point
#
-A INPUT -i eth0 -p tcp -s 45.114.11.54 -j DROP

#
# Track connection state and allow host to establish connections
#
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

#
# Explicit allows
#
-A INPUT -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 80 -j ACCEPT

#
# General logging rule
#
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7

#
# Drop all remaining - NOTE we specify the interface
#
-A INPUT -i eth0 -j DROP
COMMIT

Please note the interface is explicitly specified as eth0, which is generally the public interface. If you are clustering with flannel or another network fabric, have a secondary datacenter-only NIC, or use the Docker default network, then omitting the interface on this rule will cause a number of failures.

NOTE

Ensure there is a new line after the COMMIT command, otherwise iptables-restore will terminate with a strange error about invalid commands.

Rule Activation via cloud-config

TODO
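One sketch for activating rules at boot, assuming CoreOS's stock iptables-restore.service reads its rules from /var/lib/iptables/rules-save; the abbreviated ruleset here stands in for the full set above:

```yaml
#cloud-config
write_files:
  - path: /var/lib/iptables/rules-save
    permissions: 0644
    owner: root
    content: |
      *filter
      :INPUT ACCEPT [0:0]
      :FORWARD ACCEPT [0:0]
      :OUTPUT ACCEPT [0:0]
      -A INPUT -i lo -j ACCEPT
      -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A INPUT -i eth0 -p tcp -m tcp --dport 22 -j ACCEPT
      -A INPUT -i eth0 -j DROP
      COMMIT
coreos:
  units:
    - name: iptables-restore.service
      enable: true
      command: start
```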

Xref