Alrighty, today I try to conquer getting Vault done. Hopefully I will have fewer interruptions this time! I left off having settled on EC2 Linux; however, I need to construct some service wrappers to launch and monitor processes. Originally I thought it was running a derivative of the Sys5 RC system, but the scripts reacted differently than expected. Trying to launch the Vault process with service failed because the script file was not executable:

[ec2-user@redacted]$ service vault start
env: /etc/init.d/vault: Permission denied

Listing the files in the directory shows the typical mask is -rwxr-xr-x with root as the owner for all the other files. My script's -r-xr--r-- is apparently wrong; time to clone those flags. Hmm, even after sudo chmod u=rwx,go=rx /etc/init.d/vault I'm still getting file-not-found issues :-(. I think this is a good point to stop trying to brute force it.
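Roughly what I was poking at, for the record; the chmod is the one from above and the rest is reconstructed from memory.

ls -l /etc/init.d/                        # the other scripts are -rwxr-xr-x, owned by root
sudo chmod u=rwx,go=rx /etc/init.d/vault  # clone those flags onto the Vault script
service vault start                       # ...and still a file-not-found error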

This looks promising. I was a little skeptical of the interpreter line #!/sbin/runscript from the OpenRC tutorial I was following; let's start with that conversion. (In hindsight, a shebang pointing at an interpreter that doesn't exist on the box is exactly what produces that misleading file-not-found error.) Magic! Swapped out the interpreter and the function import and we are good. Maybe…the pid file wasn't created. Ah, it helps if I also have the case statement. It's amazing how fast I've become accustomed to writing systemd unit files; I've apparently trashed all my Sys5 script knowledge...probably because I've always felt it's a little janky and a lot of work.

Turns out EC2 Linux doesn't support many of the standard services. As much as I'm ashamed of it, my init.d script looks like the following right now. I've really got to brush up on this; I'm so used to systemd for constructing services that this is painstaking. Of note: I don't think the daemon process is dissociated from the controlling terminal, but it should be good enough to place in user_data. I'm hoping I'm not wrong about that.

#!/bin/bash
# Minimal Sys5-style init script to launch and stop Vault on EC2 Linux.

. /etc/init.d/functions

PIDFILE=/var/run/vault.pid
CONFIG=/usr/local/etc/vault-config.hcl

start() {
	# Launch Vault in the background and record its PID. Note the process
	# is not properly dissociated from the controlling terminal.
	/usr/local/bin/vault server -config $CONFIG >/var/log/vault.stdout 2>/var/log/vault.stderr </dev/null &
	pid=$!
	echo $pid >$PIDFILE
	echo "Started Vault as $pid"
}

stop() {
	echo "Stopping vault"
	kill -QUIT `cat $PIDFILE`
	echo "done"
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        echo "TODO"
        ;;
    restart)
        stop
        start
        ;;
    reload)
        echo "TODO"
        ;;
    condrestart)
        echo "TODO"
        ;;
    *)
        echo "Usage: <servicename> {start|stop|status|restart|reload|condrestart}"
        exit 1
        ;;
esac
exit $?

Alrighty, next up is getting Consul up off the ground so I can get this stuff clustered. I've settled on the following query to figure out what other nodes are in the cluster: aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" "Name=tag:Name,Values=fqdn" --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress'. This will produce a JSON document containing just their IP addresses. Good enough so far; next up will be transforming that. Hmm, it might not be necessary with retry_join_ec2 though. Let's see.
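In case retry_join_ec2 doesn't pan out, here is a rough sketch of the transformation I have in mind: switch the CLI to text output and feed each address to Consul's -retry-join flag. The data dir and server flags are just placeholders for whatever the real config ends up being.

# Sketch only: collect peer IPs and hand them to the agent as -retry-join flags.
PEERS=$(aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" "Name=tag:Name,Values=fqdn" \
  --query 'Reservations[].Instances[].NetworkInterfaces[].PrivateIpAddresses[].PrivateIpAddress' \
  --output text)
JOIN_ARGS=""
for ip in $PEERS; do
  JOIN_ARGS="$JOIN_ARGS -retry-join=$ip"
done
consul agent -server -data-dir=/var/consul $JOIN_ARGS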

I really need to find out if I can get Terraform to place multiple files on disk for me without having to build a large, complicated shell script to do so. The level of meta is really more than I wish to deal with, and it's also fairly WET. Lesson learned though: if you are using cat to produce shell scripts and don't want the shell to evaluate expressions inside the heredoc, then you should surround the end delimiter in single quotes.

cat >/tmp/bar <<'EOF'
un_evaled=foo
echo "$un_evaled will get evaled when executed"
EOF

Okay! So Consul datacenter names may only be alphanumeric plus underscores and hyphens. I wonder why they don't allow the normal domain dot notation.
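So something like this is what I'll go with instead of a dotted name; the other flags here are only illustrative.

consul agent -server -bootstrap-expect=3 -data-dir=/var/consul \
  -datacenter=vault-us-east-1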

I got Consul up and converged into a cluster. In order to complete it I had to ensure ports 8300-8301/tcp and 8301/udp can transit between the Consul instances. I appreciate the documentation being straightforward and honest about this; however, I overlooked it a few times. Fairly straightforward once the ports were open.
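For the record, something along these lines would open those rules with the CLI. The $CONSUL_SG group ID and the 10.0.0.0/16 CIDR are placeholders for my actual values, not what the Terraform does.

# Allow Consul's server RPC and Serf gossip ports inside the VPC.
aws ec2 authorize-security-group-ingress --group-id "$CONSUL_SG" \
  --protocol tcp --port 8300-8301 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id "$CONSUL_SG" \
  --protocol udp --port 8301 --cidr 10.0.0.0/16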

Vault came up as individual instances. For a Vault system each node needs to be unsealed, which was a bit different from what I expected but makes sense. I could connect them to Consul and they would recognize a leader; however, only the leader was able to perform operations. Turns out you need the Vault instances to be able to talk on port 8200/tcp amongst themselves. I should have RTFM a little closer!
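For my own notes, unsealing has to happen against each node individually, roughly like this with the default three-of-five key threshold from vault init. The address and keys are obviously placeholders.

# Repeat on every Vault node; standbys stay sealed until you do.
export VAULT_ADDR=http://10.0.0.11:8200
vault unseal $UNSEAL_KEY_1
vault unseal $UNSEAL_KEY_2
vault unseal $UNSEAL_KEY_3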