Nothing like another developer finding your abuses of a system then tripping over them to learn a lesson! iam_policy_attachment should be globally unique in an account. DOH!
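For the record: if I understand the Terraform docs correctly, aws_iam_policy_attachment manages the exclusive attachment of a policy, so two configurations declaring one for the same policy will stomp on each other. The per-role variant sidesteps that; a rough sketch with made-up names:

```hcl
# aws_iam_policy_attachment is exclusive: whichever config applies last owns
# the whole attachment list. Attaching per role avoids the uniqueness problem.
resource "aws_iam_role_policy_attachment" "deploy" {
  role       = "${aws_iam_role.deploy.name}"   # hypothetical role
  policy_arn = "${aws_iam_policy.deploy.arn}"  # hypothetical policy
}
```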

Looking from the ridge

Our code is hosted in GitHub repositories and tested via Travis CI. The new path and the old path will diverge at the end of the Travis CI tests. Currently we continuously deliver to our development integration environment, pushing after each build on master. Outside of master, our hotfix and release-candidate branches will target their environments explicitly so they update properly. In the current deployment scenario we use Heroku. Deployments are literally just pushing to specific git repositories, and updates magically happen.

New deployments will involve building Docker containers, pushing them into ECR, then having the ECS system update the running containers. Given our history with Heroku, our application is relatively monolithic. We have two interacting applications which are linked; that should be easy enough with the task definition. Initialization is another mountain to climb: for each deployment we need to run a set of migrations once. Bringing up a cluster also requires the system to run an initial setup.
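As a sanity check on the "two linked applications" bit, the task definition would look roughly like this in Terraform. Names, memory sizes, and the account ID are placeholders, and links assume the classic EC2 launch type with bridge networking:

```hcl
resource "aws_ecs_task_definition" "pod" {
  family                = "pod"
  container_definitions = <<EOF
[
  {
    "name": "web",
    "image": "ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/pod/web:latest",
    "memory": 512,
    "essential": true,
    "links": ["api"]
  },
  {
    "name": "api",
    "image": "ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/pod/api:latest",
    "memory": 512,
    "essential": true
  }
]
EOF
}
```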

Building the storage facility

Initially we wanted to set up an instance of the registry per deployment pod (what we call an entire instance of the system). To complete the project quickly we’ve compromised on targeting ECR for the container storage. It’s a little more expensive than running it ourselves, possibly. The pricing model is actually based on storage and bandwidth, as opposed to paying for an instance to run it. Elastic Container Registry looks deceptively easy to create using Terraform. Let’s see how repositories created this way show up in AWS’s console.
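The Terraform side really is about as small as it gets; something along these lines, with a made-up repository name:

```hcl
resource "aws_ecr_repository" "web" {
  name = "pod/web"
}
```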

Brilliant. I was concerned because the wizard would just drop me into the single instance. I guess they figure the console is more friendly if you only have one? Or perhaps I just missed it. Interesting: ECR has an upper limit of 1K images per repository. Given some deployments easily consume 10 images I guess that is a reasonable limit. Probably not so reasonable over a year of deployments though. I’m hoping ECR functions like a normal repository and doesn’t try to collect unused images.
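If the 1K cap ever bites, ECR lifecycle policies can expire old images; the aws_ecr_lifecycle_policy resource arrived in the AWS provider somewhat after the repository resource, if memory serves, so treat this as a sketch:

```hcl
resource "aws_ecr_lifecycle_policy" "web" {
  repository = "${aws_ecr_repository.web.name}"

  policy = <<EOF
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire anything beyond the most recent 500 images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 500
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}
```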

Hmm, it would be great if I could dump the URL when running Terraform! I’m sure there would be more required than just adding output stanzas. Nope, you just need two output stanzas: one at the module level and another at the global level. Hmm, no luck yet. Fresh from the documentation: the configuration uses the module.mod-name.output-name value. Maybe I need to run terraform plan && terraform apply to get the variables? No dice. Ah well! A problem I shall return to later.
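For my own notes, the two stanzas end up looking something like this; the module name "ecr" and the attribute are illustrative:

```hcl
# inside the module, e.g. modules/ecr/outputs.tf
output "repository_url" {
  value = "${aws_ecr_repository.web.repository_url}"
}

# at the root level, re-exported so it shows up after an apply
output "repository_url" {
  value = "${module.ecr.repository_url}"
}
```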

Meta machinery

Well, I’ve got the registry. Now onto the interesting problem of bootstrapping a pod with software. There is an interesting question of how far we should go. Where does Terraform end and where do the operational systems begin? My spidey senses are tingling around here. I think it would be within the scope of Terraform to set up the operational containers which orchestrate the container versions, but probably not to manage the task definition itself. Looks like someone already did the heavy thinking on the operational side. Probably go with something simpler for now: dropping a message onto a queue from the CI framework. More automation later.
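The CI-side "drop a message" step could be as dumb as the following; the queue name and payload shape are invented, and if a topic ends up in front of the queue the publish call changes but the idea doesn’t:

```ruby
require 'aws-sdk' # v2
require 'json'

sqs = Aws::SQS::Client.new(region: 'us-east-1')
queue_url = sqs.get_queue_url(queue_name: 'pod-deploys').queue_url

# Tell the deploy service which image the build just pushed.
sqs.send_message(
  queue_url: queue_url,
  message_body: {
    repository: 'pod/web',
    tag: ENV['TRAVIS_COMMIT']
  }.to_json
)
```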

I figured the initial push would be configured to grab a specific container designed specifically for this purpose. I’m hoping this can be configured through Terraform. Nope, doesn’t look like Terraform sports this right now. Looks like the image will have to be hosted on a well-known registry.

Realizing the machinery

Alrighty, so I have a service which will go in and update just the container image in the task definition, then update the service configuration. It is not release-ready at all, but I’m not going to complain.
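The core of that update is only a few ECS calls in the v2 Ruby SDK. Roughly this, with the cluster, service, family, and container names all stand-ins:

```ruby
require 'aws-sdk' # v2

ecs = Aws::ECS::Client.new(region: 'us-east-1')
new_image = 'ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/pod/web:abc123' # from the queue message

# Grab the current task definition, swap the image on the target container,
# and register the result as a new revision.
current    = ecs.describe_task_definition(task_definition: 'pod-web').task_definition
containers = current.container_definitions.map(&:to_h)
containers.each { |c| c[:image] = new_image if c[:name] == 'web' }

revision = ecs.register_task_definition(
  family: current.family,
  container_definitions: containers
).task_definition

# Point the service at the new revision; ECS rolls the running containers.
ecs.update_service(
  cluster: 'pod-dev',
  service: 'web',
  task_definition: "#{revision.family}:#{revision.revision}"
)
```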

I’ve designed the service to work with an opt-in list of tasks + services to be updated. The update will come from an SQS message. This is my first crack at the SQS system; I’m working from the Ruby documentation. Most examples create the queues; in my scenario the queue already exists. Oops, old documentation. The V2 API’s Aws::SQS::QueuePoller is much simpler.
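Polling an existing queue with the V2 poller is pleasantly short; a sketch with an assumed queue name, where update_task_and_service is just a stand-in for the update logic above:

```ruby
require 'aws-sdk' # v2
require 'json'

sqs = Aws::SQS::Client.new(region: 'us-east-1')
queue_url = sqs.get_queue_url(queue_name: 'pod-deploys').queue_url

poller = Aws::SQS::QueuePoller.new(queue_url, client: sqs)

# Each message names the repository/tag that was just pushed. The poller
# deletes the message automatically once the block returns without raising.
poller.poll do |msg|
  payload = JSON.parse(msg.body)
  update_task_and_service(payload) # opt-in filtering of tasks + services lives in here
end
```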

Took me a few minutes to realize the reason the creation screen wouldn’t let me create a new queue is that FIFO queue names must end in .fifo. Gah! That was a horrid user experience. Apparently they use the name Seahorse for the V2 API.