I made the mistake of trying to tear an account down with Terraform. I’ve seen several errors, the current one being a ClientException: TaskDefinition is inactive error while attempting to apply. The terse error doesn’t really help pinpoint the underlying cause :-/. I’m assuming it has something to do with the account being destroyed…it took a few destroy runs to actually remove the resources. Overall not a bad tool, but it definitely has some issues.

Terraform identity woes

Not sure how imports work, so I attempted to reactivate the configuration. No good. Perhaps the id attribute for import is the name of the service? Hmm, nope. The resource doesn’t support importing! Well, at least that one is direct and to the point. The other two errors complaining about the AWS region make me a little nervous. terraform refresh perhaps? No change…again.

Back to the bug hunt I guess. Looks like a solution may have been committed in November. I wonder when/if it was released in the version cut last week.

What fixed it, you may ask? I modified the definition. In this case the definition was specifying a tag for the container, so I changed that. Freaking silly it didn’t just sort that out itself; I wonder why the algorithm was designed to check for an inactive configuration first.
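
If I’m reading my own fix right, touching the image tag in the container definition is what forces Terraform to register a fresh, active revision. A minimal sketch of the shape, with the family name and tag as placeholders:

resource "aws_ecs_task_definition" "orc" {
  family = "orc"

  # Bumping the tag on the image below is the change that registers a new,
  # active task definition revision.
  container_definitions = <<DEFINITION
[
  {
    "name": "orc",
    "image": "meschbach/rhumbix-orc:0.2",
    "memory": 128,
    "essential": true
  }
]
DEFINITION
}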

Deploy the ORC army!

Or just a single container. Orchestration Record Controller has a Docker image at meschbach/rhumbix-orc. I’m in the process of tracking down who has access to the company account. So far deployment has been fairly straightforward: I only want one instance of the container running per cluster, and it polls instead of being pushed to, so I got to strip out all of the load balancing and autoscaling, leaving a nice 36 lines of code.
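
For a sense of how small it ends up, the service itself boils down to something like this sketch (the cluster and resource names here are placeholders, not the real config):

resource "aws_ecs_service" "orc" {
  name            = "orc"
  cluster         = "${aws_ecs_cluster.main.id}"
  task_definition = "${aws_ecs_task_definition.orc.arn}"

  # One poller per cluster; no load_balancer or autoscaling blocks needed
  # since ORC pulls its work from SQS rather than serving traffic.
  desired_count = 1
}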

Securing the container is the unknown here for me. The container needs to run under an IAM policy which allows two operations. Receiving SQS messages for processing is, I would imagine, a simple policy attachment. Updating the ECS task descriptors will probably take a bit more finesse.
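
My working assumption is that means an ECS task role: a role the task assumes, with both policies hung off of it. Roughly, and untested (the role name is mine):

resource "aws_iam_role" "orc-task" {
  name = "orc-task"

  # Allow ECS tasks to assume the role so the ORC container inherits
  # whatever policies get attached to it.
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

The task definition would then point at it through task_role_arn.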

A comprehensive set of permissions is buried in the SQS manual. I’m guessing the out-of-the-box poller in Ruby requires the following permissions: sqs:DeleteMessage, sqs:GetQueueUrl, and sqs:ReceiveMessage. At least that is my guess; we’ll see. The draft policy goes a little something like:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:AuthorizeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [ "sqs:ReceiveMessage", "sqs:GetQueueUrl", "sqs:DeleteMessage" ],
      "Resource": [ "${aws_sqs_queue.orc-updates-queue.arn}" ]
    }
  ]
}
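
Since the draft leans on Terraform interpolation for the queue ARN, I expect it to live inline in the config, hung off the task role. Something like this, assuming the orc-task role sketched above and trimming to the SQS statement:

resource "aws_iam_role_policy" "orc-queue-consumer" {
  name = "orc-queue-consumer"
  role = "${aws_iam_role.orc-task.id}"

  # The draft document above is the heredoc body; only the SQS statement
  # is repeated here.
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "sqs:ReceiveMessage", "sqs:GetQueueUrl", "sqs:DeleteMessage" ],
      "Resource": [ "${aws_sqs_queue.orc-updates-queue.arn}" ]
    }
  ]
}
POLICY
}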
Intake Queue

Next up is building credentials to deploy new containers via Travis CI. Two actions will occur at the end of a successful run: a container image will be pushed into the registry and a message will be dropped on the intake queue for ORC. This article looks promising. The essence of the requirements: create a new IAM user with API-only access, then grant it access to ECR and the target queue. aws_iam_user should be the output. I’m not sure how to get those credentials out of the system except by hand. I believe the SQS permissions on the queue should be sqs:SendMessage and sqs:GetQueueUrl. There is probably some Murphy’s Law in here with needing to be able to list the queues too; I’ll burn that bridge later though.
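
A rough sketch of what I think that looks like, assuming hypothetical travis-deploy names; the secret lands in the state file and terraform output, which is the by-hand part:

resource "aws_iam_user" "travis-deploy" {
  name = "travis-deploy"
}

resource "aws_iam_access_key" "travis-deploy" {
  user = "${aws_iam_user.travis-deploy.name}"
}

# Surface the generated credentials so they can be copied into the Travis
# CI settings by hand.
output "travis-access-key-id" {
  value = "${aws_iam_access_key.travis-deploy.id}"
}

output "travis-secret-access-key" {
  value = "${aws_iam_access_key.travis-deploy.secret}"
}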

ECR permissions look like they will be a bit of a different beast here. The docs aren’t particularly straightforward about which permissions you need to grant. If I wasn’t familiar with how Docker registries work this would be much more frustrating. The managed policies definitely help as a start, but they don’t quite go far enough. In this case we don’t want the user to be able to delete layers; automating that will probably be something we develop in the future. My first crack at the policy for dropping the message and uploading images:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "sqs:SendMessage", "sqs:GetQueueUrl" ],
      "Resource": [ "${aws_sqs_queue.orc-updates-queue.arn}" ]
    },
    {
      "Effect": "Allow",
      "Action": [
	"BatchCheckLayerAvailability",
	"BatchGetImage",
	"CompleteLayerUpload",
	"GetRepositoryPolicy",
	"InitiateLayerUpload",
	"ListImages",
	"PutImage",
	"UploadLayerPart"
	],
      "Resource": [ "${aws_ecr_repository.pod-registry.arn}" ]
    },
  ]
}
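
One piece I suspect is still missing: before Travis can push, Docker has to log in to the registry, and that token comes from ecr:GetAuthorizationToken, which isn’t scoped to a repository. Hung off the same hypothetical travis-deploy user:

resource "aws_iam_user_policy" "travis-ecr-login" {
  name = "travis-ecr-login"
  user = "${aws_iam_user.travis-deploy.name}"

  # GetAuthorizationToken backs the docker login against ECR and only
  # accepts a wildcard resource.
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "ecr:GetAuthorizationToken" ],
      "Resource": [ "*" ]
    }
  ]
}
POLICY
}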
Tune in tomorrow

For the next exciting chapter! Will it explode? Or will it work out well?