Like an accountant who never balances their personal books, I do not regularly back up my information systems. In part this is because until recently I only had a single workstation at home and my laptop; a loss of either would be devastating. Taking a page from Isambard Kingdom Brunel, I began writing an ambitious network storage service intended to contain binary objects with metadata. I will admit I may have been trying to clone some of the awesomeness of S3 in my Mud project.

However I set up Gluster anyway to back Kubernetes. With the second node on-line I figured it is about time for me to get serious about backing things up. If I had to stack rank the data to be backed up, and I do, I would say the following are the important data stores I use:

  • Postgres
  • CouchDB
  • Git

I figured once I get the data extracted onto a network file system I can begin to think about more complicated things like rotation or offsite backup.

Backing up Postgres: Prior Art

In the order the Google gods have provided:

The first hit is pretty straightforward: Rinor Maloku's postgres-backup-container does the job by installing the Postgres tools on an Alpine image and dumping the database to a volume. From the aks directory I am guessing their typical approach is to use it on an Azure network. Rinor Maloku actually published an article on how to use the container in Kubernetes with CronJobs, which is pretty awesome. It even walks through rolling your own.
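The pattern boils down to a container that runs pg_dump against a remote host and writes to a mounted volume. A minimal sketch of the idea follows; the image name, environment variables, and paths are my placeholders, not the repository's actual interface:

```sh
# Hypothetical invocation; image name, env vars, and paths are illustrative.
docker run --rm \
  -e PGHOST=postgres.example.internal \
  -e PGUSER=backup \
  -e PGPASSWORD=secret \
  -v /mnt/gluster/backups:/backups \
  alpine-pg-backup \
  sh -c 'pg_dump --format=custom --file=/backups/db-$(date +%Y%m%d%H%M%S).dump mydb'
```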

pgHoard looks really cool, however it does not look Docker native at all. Gorgias produced a decent article on adapting it to work. I will keep this one in mind for future work.

KubeDB is definitely Kubernetes native. Looks like the primary approach is to use Custom Resource Definitions (CRDs) to achieve it. According to the documentation the system does not support Postgres 12 yet. Streaming standbys are definitely cool, however none of my services are that heavily used yet.

StackOverflow has quite a few k8s questions. I take this as a sign of the DevOps movement breaking down barriers, however that might still be optimistic. We will not know until the next generation of developers comes along to reinvent the wheels. The main recommendation is to use kubectl exec into the container. I would really like to avoid breaking the container's seal and also be able to back up Postgres instances not running in Kubernetes.
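For reference, the exec approach is usually a one-liner along these lines (pod, user, and database names are placeholders):

```sh
# Stream a dump out of the running pod onto the local machine.
kubectl exec postgres-0 -- pg_dump -U postgres mydb > mydb.sql
```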

Crunchy Data is another Kubernetes native solution. This does require using their operator, which is fine, however it fails my primary goal of being able to back up both in-cluster and out-of-cluster Postgres instances. I might steal the idea of building a command or subcommand to capture a backup, but I think I am dreaming of a distant future there. For CI I would definitely like to take a backup before upgrading a deployment though.
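To make that concrete, the pre-deployment checkpoint I am imagining could be as simple as the following CI step; the host, credentials, and file naming are hypothetical:

```sh
# Hypothetical CI step: checkpoint the database before rolling the deployment.
pg_dump --host=postgres.example.internal --username=backup --format=custom \
  --file=pre-deploy-$(git rev-parse --short HEAD).dump mydb
kubectl apply -f deployment.yaml
```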

Joel Saunders documents a method to back up and restore the database using kubectl exec, which is great. It looks more rudimentary than Rinor Maloku's approach.
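The restore side is the same trick in reverse, feeding the dump back through stdin (names are again placeholders):

```sh
# Pipe a plain-text dump back into the pod.
kubectl exec -i postgres-0 -- psql -U postgres mydb < mydb.sql
```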

Prakarsh has a more complete example of how to upload artifacts to an object store like S3. This is a great example of using a complete CronJob to get it done.
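The core of that kind of job is a pipeline like the one below, run on the CronJob's schedule. The bucket name is illustrative, and a real job would pull credentials from a Secret rather than bake them in:

```sh
# Dump, compress, and ship to object storage in one pass.
pg_dump -U postgres mydb \
  | gzip \
  | aws s3 cp - s3://my-backup-bucket/postgres/mydb-$(date +%Y%m%d%H%M%S).sql.gz
```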

Looks like the best approach might be to base my container on Rinor Maloku's example repository. Given the sensitivity of the data plus the need to ensure it is compatible with Postgres 12, I am going to build off of his work. The biggest challenge might be manually triggering runs, however I might be able to get away with the trick Crafty Penguins use. Feels a bit janky for pre-deployment data checkpoints though.
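If that trick is what I think it is, it amounts to stamping a one-off Job out of the CronJob's template; the CronJob and Job names here are placeholders:

```sh
# Create a one-off Job from the CronJob's template to trigger a run on demand.
kubectl create job --from=cronjob/postgres-backup manual-backup-1
```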