It is a good thing to have backups. After the horrible AC incident of last month, even though most of the cluster came back on-line reasonably well, I realized I should practice what I preach to others. Backing up is important, and although Gluster is fault tolerant, the AC incident showed it is not without its own problems. Fault tolerance at the filesystem level is definitely different than having restorable backups.

The rough sketch I am thinking of is Postgres to an S3-like system for on-site storage. From there it will get pushed into offsite storage and rotated out when appropriate. Phase 1 is definitely just a local backup though. To that end, yesterday I set up Minio, which was fairly easy and straightforward. In the future I should play with local volumes more often.
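
For reference, a throwaway local Minio (not how it runs in the cluster, where it sits behind a Service and local volumes) can be stood up with something roughly like the following; the access and secret keys are the documentation defaults used later in this post, and newer Minio releases have since renamed these environment variables:

docker run -p 9000:9000 \
  -e MINIO_ACCESS_KEY=BKIKJAA5BMMU2RHO6IBB \
  -e MINIO_SECRET_KEY=V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 \
  -v $PWD/minio-data:/data \
  minio/minio server /data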

Exploring pgBackRest

pgBackRest seems to be the best solution, with multiple articles praising it. They have some impressive claims about throughput, so I am excited to see what it does. Reading through the documentation, it appears as though many of its features require access to the database cluster's data directory itself. I was hoping for something slightly higher up, such as the network level.

Exploring the manual route

As much as I would love an out-of-the-box solution, I am thinking a simple solution might make the most sense at this point. To solve this problem it should:

  • Use pg_dump to capture a logical snapshot of the database(s) of concern
  • Push to Minio for durable storage, deleting objects after some number of days. At first it did not look like Minio supported this feature, but near the end of the massive thread it looks like support landed around August of 2019 (see the lifecycle sketch after this list).
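
For the retention piece, mc has an ilm subcommand for managing bucket lifecycle rules. The exact flag names have shifted across mc releases, so treat this as a sketch rather than something I have verified, and the 14-day expiry is an arbitrary choice:

mc ilm add --expiry-days "14" minio/backups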

Placing this in a periodic cron job would not be too hard. Forcing snapshots would be a little annoying on deploys though. Perhaps a future option would be to write an operator to tie credentials and snapshots together.

pg_dump

Generally I would like my backups to be applicable to an empty database, such that I can issue a single command to restore after creating a fresh one. It has been a while, so I believe the following options are required:

  • --clean emits DROP statements before the CREATEs, so a restore is not tripped up by any objects that already exist.
  • --create includes the CREATE DATABASE statement itself, so the structure ends up in the proper state.
  • --encoding=UTF8 so there is not a 💩 to foul things up.
  • --format=plain creates a more future-proof backup. It would be horrible to find out you have been dumping to an old format that the tools can still export to but no longer read back.
  • --no-privileges to prevent odd ACL-related issues.
  • --no-owner to avoid restore issues with multiple users in the database. This will not work if you have multiple users who own various objects within your database, but are you even microservicing at that point?
  • --compress=9 is something I struggle with. The output definitely needs to be compressed, but the question is whether to compress inside pg_dump or with an external command. For now I will go with the simple option.

The end result is a command like pg_dump --clean --create --encoding=UTF8 --no-privileges --no-owner --compress=9 options-dev >options-dev.sql.gz
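
For comparison, the external-compression variant I was weighing above would look something like this, leaning on gzip instead of pg_dump's built-in compression:

pg_dump --clean --create --encoding=UTF8 --no-privileges --no-owner options-dev | gzip -9 >options-dev.sql.gz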

Minio Upload

To make things simple I have decided to learn how to use their mc command; that is obviously the easiest way. If it does what it says on the tin, the command should look like:

mc cp options-dev.sql.gz minio.platform-minio.svc.workshop.k8s/test-backups/options-dev.sql.gz

It mostly worked as I thought; the part that failed was my invocation, not the tool. The following worked as expected, although I am wondering how to pass the authentication data in quietly.

docker run -it -v $PWD:/data --entrypoint=/bin/sh minio/mc
mc config host add minio http://minio-service.platform-minio.svc.workshop.k8s:9000 BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12
mc cp options-dev.sql.gz minio/backups/options-dev.sql.gz
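
On passing the credentials in quietly: mc also reads per-alias credentials from an MC_HOST_<alias> environment variable, which is what the backup script further down ends up using. With the same values as above it would look like:

export MC_HOST_minio="http://BKIKJAA5BMMU2RHO6IBB:V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12@minio-service.platform-minio.svc.workshop.k8s:9000"
mc cp options-dev.sql.gz minio/backups/options-dev.sql.gz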

Transferring my ~100MB file took approximately 17 seconds, averaging 5.88MB/s. Not too bad until I remembered my wifi segment is running at ~878Mbps (109.75MB/s) with the server on 1Gbps (125MB/s). In the future I should figure out how to optimize that throughput; I am sure there is a tunable somewhere I missed.

Tying it together

For the base image I figured it would make the most sense to start from Postgres, since pg_dump depends on the environment it runs in while mc is just a single binary that can be copied over. The Dockerfile looks like the following:

FROM minio/mc:RELEASE.2020-07-11T05-18-52Z as minio

FROM postgres:11.8
COPY --from=minio /usr/bin/mc /usr/bin/mc
RUN mkdir -p /app
WORKDIR /app
ADD backup.sh /app/backup.sh
CMD /app/backup.sh

With backup.sh looking like:

#!/bin/bash

# -o pipefail makes a failing pg_dump fail the whole pipeline instead of
# silently uploading a truncated artifact.
set -xeo pipefail

if [ -z "$MINIO_TARGET" ]; then
  echo "No storage target set, refusing."
  exit 1
fi

# mc picks up credentials for the "minio" alias from this environment variable.
export MC_HOST_minio="http://$MINIO_USER:$MINIO_SECRET@$MINIO_SERVER"

pg_dump \
  --clean --create \
  --encoding=UTF8 \
  --no-privileges --no-owner \
  --compress=9 \
  "$PGDATABASE" | mc pipe "minio/$MINIO_TARGET"
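
With both files in place, building the image is a plain docker build, tagged to match what the run command below expects:

docker build -t backup-postgres-minio .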

Running the container with the following produces the named artifact. The $MINIO_TARGET handling needs to be reworked into a base name with a timestamp appended; a sketch of that follows the run command. For now, though, this works pretty well.

docker run -e PGPORT=2345 -e PGUSER=some-user \
  -e PGPASSWORD=password-some -e PGHOST=database-server -e PGDATABASE=database-name \
  -e MINIO_TARGET=backups/database.sql.gz  \
  -e MINIO_USER=minio -e MINIO_SECRET=minio123 \
  -e MINIO_SERVER=minio-service.platform-minio.svc.workshop.k8s:9000 \
   backup-postgres-minio
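
A minimal sketch of the timestamp idea, assuming $MINIO_TARGET is treated as a base name rather than a full object name:

# hypothetical reworking inside backup.sh: append a timestamp to the base name
TIMESTAMP="$(date +%Y-%m-%dT%H%M%S)"
pg_dump --clean --create --encoding=UTF8 --no-privileges --no-owner --compress=9 "$PGDATABASE" \
  | mc pipe "minio/$MINIO_TARGET-$TIMESTAMP.sql.gz"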

Integrating into Kubernetes

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: backup-database
spec:
  concurrencyPolicy: "Replace"
  schedule: "@daily"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: perform-backup
              image: docker.artifacts.internal/backup-postgres-minio:4
              env:
                - name: PGHOST
                  value: "pg-11.storage-postgres.svc.workshop.k8s"
                - name: PGDATABASE
                  value: "example-database"
                - name: PGPORT
                  value: "5432"
                - name: MINIO_SERVER
                  value: "minio-service.platform-minio.svc.workshop.k8s:9000"
                - name: MINIO_TARGET
                  value: "backups/backups/database.sql.tgz"
                - name: MINIO_USER
                  valueFrom:
                    secretKeyRef:
                      name: minio
                      key: user
                - name: MINIO_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: minio
                      key: secret
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: database
                      key: PGPASSWORD
                - name: PGUSER
                  valueFrom:
                    secretKeyRef:
                      name: database
                      key: PGUSER
          restartPolicy: OnFailure
---
apiVersion: v1
kind: Secret
metadata:
  name: minio
type: Opaque
stringData:
  user: "minio"
  secret: "minio123"

Left TODO

  • Naming artifacts based on the date of the backup. Currently artifacts are just overwritten, which is not terrible but could definitely be better. Timestamped names would introduce artifacts that need to be periodically deleted.
  • Verification of artifacts (a rough sketch of this follows the list).
  • Almost everything is passed via environment variables. Some of these would probably be better with rotated credentials via Vault.
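
On the verification point, a rough sketch of what it might look like, reusing the mc alias and $MINIO_TARGET from backup.sh and a scratch Postgres to restore into:

# confirm the stored artifact is at least an intact gzip stream
mc cat "minio/$MINIO_TARGET" | gunzip -t

# trial restore; since the dump used --create, connecting to the postgres
# database is enough for it to recreate the target database
mc cat "minio/$MINIO_TARGET" | gunzip | psql -v ON_ERROR_STOP=on -d postgres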