Next up is to verify that we can recover from the loss of a single process and that our application still works as expected. From what I can tell, the recovery story Dataiku uses is tarring up a directory. I am hoping the path forward is to use a persistent volume for that data directory.

A persistent volume is an administrator-configured piece of storage which lives outside of the lifecycle of a pod. On some platforms, like GKE, these can be auto-provisioned. For the EKS cluster I am working against, a storage class for this was already deployed; I would imagine it looks something like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "false"
  name: persistent
parameters:
  encrypted: "true"
  type: gp2
  zones: us-east-1b,us-east-1c
provisioner: kubernetes.io/aws-ebs

To use the volume you need to define a claim against it:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dataiku-orchestration
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: persistent

To mount the claim in a pod, one would use:

spec:
  containers:
  - name: private-reg-container
    volumeMounts:
    - name: data
      mountPath: "/home/dataiku/dss"

In theory this should work great. Unfortunately it does not, as the underlying image does not declare any mount points. You can verify this yourself: once you have pulled or run dataiku/dss:latest, run docker inspect dataiku/dss:latest and look at the path .[0].ContainerConfig.Volumes. At this point I think there are two options: the first is to ask Dataiku for guidance on how to run this; the second is to try a StatefulSet, although I suspect that would have issues of its own.
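For reference, here is a rough sketch of what the StatefulSet route might look like. I have not tried this against Dataiku. The names dataiku and app: dataiku are made up for illustration, and the serviceName assumes a headless Service of that name already exists. The image, mount path, storage class and size are taken from the snippets above.

```yaml
# Untested sketch: names and labels are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dataiku
spec:
  serviceName: dataiku      # assumes a headless Service with this name exists
  replicas: 1
  selector:
    matchLabels:
      app: dataiku
  template:
    metadata:
      labels:
        app: dataiku        # must match spec.selector.matchLabels
    spec:
      containers:
      - name: dataiku-container
        image: dataiku/dss
        volumeMounts:
        - name: data-store  # refers to the claim template below
          mountPath: /home/dataiku/dss
  # One PersistentVolumeClaim is created per replica from this template.
  volumeClaimTemplates:
  - metadata:
      name: data-store
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: persistent
      resources:
        requests:
          storage: 8Gi
```

With a StatefulSet the claim is generated from the template rather than written by hand, so under these names the claim for the first replica would be data-store-dataiku-0, and it survives the pod being deleted and recreated.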

Speaking too soon

After running docker run -it --rm dataiku/dss /bin/sh, I found a directory named dss under /home/dataiku, which is where the data is being stored. Perhaps there is hope! Additionally, after talking with a coworker, I should be able to force a mount point at that path using something like this:

spec:
  containers:
  - name: dataiku-container
    volumeMounts:
    - mountPath: /home/dataiku/dss
      name: data-store
      readOnly: false
  volumes:
  - name: data-store
    persistentVolumeClaim:
      claimName: example-claim

I have the claim mounted; however, Dataiku is producing the following error:

[-] Directory /home/dataiku/dss already exists, but is not empty. Aborting !
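One unverified guess: a freshly formatted ext4 volume usually contains a lost+found directory at its root, which would make the mount point non-empty from Dataiku's point of view. If that is the cause, mounting a subdirectory of the volume instead of its root might get around it. A minimal sketch, assuming the same claim as above, where the subPath value dss is a name I made up:

```yaml
# Untested sketch: mounts a subdirectory of the volume, not its root.
spec:
  containers:
  - name: dataiku-container
    volumeMounts:
    - mountPath: /home/dataiku/dss
      name: data-store
      subPath: dss          # hypothetical subdirectory inside the volume
  volumes:
  - name: data-store
    persistentVolumeClaim:
      claimName: example-claim
```

Kubernetes creates the subdirectory if it does not exist, so the container should see an empty directory on first start. Whether Dataiku then accepts it, including its ownership and permissions, is something I have not checked.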

I will need to investigate another time.