Dataiku in HA on EKS
• Mark Eschbach
Next up is to verify we can recover from the loss of a single process and ensure our application works as expected. From
what I can tell, the recovery story Dataiku uses is tar-ing a directory. I am hoping the path forward is to use a persistent volume for that data directory.
A persistent volume is an administrator-configured storage device which lives outside the lifecycle of a pod. On some platforms like GKE these can be auto-provisioned. For the EKS cluster I am working against, the storage class backing these volumes was already deployed; however, I would imagine it is something like this:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "false"
  name: persistent
parameters:
  encrypted: "true"
  type: gp2
  zones: us-east-1b,us-east-1c
provisioner: kubernetes.io/aws-ebs
To use this storage you will need to define a claim against the storage class.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dataiku-orechestration
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: persistent
To mount the claim into a pod, one would use:
spec:
  containers:
    - name: private-reg-container
      volumeMounts:
        - name: data
          mountPath: "/home/dataiku/dss"
In theory this should work great. Unfortunately this does not work, as the underlying container image does not declare any
mount points. You can verify this yourself by running docker inspect dataiku/dss:latest once you have run dataiku/dss:latest and looking at the path .[0].ContainerConfig.Volumes. At this point I think there are two options:
the first is to ask Dataiku for guidance on how to run this; the second is a StatefulSet, although I think a StatefulSet would be an issue.
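For reference, here is roughly what the StatefulSet route would look like. This is a sketch I have not run against this cluster: the app label, the serviceName, and the volume name data are placeholders of mine, and it assumes a headless Service named dataiku already exists. The storage class name and size are carried over from the claim above.

```yaml
# Sketch only: names and labels are placeholders, not a tested deployment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dataiku
spec:
  serviceName: dataiku          # assumes a headless Service with this name
  replicas: 1
  selector:
    matchLabels:
      app: dataiku
  template:
    metadata:
      labels:
        app: dataiku
    spec:
      containers:
        - name: dataiku-container
          image: dataiku/dss:latest
          volumeMounts:
            - name: data
              mountPath: /home/dataiku/dss
  # One claim is created per replica from this template.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: persistent
        resources:
          requests:
            storage: 8Gi
```

The main difference from a standalone claim is that the StatefulSet owns the claim through volumeClaimTemplates, so a rescheduled pod is reattached to the same volume.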
Speaking too soon
After running docker run -it --rm dataiku/dss /bin/sh, I found there is a directory named dss at /home/dataiku, which is
where the data is being stored. Perhaps there is hope! Additionally, after talking with a coworker, I should be able to
force a mount point using something like this:
spec:
  containers:
    - name: dataiku-container
      volumeMounts:
        - mountPath: /home/dataiku/dss
          name: data-store
          readOnly: false
  volumes:
    - name: data-store
      persistentVolumeClaim:
        claimName: example-claim
I have the claim mounted, however Dataiku is producing the following error:
[-] Directory /home/dataiku/dss already exists, but is not empty. Aborting !
I will need to investigate another time.
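One possible cause, which I have not verified here: a freshly formatted ext4 volume has a lost+found directory at its root, so a mount of the volume root is never empty. If that is what is happening, mounting a subdirectory of the volume with subPath might get past the check. The subdirectory name dss-data below is a placeholder of mine, and I have not checked whether the Dataiku user can write to a directory created this way.

```yaml
# Sketch only: "dss-data" is a placeholder subdirectory name, untested
# against the dataiku/dss image.
spec:
  containers:
    - name: dataiku-container
      volumeMounts:
        - mountPath: /home/dataiku/dss
          name: data-store
          subPath: dss-data     # mount a subdirectory, not the volume root
  volumes:
    - name: data-store
      persistentVolumeClaim:
        claimName: example-claim
```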