For my personal projects I’ve been really wanting an object storage system. The system, much like S3, should be available on the network and hide many of the details of the actual storage. This would be an important service for the Fog project, providing a permanent storage abstraction layer. Additionally, it would allow for reasonable automation of secure backup and storage on other systems such as Dropbox or Google Drive. In the end this will be a huge project; however, it’s always best to start simply.

Over the past few days I’ve been mulling over the right division of responsibilities within the system. Clients should be able to talk with two segments of the system: a coordinator and block storage. An object will be the pairing of a name with a set of blocks. The coordinator will be the primary point of contact for clients of the cluster, and will be responsible for directing clients to the block storage locations within the cluster.

An example request would be to write a file into the storage cluster. We value the contents of this file, so it would need to be backed up to a Dropbox account as well as being retained locally. The client will dial the coordinator and request the operations required to store the object. The coordinator will return which storage nodes to send the specific blocks to. The client would then contact all specified storage nodes and provide the data blocks. Upon completion, because we want to ensure this file is backed up, we’ll query the coordinator to confirm the file has achieved the consistency we desire.
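The write flow above can be sketched in memory, without any networking. This is only a rough illustration under my own assumptions: the class names, the SHA-256 block IDs, the fixed block size, the round-robin placement, and the way the coordinator checks consistency by asking each node are all hypothetical and not decided anywhere in this design yet.

```python
import hashlib


class StorageNode:
    """Hypothetical storage node: keeps blocks in a dict keyed by block ID."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def has(self, block_id):
        return block_id in self.blocks


class Coordinator:
    """Hypothetical coordinator: plans block placement and checks consistency."""

    def __init__(self, nodes, replicas=2):
        self.nodes = nodes        # node name -> StorageNode
        self.replicas = replicas  # assumed to be <= len(nodes)
        self.objects = {}         # key -> list of (block_id, [node names])

    def plan_write(self, key, block_ids):
        # Assumption: simple round-robin placement across the known nodes.
        names = sorted(self.nodes)
        plan = []
        for i, block_id in enumerate(block_ids):
            targets = [names[(i + r) % len(names)] for r in range(self.replicas)]
            plan.append((block_id, targets))
        self.objects[key] = plan
        return plan

    def is_consistent(self, key):
        # The object is consistent once every planned replica holds its block.
        return all(
            self.nodes[name].has(block_id)
            for block_id, targets in self.objects[key]
            for name in targets
        )


def split_blocks(data, block_size):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_object(coordinator, key, data, block_size=4):
    """Client side: ask for a plan, push blocks to nodes, then confirm."""
    blocks = split_blocks(data, block_size)
    block_ids = [hashlib.sha256(block).hexdigest() for block in blocks]
    plan = coordinator.plan_write(key, block_ids)
    for block, (block_id, targets) in zip(blocks, plan):
        for name in targets:
            coordinator.nodes[name].put(block_id, block)
    return coordinator.is_consistent(key)
```

With two nodes named "local" and "dropbox" (the second standing in for a Dropbox-backed node) and replicas=2, every block lands on both, which matches the "backed up and retained locally" requirement in the example.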

Coordinator

The coordinator has some interesting responsibilities: determining where blocks are to be stored, maintaining the metadata for each object stored, and maintaining a list of nodes within the system. Metadata will consist of the set of blocks which comprise the object, the total size, a key name, and additional storage attributes. Less interesting cross-cutting concerns which will also be the responsibility of the coordinator include authentication and authorization, discovery for whatever platform you want, and feeding any desired monitoring. The first draft will probably punt on many of these.
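As a rough sketch of the per-object metadata described above, something like the following could work. The field names, the use of SHA-256 hashes as block IDs, and computing the total size from the blocks rather than storing it are all my own assumptions at this point.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlockRef:
    block_id: str  # assumption: hex SHA-256 of the block contents
    size: int      # number of bytes in the block


@dataclass
class ObjectMetadata:
    key: str                                                    # the key name
    blocks: list[BlockRef] = field(default_factory=list)        # ordered blocks
    attributes: dict[str, str] = field(default_factory=dict)    # storage attributes

    @property
    def total_size(self) -> int:
        # Derived from the blocks so it cannot drift out of sync with them.
        return sum(block.size for block in self.blocks)


def describe(key, data, block_size=4, attributes=None):
    """Build metadata for `data` split into fixed-size blocks."""
    blocks = [
        BlockRef(hashlib.sha256(data[i:i + block_size]).hexdigest(),
                 len(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    ]
    return ObjectMetadata(key=key, blocks=blocks, attributes=dict(attributes or {}))
```

A storage attribute such as {"backup": "dropbox"} is only an example of what "additional storage attributes" might hold; the real set is still open.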

Storage Nodes

Each storage node has a simple set of responsibilities: storage and retrieval of blocks within the system. Authentication and authorization will occur through the client-facing coordinator. The easiest mechanism I know of off the top of my head is JWT, which may be easily checked against the coordinating node.

First draft

The first draft I’m calling Mud. Here in the Central Valley it feels like the fog rises out of the mud, an important repository of moisture. It is also frustrating to dig in during the summer, when the sun bakes it into a hard clay. If I use the project over a long period of time I’m sure I’ll create a clever acronym for it.

Just to keep things simple I’ll be starting with HTTP. Almost everything, including command line tools, speaks HTTP. I’ll probably need to add server certs shortly too, but that will come with deployment.
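As a sense of how small an HTTP-speaking storage node could start out, here is a sketch using Python's built-in http.server. The in-memory dict, the use of the request path as the block name, and the status codes are my own assumptions for illustration; there is no authentication or TLS here.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BLOCKS = {}  # assumption: in-memory store keyed by request path


class BlockHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the request body under the request path.
        length = int(self.headers.get("Content-Length", 0))
        BLOCKS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Return the stored block, or 404 if it was never written.
        data = BLOCKS.get(self.path)
        if data is None:
            self.send_error(404, "no such block")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; drop this override to see request logs


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 asks the OS for any free port."""
    return ThreadingHTTPServer((host, port), BlockHandler)
```

Calling make_server(port=8080).serve_forever() would start the node on a fixed port, after which any HTTP client, including command line tools, could PUT and GET blocks under a path of its choosing.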