Mark Eschbach

Software Developer && System Analyst

Hypothesis: Using Git for deployment will consume less bandwidth

For my static site I generate an archive of artifacts, including static pages, images, and other resources. This archive is then uploaded and unpacked on the target machines within a cluster. Often the changes between the previous version and the current version are insignificant, perhaps a few kilobytes of text. Other times entire parts of the target directory structure may be rewritten. The archive is produced by running tar against the generated artifacts, then piping the tarred contents through gzip. This produces an artifact in a reasonable amount of time and includes a full copy of the website. The entire archive is approximately 472 kilobytes. For clarity I will call this the Colonization method, named because it moves into the remote server, displacing the previous inhabitants.
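The archive step can be sketched in Python with the standard `tarfile` module, which writes the tar stream and the gzip compression in one pass. This is an illustration under assumptions, not the actual build script: the function name `build_archive` and any paths passed to it are hypothetical.

```python
import tarfile
from pathlib import Path


def build_archive(site_dir: str, archive_path: str) -> int:
    """Pack the generated site into a gzip-compressed tar archive.

    Roughly equivalent to `tar -cf - -C site_dir . | gzip > archive_path`.
    Returns the size of the finished archive in bytes.
    """
    root = Path(site_dir)
    # "w:gz" writes the tar stream through gzip in a single step.
    with tarfile.open(archive_path, "w:gz") as tar:
        for child in sorted(root.iterdir()):
            # Store paths relative to the site root, e.g. "css/site.css";
            # directories are added recursively.
            tar.add(child, arcname=child.name)
    return Path(archive_path).stat().st_size
```

Calling `build_archive("build/site", "site.tar.gz")` (hypothetical paths) would write the artifact and return its size, which is the number to compare against the bytes actually moved over the wire.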

The deployment methodology I plan to use with Git is slightly different. I plan to translate the website from the template format into the static production files on the build server. For each successful build, the resulting static files will be committed to a deployment repository. For each stage we wish to deploy to (alpha, beta, production), the build system will log into the remote service and push into the remote repository. To roll back, we can manually log into the remote servers within the cluster and issue a manual rollback. I hypothesize this will reduce the overall bandwidth per deployment, because Git sends compressed differences rather than a full copy of the site. I will call this the Federation method.
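A minimal sketch of the Federation build step follows, assuming the generated site lives in a local git working copy and that each stage (alpha, beta, production) is configured as a git remote of the same name. The function names, the `Build <id>` commit message, and the `master` branch are hypothetical choices, and `deploy` only prints the commands unless `dry_run` is turned off.

```python
import subprocess
from typing import Sequence


def federation_commands(repo_dir: str, build_id: str,
                        stages: Sequence[str]) -> list[list[str]]:
    """Build the git commands for one Federation deployment.

    Assumes repo_dir is a git working copy that already holds the freshly
    generated static files, and that every stage is a configured git remote.
    """
    commands = [
        # Stage additions, modifications and deletions from the new build.
        ["git", "-C", repo_dir, "add", "--all"],
        # --allow-empty keeps an unchanged build from aborting the run.
        ["git", "-C", repo_dir, "commit", "--allow-empty", "-m", f"Build {build_id}"],
    ]
    for stage in stages:
        # git push only transfers objects the remote does not already have.
        commands.append(["git", "-C", repo_dir, "push", stage, "HEAD:master"])
    return commands


def deploy(repo_dir: str, build_id: str, stages: Sequence[str],
           dry_run: bool = True) -> None:
    """Print (dry run) or execute the commands in order, stopping on failure."""
    for cmd in federation_commands(repo_dir, build_id, stages):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical repository path, build id and remote names.
    deploy("build/deploy-repo", "42", ["alpha", "beta", "production"])
```

Passing a single stage, such as `["alpha"]`, matches deploying one stage at a time. Rollback is left out, since the plan handles it by hand on each server.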

Colonization Method Data

The deployment scripts at the time of writing use three SSH connections. The first uploads a compressed archive of the static content. The second uploads a script for deploying the files within the archive to the production location. The third executes the deployment script. To gather the data I will measure the bandwidth across these SSH connections. The result should be multiplied by the number of hosts this exchange occurs with.
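The three connections can be pictured as the following command sequence, repeated for every host. This is a hypothetical reconstruction: the original scripts are not shown, so using `scp` and `ssh`, the `/tmp` staging directory, and running the script with `sh` are all assumptions.

```python
import subprocess
from pathlib import PurePosixPath
from typing import Sequence


def colonization_commands(host: str, archive: str, script: str,
                          remote_dir: str = "/tmp") -> list[list[str]]:
    """The three SSH connections of one Colonization deployment to one host."""
    remote = PurePosixPath(remote_dir)
    remote_archive = str(remote / PurePosixPath(archive).name)
    remote_script = str(remote / PurePosixPath(script).name)
    return [
        # Connection 1: upload the compressed archive of static content.
        ["scp", archive, f"{host}:{remote_archive}"],
        # Connection 2: upload the deployment script.
        ["scp", script, f"{host}:{remote_script}"],
        # Connection 3: run the script against the uploaded archive.
        ["ssh", host, "sh", remote_script, remote_archive],
    ]


def deploy_cluster(hosts: Sequence[str], archive: str, script: str,
                   dry_run: bool = True) -> None:
    # The same three connections are repeated for every host in the cluster,
    # which is why the measured bytes scale with the number of hosts.
    for host in hosts:
        for cmd in colonization_commands(host, archive, script):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical host names and local file paths.
    deploy_cluster(["web1", "web2"], "out/site.tar.gz", "deploy.sh")
```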

To capture the bandwidth usage I will use Network Calipers, a tool I built with Node.js for this purpose. It counts the bytes for each connection, as well as a running total for the lifetime of the application. SSH will be reconfigured to connect to localhost on an alternative port, which will be reverse proxied to the actual host.
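Network Calipers itself is written in Node.js and its code is not reproduced here. As a rough illustration of the same idea, the following sketch is a byte-counting TCP reverse proxy using only Python's standard `asyncio` streams (Python 3.9 or newer assumed). The names `ByteCounter`, `make_handler`, and `serve` are hypothetical, and "sent" is taken to mean bytes flowing from the local SSH client toward the real host.

```python
import asyncio


class ByteCounter:
    """Byte counts per connection plus a running total for the proxy's lifetime."""

    def __init__(self) -> None:
        self.total = {"sent": 0, "received": 0}
        self.connections: list[dict[str, int]] = []


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter,
                conn: dict[str, int], total: dict[str, int], key: str) -> None:
    """Copy one direction of the stream, counting every chunk that passes."""
    try:
        while data := await reader.read(65536):
            conn[key] += len(data)
            total[key] += len(data)
            writer.write(data)
            await writer.drain()
    finally:
        # Pass the end-of-stream on so the other side sees the half-close.
        try:
            if writer.can_write_eof():
                writer.write_eof()
        except (OSError, RuntimeError):
            pass


def make_handler(host: str, port: int, counter: ByteCounter):
    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        conn = {"sent": 0, "received": 0}
        counter.connections.append(conn)
        try:
            up_reader, up_writer = await asyncio.open_connection(host, port)
        except OSError:
            client_writer.close()
            return
        # "sent" is client -> real host, "received" is real host -> client.
        await asyncio.gather(
            _pipe(client_reader, up_writer, conn, counter.total, "sent"),
            _pipe(up_reader, client_writer, conn, counter.total, "received"),
            return_exceptions=True,
        )
        up_writer.close()
        client_writer.close()
    return handle


async def serve(listen_port: int, host: str, port: int) -> None:
    """Listen on localhost and forward every connection to host:port."""
    counter = ByteCounter()
    server = await asyncio.start_server(
        make_handler(host, port, counter), "127.0.0.1", listen_port)
    try:
        async with server:
            await server.serve_forever()
    finally:
        # Report per-connection and lifetime totals when the proxy stops.
        for number, conn in enumerate(counter.connections, 1):
            print(f"connection {number}: sent={conn['sent']} received={conn['received']}")
        print(f"total: sent={counter.total['sent']} received={counter.total['received']}")
```

Started with `asyncio.run(serve(2222, "deploy.example.com", 22))` (hypothetical port and host), an `ssh -p 2222 localhost` session would pass through the counter. Since the host key then belongs to the real server, OpenSSH's `HostKeyAlias` option can keep host key checking tied to that server's entry.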

All values are in bytes.

Deployment  Content Upload      Script Upload       Script Execution    Total
            Sent     Received   Sent     Received   Sent     Received   Sent     Received
1           488815   2263       2815     2071       2479     7735       494109   12069
2           489215   2263       2815     2071       2479     6519       494509   10853
3           489199   2263       2815     2071       2479     7815       494493   12149
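The rows can be checked with a few lines of Python, which also apply the per-host multiplication described earlier. The connection values are copied from the table; the cluster size of three hosts is a hypothetical example, and the helper names are illustrative only.

```python
# (sent, received) bytes for each SSH connection of one deployment to one host:
# content upload, script upload, script execution. Values from the table above.
COLONIZATION = {
    1: [(488815, 2263), (2815, 2071), (2479, 7735)],
    2: [(489215, 2263), (2815, 2071), (2479, 6519)],
    3: [(489199, 2263), (2815, 2071), (2479, 7815)],
}


def deployment_total(connections: list[tuple[int, int]]) -> tuple[int, int]:
    """Sum sent and received bytes across one deployment's connections."""
    sent = sum(s for s, _ in connections)
    received = sum(r for _, r in connections)
    return sent, received


def cluster_bytes(connections: list[tuple[int, int]], hosts: int) -> int:
    """Total traffic when the same exchange is repeated for every host."""
    sent, received = deployment_total(connections)
    return (sent + received) * hosts


for number, connections in COLONIZATION.items():
    sent, received = deployment_total(connections)
    # Hypothetical cluster of three hosts.
    print(number, sent, received, cluster_bytes(connections, hosts=3))
```

The computed sums match the Total columns, which confirms that each deployment moves roughly half a megabyte per host.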

Federation Method Data Gathering

The SSH connections to the remote hosts used for script control will be monitored, along with the connection from the remote host pulling the code. Git will automatically update the working copy of the remote code base for us, so no additional connections are required. According to my hypothesis, the bandwidth consumed should scale with the size of the changes. The size on disk, however, could be several times greater than the source.
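Updating the working copy automatically is commonly done with a post-receive hook in the bare repository on each remote host. The sketch below writes such a hook in Python, assuming the site is served from the hypothetical directory `/var/www/site` and that pushes to `master` should go live; neither value comes from the original setup. Git passes the hook one `<old-rev> <new-rev> <ref-name>` line per updated ref on standard input, and `git --work-tree=... checkout -f` then writes the new files in place.

```python
#!/usr/bin/env python3
"""Sketch of a post-receive hook that refreshes the deployed working copy."""
import subprocess
import sys
from typing import Iterable

# Hypothetical values: where the site is served from and which branch deploys.
WORK_TREE = "/var/www/site"
DEPLOY_BRANCH = "master"


def should_deploy(lines: Iterable[str]) -> bool:
    """True if the push updated (not deleted) the deploy branch.

    Git feeds the hook one "<old-rev> <new-rev> <ref-name>" line per ref.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        _old, new, ref = parts
        # An all-zero new revision means the ref was deleted.
        if ref == f"refs/heads/{DEPLOY_BRANCH}" and set(new) != {"0"}:
            return True
    return False


def main() -> None:
    if should_deploy(sys.stdin):
        # Git runs hooks from inside the bare repository, so it is found
        # automatically; -f overwrites the previously deployed files.
        subprocess.run(
            ["git", f"--work-tree={WORK_TREE}", "checkout", "-f", DEPLOY_BRANCH],
            check=True,
        )


if __name__ == "__main__":
    main()
```

Saved as `hooks/post-receive` in the remote repository and marked executable, it would run after every push.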

All values are in bytes. Deployment 1 is broken into three steps: 1.a runs git init, 1.b copies the post-receive hook, and 1.c performs the initial git push.

Deployment         Git
                   Sent     Received
1.a: git init      2591     2119
1.b: post-receive  2847     2071
1.c: git push      477423   5175
1: Total           482861   9365
2                  4191     2295
3                  4031     2663
4                  3599     3271

Conclusion

The Federation method is more efficient when there are small, incremental changes. The primary overhead in the Federation method is establishing the SSH connection, which appears to consume approximately 2 kilobytes of bandwidth. Compared with a full deployment of roughly 500 kilobytes (and growing), the savings are considerable.
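The savings can be put into numbers using the per-host totals from the two tables. The Federation figures use deployments 2 to 4 only, since the initial push transfers roughly a full copy of the site; the sketch is plain arithmetic over the measured values, and the names are illustrative.

```python
# (sent, received) bytes per host and deployment, from the two tables above.
COLONIZATION = [(494109, 12069), (494509, 10853), (494493, 12149)]
# Federation deployments 2-4; the initial push (deployment 1) is left out
# because it transfers roughly a full copy of the site.
FEDERATION = [(4191, 2295), (4031, 2663), (3599, 3271)]


def mean_total(rows: list[tuple[int, int]]) -> float:
    """Average of sent + received bytes over a set of deployments."""
    return sum(sent + received for sent, received in rows) / len(rows)


colonization = mean_total(COLONIZATION)
federation = mean_total(FEDERATION)
print(f"Colonization: {colonization:.0f} bytes per host")
print(f"Federation:   {federation:.0f} bytes per host")
print(f"Ratio:        {colonization / federation:.1f}x")
```

For these measurements it reports about 506,061 bytes per host for Colonization against about 6,683 bytes for Federation, roughly a 76-fold reduction per host and deployment. Because both methods repeat their exchange for every host, the ratio holds regardless of cluster size.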

Federation Method Application

This method may be applicable to any source-based production system, including static HTML, PHP, JavaScript, etc.

Deployment technique for removing the second connection