Stream of consciousness - Mark Eschbach (Software Developer && System Analyst)

I will admit my home servers are not well instrumented. None of the services report statistics. I am completely blind within that realm. So this is my search for a way to record these data points and analyze them.

Implementation requirements

Going through the pipeline, there are several requirements. Many of the applications I have deployed at home are written in either NodeJS or Ruby. For home applications I am concerned with getting things into production quickly, not with ideological purity of implementation; if I were, they would probably all be implemented in Erlang. The applications should be able to offload arbitrary metrics into a data store. The monitoring system should also be able to pick up metrics from the host systems and aggregate them.
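
As a rough idea of what picking up host metrics could look like, here is a minimal sketch using only Python's standard library. Python is used purely to keep the example self-contained; the function name collect_host_metrics and the field names are made up for illustration, and a real collector would gather far more than disk usage and load average.

```python
import os
import shutil
import time


def collect_host_metrics(path="/"):
    """Return a dict with a few host-level readings taken right now."""
    usage = shutil.disk_usage(path)
    metrics = {
        "timestamp": int(time.time()),  # whole seconds since the Unix epoch
        "disk_used_bytes": usage.used,
        "disk_free_bytes": usage.free,
    }
    # os.getloadavg() only exists on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            load_1m, load_5m, load_15m = os.getloadavg()
        except OSError:
            pass  # load average unavailable; report disk figures only
        else:
            metrics["load_1m"] = load_1m
            metrics["load_5m"] = load_5m
            metrics["load_15m"] = load_15m
    return metrics


if __name__ == "__main__":
    print(collect_host_metrics())
```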

I do not have strong feelings on ways to visualize or alert on the data; however, this should be fairly simple. For the initial implementation I would like to use Grafana since it is the suite du jour in the industry. Grafana’s architecture is built to consume multiple data sources, even for a single graph. Unfortunately this does not narrow the choices for storage.

Considering InfluxDB

There has been a lot of buzz surrounding InfluxDB, and it was adopted as the default store for application metrics at work. InfluxDB has worked well there; however, I do not have experience running it in an operational context. Deployment seems fairly simple, with either a Docker image or a package to install on the host. For now I will consider Docker for simplicity.

Backup and restore scenarios are fairly simple. According to the official InfluxDB documentation, one just runs a command for either scenario. It looks like clustering is only available in the commercial offering, which is a bit disappointing but not a deal breaker. High availability creates complex use cases anyway.

After a bit of searching I was unable to locate any information on limits to the number of rows or the size of the data store for InfluxDB. I tend to be a bit neurotic about retaining details and logs for a while. Retention policies appear to be configurable at two levels: shard groups and databases. The database is the recommended level for each retention policy. The documentation for CREATE DATABASE does not describe well how one would disable automated destruction of data. CREATE RETENTION POLICY accepts a duration of INF, which is promising.
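
To make the INF option concrete, here is a small sketch that builds the InfluxQL statement as a string, assuming the InfluxDB 1.x syntax. The helper name create_retention_policy and the policy and database names ("forever", "home_metrics") are placeholders of mine. It does not escape double quotes inside names, so it is only suitable for trusted input.

```python
def create_retention_policy(name, database, duration="INF",
                            replication=1, default=False):
    """Build an InfluxQL CREATE RETENTION POLICY statement (1.x syntax)."""
    statement = 'CREATE RETENTION POLICY "{}" ON "{}" DURATION {} REPLICATION {}'.format(
        name, database, duration, replication
    )
    if default:
        # DEFAULT makes this the policy used when a write names none.
        statement += " DEFAULT"
    return statement


# A policy that never expires data, used by default for new writes.
print(create_retention_policy("forever", "home_metrics", default=True))
```

The resulting statement would then be run through whatever query interface is at hand, such as the influx shell or a client's query call.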

Clients

As for clients there are recommended bindings for each environment. For Node there is node-influx. For Ruby there is an official Gem produced by the commercial organization behind the project.
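
Whichever binding is used, it helps to know what ends up on the wire. InfluxDB 1.x accepts writes as text in its line protocol: a measurement name, optional tags, one or more fields, and an optional timestamp. The sketch below is my own minimal encoder, not code from either client library; the names to_line and format_field are hypothetical, and it ignores edge cases such as NaN values.

```python
def _escape(value, chars):
    """Backslash-escape each character of `chars` found in `value`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def format_field(value):
    """Render one field value using line protocol typing rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return "{}i".format(value)  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(text)


def to_line(measurement, fields, tags=None, timestamp=None):
    """Encode a single point as one line of line protocol."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags or {}):
        head += ",{}={}".format(_escape(key, ",= "),
                                _escape(str(tags[key]), ",= "))
    body = ",".join(
        "{}={}".format(_escape(key, ",= "), format_field(value))
        for key, value in sorted(fields.items())
    )
    line = head + " " + body
    if timestamp is not None:
        line += " {}".format(int(timestamp))
    return line


# 1542412800 is just an example Unix timestamp.
print(to_line("load", {"value": 0.42}, {"host": "nas"}, 1542412800))
# prints: load,host=nas value=0.42 1542412800
```

Note that the unit of the trailing timestamp depends on the precision requested when writing; without one, InfluxDB interprets it as nanoseconds.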

Node

node-influx has a fairly simple API. Connecting to the database is straightforward; however, according to the tutorial one should provide the schema one will write with. The actual API doc examples appear to hold a different opinion. The other options are fairly straightforward: user name, shared secret, and network coordinates.

Connections are handled out of band from the constructor. Under the hood, connections are pooled and the client will intelligently handle errors with back-offs. Writing data points through the writePoints method returns a promise for the result, which is pretty straightforward. The querying documentation unfortunately does not provide an example of the resulting data structures, but that is irrelevant when using InfluxDB as a data sink.
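
I have not read how node-influx implements its back-off, so the following is only a generic illustration of the idea: retry a failed write with exponentially growing delays. The function write_with_backoff and its parameters are my own invention, and `write` stands in for any callable that sends a batch of points and raises OSError (or a subclass such as ConnectionError) on network failure.

```python
import time


def write_with_backoff(write, points, attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call write(points), retrying network failures with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return write(points)
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double, up to a ceiling
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production client would likely also add jitter so many clients do not retry in lockstep.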

Ruby

Ruby’s client may operate in two modes: asynchronous and synchronous. In synchronous mode, writing data points is a blocking call. In asynchronous mode, the client enqueues outgoing values to be written and optionally blocks if the queue is full. Connections are fairly straightforward, just requiring network coordinates. The operational mode defaults to synchronous; however, asynchronous options may be passed. By default timestamps are written in seconds, and it appears there is a dance required to get sub-second times.
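
The asynchronous mode described above boils down to a bounded queue drained by a background worker. This is a sketch of that pattern in Python, not the Ruby gem's actual implementation; the class AsyncWriter, its options, and the `write` callable are all assumptions made for illustration.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to finish


class AsyncWriter:
    """Queue points and write them in batches on a background thread."""

    def __init__(self, write, max_queue=1000, batch_size=100,
                 block_when_full=True):
        self._write = write
        self._queue = queue.Queue(maxsize=max_queue)
        self._batch_size = batch_size
        self._block = block_when_full
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, point):
        """Return True if queued, False if dropped because the queue is full."""
        try:
            self._queue.put(point, block=self._block)
            return True
        except queue.Full:
            return False

    def close(self):
        """Flush whatever is still queued and stop the worker."""
        self._queue.put(_STOP)
        self._worker.join()

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is _STOP:
                break
            batch.append(item)
            # Flush when the batch is full or nothing else is waiting.
            if len(batch) >= self._batch_size or self._queue.empty():
                self._write(batch)
                batch = []
        if batch:
            self._write(batch)
```

With block_when_full=True the caller waits for room in the queue, which protects data at the cost of latency; with False, points are dropped instead, which protects the application.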

Conclusion

So overall it looks like I will be using InfluxDB for storage and aggregation of values and Grafana as an analysis layer. I have some reservations regarding the default retention policy. I suppose it is also reasonable to remove the data at some point. The initial implementations look reasonable to proceed with, though.

From an architectural perspective I would love to build a service which aggregates the data and forwards it to various sinks. This would resolve concerns surrounding data retention and remove the need to push out details to various applications. This could easily be set up as a WebSocket service or straight HTTP(S).
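
The core of such a service could be as small as a fan-out loop. The sketch below leaves out the WebSocket or HTTP front end entirely and only shows the dispatch step; the Aggregator class and the idea of sinks as plain callables are hypothetical design choices, not something I have built.

```python
class Aggregator:
    """Forward each incoming point to every registered sink."""

    def __init__(self):
        self._sinks = {}

    def register(self, name, sink):
        """Add a sink: any callable that accepts a single point."""
        self._sinks[name] = sink

    def publish(self, point):
        """Send the point to all sinks; return the errors keyed by sink name."""
        failures = {}
        for name, sink in self._sinks.items():
            try:
                sink(point)
            except Exception as error:  # one bad sink must not block the rest
                failures[name] = error
        return failures


# Example: an in-memory sink standing in for a long-term archive.
archive = []
aggregator = Aggregator()
aggregator.register("archive", archive.append)
aggregator.publish({"measurement": "load", "value": 0.42})
```

Applications would then only need to know about the aggregator, and adding a long-term archive next to InfluxDB becomes a matter of registering another sink.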

Intranet Computing: A metrics monitoring system

Nov 17, 2018 • Mark Eschbach
