Over the last two weeks I’ve been using Influx. Just having a metrics store creates an enormous amount of value, and my deployment has definitely shown that, even when just logging CPU temperatures. Although the program had locked up the baseboard of the system (too many sensor queries, at one per second), the system itself kept chugging along. Of more concern is InfluxDB locking up.

Influx would accept connections but would not respond to queries. The logs had nothing to show. The application just went catatonic. After some searching I came across the GitHub issues (I should have checked there first). There are approximately 12 issues over the years about Influx freezing.
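Since the failure mode is "accepts connections but never answers", a plain port check will not catch it. Here is a minimal sketch of a watchdog probe that tells the two apart. It assumes an InfluxDB 1.x style HTTP API (the `/query` endpoint, usually on port 8086); the function name `probe`, the state names, and the timeout are my own placeholders, not anything Influx ships.

```python
import http.client
import socket
import urllib.error
import urllib.request


def probe(host: str, port: int = 8086, timeout: float = 5.0) -> str:
    """Classify a server as 'down', 'hung', or 'ok' (hypothetical state names)."""
    # Step 1: can we open a TCP connection at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return "down"

    # Step 2: does a trivial query come back before the deadline?
    # Assumption: InfluxDB 1.x HTTP API; adjust the URL for other versions.
    url = f"http://{host}:{port}/query?q=SHOW%20DATABASES"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return "ok"
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (e.g. auth required).
        return "ok"
    except (OSError, http.client.HTTPException):
        # Connected, but no usable response in time: the catatonic case.
        return "hung"
```

Run from cron or a small loop, a "hung" result is the one worth alerting on (or restarting the service over), because that is the state where the logs stay silent.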

This is a bit unfortunate, as many of the issues are simply closed with a statement along the lines of “insufficient information”. I had also seen that behavior at Virta, with our instances falling over. The normal line from the party responsible for operating that instance was “it needs more {CPU/memory}”. The Influx instance I am running has access to 24 cores at 2.4GHz and 120GB of RAM…so I do not think that is the problem. This smells a bit like a locking issue to me; however, I have not been able to pinpoint what triggers the problem.

As a result of Influx locking up, I am a bit fearful that I am continuing to invest in a technology which will not be able to meet my needs for availability. With Influx 2.0 it sounds like they are trying to replace other tools and might address the underlying issues. If I had the time I would probably write my own time series storage on top of Postgres. Until then I will continue to hope for better stability from Influx and that Influx 2.0 will continue to fit into my architecture.