E pur si muove

A Container is an Erlang Process

Monday, August 15, 2016

This post is a response to A Container Is A Function Call by Glyph. It is a good article and worth your time, and you may want to read it first to follow along here. On Twitter I asserted that the article recommends building a monolith, while Glyph countered "On the contrary, explicit interfaces are what makes loose coupling possible". Fair enough, but Twitter is a bit awkward for a longer response, so I'm attempting to write my thoughts down here.

In particular I want to address the suggestion that the infrastructure, whether that is Docker Compose or, as I would recommend, Kubernetes, or even something else, should refuse to run a container unless all its dependencies are available:

An image thusly built would refuse to run unless:

  • Somewhere else on its network, there was an etcd host/port known to it, its host and port supplied via environment variables.
  • Somewhere else on its network, there was a postgres host, listening on port 5432, with a name-resolution entry of “pgwritemaster.internal”.
  • An environment variable for the etcd configuration was supplied
  • A writable volume for /logs was supplied, owned by user-ID 4321 where it could write common log format logs.

The suggestion here is that the service, err container, would just crash if any of these were not available. However, when you're building your service it should expect network failure as well as failure of other services; that is the nature of distributed systems. Dependencies might not always be there, and your service should do the most sensible thing in that case. In fact, systems like Kubernetes have a nice Service concept: a fixed (DNS) endpoint available in the cluster which gets dynamically routed to any running container that happens to have the correct labels associated with it. This emphasises that whatever provides the service might come and go, and often multiple containers provide it at once.
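To make this concrete, here is a minimal sketch (in Python, with hypothetical names; `pgwritemaster.internal` is taken from the quoted article) of the behaviour I mean: instead of crashing when a dependency is unreachable, the service retries with backoff and, failing that, degrades gracefully. Meanwhile the cluster may route the fixed Service name to a fresh container.

```python
import time

def call_dependency(fn, retries=3, base_delay=0.1, fallback=None):
    """Call a possibly unavailable dependency, retrying with
    exponential backoff instead of crashing the whole service.

    fn       -- zero-argument callable that talks to the dependency
    fallback -- value to return when the dependency stays down,
                so we can still answer our own users somehow
    """
    delay = base_delay
    for _ in range(retries):
        try:
            return fn()
        except ConnectionError:
            # Dependency missing or network flaking: back off and retry.
            time.sleep(delay)
            delay *= 2
    # Still down after all retries: degrade gracefully, don't exit(1).
    return fallback

# Simulate a dependency that comes back after two failed attempts,
# e.g. Kubernetes re-routing the Service DNS name to a new pod.
state = {"calls": 0}
def flaky_backend():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("pgwritemaster.internal unreachable")
    return "row data"

print(call_dependency(flaky_backend, base_delay=0.01))  # -> row data
```

The point is not this particular retry loop but the stance: the service treats an absent dependency as a normal, temporary condition rather than a startup precondition.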

I compare a container with an Erlang process because I think this is how they should behave. They should be managed by a process supervisor, Kubernetes or whichever is your poison, and they should communicate using an asynchronous communication protocol based on message passing rather than (remote) function calls. If they don't do this, you're building a tightly coupled system which is like a monolith, but with added network failures between your function calls.

Obviously in the real world you're stuck with things like the Postgres protocol, and that is OK. Sometimes your own service will also need a request/response protocol where it must reply explicitly. But the key thing is that as a user of such a service you expect failure: you expect it not to be there, and you do the best you can for your own users, even if that is just returning an error code. If you do this, your process supervisor, err container/cluster infrastructure, can happily normalise the state of your services again by bringing up the missing service, without a huge cascade of failures grinding your entire cluster to a halt. This is the opposite of the infrastructure refusing to run your container because a service it uses is missing.
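"Just returning an error code" can be sketched in a few lines. This is an illustrative Python fragment with made-up names, not any particular framework's API: the request handler catches the dependency failure and answers with a 503 instead of letting the process die, leaving the supervisor free to restore the backend.

```python
def handle_request(fetch_from_db):
    """Answer our own users even when the database is unreachable:
    return an error response instead of crashing the process.

    fetch_from_db -- zero-argument callable standing in for a
                     Postgres query over the network
    """
    try:
        return {"status": 200, "body": fetch_from_db()}
    except ConnectionError:
        # Dependency is down: tell the caller honestly, stay alive,
        # and let the supervisor (e.g. Kubernetes) bring it back.
        return {"status": 503, "body": "database temporarily unavailable"}

# Happy path and degraded path side by side:
print(handle_request(lambda: "rows"))
def db_down():
    raise ConnectionError("connection refused")
print(handle_request(db_down))
```

A service written this way keeps serving (degraded) responses throughout an outage, so restarts don't cascade through the cluster.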

Shameless plug: I also spoke about this at EuroPython.
