E pur si muove

A Container is an Erlang Process

Monday, August 15, 2016

This post is a response to A Container Is A Function Call by Glyph. It is a good article and worth your time, and you may want to read it to follow the argument here. On Twitter I asserted that the article recommends building a monolith, while Glyph countered "On the contrary, explicit interfaces are what makes loose coupling possible". Fair enough, but Twitter is a bit awkward for a proper response, so I'm writing my thoughts down here.

In particular I want to address the suggestion that the infrastructure, whether that is Docker Compose or, as I would recommend, Kubernetes or even something else, should refuse to run a container unless all its dependencies are available:

An image thusly built would refuse to run unless:

  • Somewhere else on its network, there was an etcd host/port known to it, its host and port supplied via environment variables.
  • Somewhere else on its network, there was a postgres host, listening on port 5432, with a name-resolution entry of “pgwritemaster.internal”.
  • An environment variable for the etcd configuration was supplied
  • A writable volume for /logs was supplied, owned by user-ID 4321 where it could write common log format logs.

The suggestion here is that the service, err container, would simply crash if any of these were not available. However when you're building a service it should expect network failure as well as failure of other services; that is the nature of distributed systems. Dependencies might not always be there, and your service should do the most sensible thing it can in that case. In fact systems like Kubernetes have a nice service concept: a fixed (DNS) endpoint available in the cluster which is dynamically routed to whichever running containers happen to have the correct labels associated with them. This emphasises that whatever provides the service might come and go, and that often multiple containers provide it at the same time.

I compare a container with an Erlang process because I think this is how they should behave. They should be managed by a process supervisor, Kubernetes or whichever is your poison, and they should communicate using an asynchronous communication protocol based on message passing and not (remote) function calls. If they don't do this you're building a tightly coupled system which is like a monolith but with added network failures between your function calls.

Obviously in the real world you're stuck with things like the Postgres protocol and this is ok. Sometimes your own service is also going to need a protocol in which it has to respond explicitly. But the key thing is that as a user of such a service you expect failure: you expect it not to be there at times and you do the best you can for your own users, even if that is just returning an error code. If you do this your process supervisor, err container/cluster infrastructure, can happily normalise the state of your services again by bringing up the missing service, without a huge cascade of failures grinding your entire cluster to a halt. This is the opposite of the infrastructure refusing to run your container because a service which it uses is missing.
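
As a minimal sketch of what I mean (the host name is taken from the quoted list above, everything else is made up for illustration), a service can treat a missing dependency as a degraded state rather than a reason to die:

import socket

def fetch_report(host='pgwritemaster.internal', port=5432):
    """Answer a request, degrading gracefully if the database is unreachable."""
    try:
        # Stand-in for a real database query; here we only check reachability.
        conn = socket.create_connection((host, port), timeout=2)
        conn.close()
    except OSError:
        # The dependency is missing right now: return an error to our own
        # users instead of crashing, and let the cluster infrastructure
        # restore the missing service in its own time.
        return {'status': 503, 'error': 'report temporarily unavailable'}
    return {'status': 200, 'report': '...'}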

Shameless plug: I also spoke about this at EuroPython.

py.test sprint in Freiburg

Saturday, February 20, 2016

Testing is a really important part of Python development and picking the testing tool of choice is no light decision. Quite a few years ago I eventually decided py.test would be the best tool for this, a choice I have never regretted and which has rather been reinforced ever since. Py.test has been the testing tool that seamlessly scaled from small unit tests to large integration tests. Furthermore it has seen steady and continuous development over all these years; the py.test I first used lacked a lot of the features we now consider essential: AST-based assertion re-writing, fixtures and even the plugin system did not exist yet. Seeing all this work by so many people go into the tool has been great. And at some point I myself moved from user to contributing plugins and eventually to doing various bits of work on the core as well.

Personally the greatest part of this all has been seeing the project grow from (mostly) a single maintainer to the team that maintains py.test now, while at the same time the adoption among users has steadily kept growing as well. Py.test is now in a position where any of about half a dozen people can make a release, and many plugin maintainers have now also joined the pytest-dev team. Since the team has grown in the last few years some of us have managed to meet up at various conferences. Yet, due to the range of continents we are spread over, we never all managed to meet. This is how the idea of a dedicated sprint for py.test first came about: would it not be great if we all managed to meet and spend some dedicated time working on py.test together?

With this objective we have now organised a week-long sprint and created a fundraiser campaign to help us make it affordable even for those of us coming from far-flung continents (depending on your point of view!). It would be great to get your or your company's support if you think py.test is a worthwhile tool for you. The sprint is open to anyone, so if you or your company think it would be interesting for you to learn a lot about py.test while helping out, or maybe working on your pet feature or bug, please come along! Just drop us a note on the mailing list and we'll accommodate you.

There is a variety of topics people are looking at working on, hopefully all culminating in a py.test 3.0 release (which will be backwards compatible!). Personally I would like to work on a feature to elegantly fail tests from within finalisers. The problem here is that raising an exception in a finaliser is actually treated as an error, yet fixtures fairly commonly want to do exactly this; the sketch below illustrates the current behaviour. My current plan is to add a new request.addverifier() method which would be allowed to fail the test, though the exact details may change. Another subject I might be interested in is adding multiple-environment support to tox, so that you may be able to test packages in e.g. a Conda environment, though this is certainly not a simple feature.
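
To show the problem as it stands today, here is a minimal sketch (the fixture and test names are made up): a finaliser which raises is reported as an error during teardown rather than as a failure of the test which used the fixture, even when the raise is really a final verification step.

import pytest

@pytest.fixture
def tracked_resource(request):
    resource = {'leaked': True}  # pretend set-up

    def verify():
        # Raising here is reported as a teardown *error*, not as a
        # *failure* of the test which used the fixture.
        assert not resource['leaked'], 'resource was leaked by the test'

    request.addfinalizer(verify)
    return resource

def test_uses_resource(tracked_resource):
    pass  # the "leak" is only noticed during teardown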

So if you use py.test and would like to support us it would be great if you contributed, or maybe convinced your employer to contribute. And if you're keen enough to join us for the sprint that would be great too. I look forward to meeting everyone in June!

Pylint and dynamically populated packages

Thursday, December 04, 2014

Python links the module namespace directly to the layout of the source locations on the filesystem. And this is mostly fine, certainly for applications. For libraries one sometimes wants to control the toplevel namespace or API more tightly. This also is mostly fine, as one can just use private modules inside a package and import the relevant objects into the package's __init__.py file, optionally even setting __all__. As I said, this is mostly fine, if sometimes a bit ugly.
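
For example (the package and object names here are purely illustrative), the package's __init__.py can re-export just the public objects from a private implementation module:

# mypkg/__init__.py
from mypkg._impl import Client, connect

__all__ = ['Client', 'connect']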

However sometimes you have a library which may be loading a particular backend or platform support at runtime. An example of this is the Python zmq package. The apipkg module is also a very nice way of controlling your toplevel namespace more flexibly. The problem is that once you start using one of these things, Pylint no longer knows which objects your package provides in its namespace and will issue warnings about using non-existent things.
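
As a rough idea of what the apipkg style looks like (the names are again made up), the __init__.py only declares where each exported object really lives and apipkg imports them lazily on first attribute access:

# mypkg/__init__.py
import apipkg

apipkg.initpkg(__name__, {
    'Client': 'mypkg._impl:Client',
    'backends': {
        'fast': 'mypkg._fast_backend:FastBackend',
    },
})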

Turns out it is not too hard to write a plugin for Pylint which takes care of this. One just has to build the right AST nodes in place where they would be appearing at runtime. Luckily the tools to do this easily are provided:

import importlib
import types

import astroid


def transform(mod):
    if mod.name == 'zmq':
        module = importlib.import_module(mod.name)
        for name, obj in vars(module).copy().items():
            # Skip names already known to Pylint and objects we cannot locate.
            if (name in mod.locals or
                    not hasattr(obj, '__module__') or
                    not hasattr(obj, '__name__')):
                continue
            if isinstance(obj, types.ModuleType):
                ast_node = [astroid.MANAGER.ast_from_module(obj)]
            else:
                if hasattr(astroid.MANAGER, 'extension_package_whitelist'):
                    # Newer astroid only inspects C extension modules which
                    # have been explicitly whitelisted.
                    astroid.MANAGER.extension_package_whitelist.add(
                        obj.__module__)
                real_mod = astroid.MANAGER.ast_from_module_name(obj.__module__)
                ast_node = real_mod.getattr(obj.__name__)
                for node in ast_node:
                    fix_linenos(node)
            mod.locals[name] = ast_node

As you can see the hard work of knowing what AST nodes to generate is all done in the astroid.MANAGER.ast_from_module() and astroid.MANAGER.ast_from_module_name() calls. All that is left to do is add these new AST nodes to the module's globals/locals (they are the same thing for a module).

You may also notice the fix_linenos() call. This is a small helper needed when running on Python 3 and importing C modules (like for zmq). The reason is that Pylint tries to sort by line numbers, but for C code they are None; in Python 2 None and an integer can happily be compared, but in Python 3 that is no longer the case. So this small helper simply sets all unknown line numbers to 0:

def fix_linenos(node):
    if node.fromlineno is None:
        node.fromlineno = 0
    # Recurse so that every child node gets a sortable line number too.
    for child in node.get_children():
        fix_linenos(child)

Lastly when writing this into a plugin for Pylint you'll want to register the transformation you just wrote:

def register(linter):
    astroid.MANAGER.register_transform(astroid.Module, transform)
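
With the plugin saved as a module on the Python path, say pylint_zmq.py (the name is of course up to you), it can then be enabled by running Pylint with --load-plugins=pylint_zmq.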

And that's all that's needed to make Pylint work fine with dynamically populated package namespaces. I've tried this on zmq as well as on a package using apipkg and it seems to work fine on both Python 2 and Python 3. Writing Pylint plugins seems not too hard!

New pytest-timeout release

Thursday, August 07, 2014

At long last I have updated my pytest-timeout plugin. pytest-timeout is a plugin for py.test which will interrupt tests that take longer than a set time and dump the stack traces of all threads. It was initially developed in order to debug some tests which would occasionally hang on a CI server, and it can be used in a variety of similar situations where getting some output is more useful than getting a clean test run.

The main new feature of this release is that the plugin now finally works nicely with the --pdb option from py.test. When using this option the timeout plugin will now no longer interrupt the interactive pdb session after the given timeout.

Secondly this release fixes an important bug which meant that a timeout in the finaliser of a fixture at the end of the session would not be caught by the plugin. This was mainly because pytest-timeout had not been updated since py.test changed the way fixtures are cached on their scope with the introduction of @pytest.fixture(scope='...'), even though that was a long time ago.

So if you use py.test and a CI server I suggest now is as good a time as any to configure it to use pytest-timeout with a fairly large timeout of, say, 300 seconds, and then forget about it forever. Until maybe one day it suddenly saves you a lot of head scratching and time.
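
As a sketch of what that looks like, the timeout can be given once for the whole run on the command line, while an individual slow test can override it with the timeout marker (the test name is made up):

# run the whole suite with a 300 second per-test timeout:
#     py.test --timeout=300

import pytest

@pytest.mark.timeout(600)
def test_known_to_be_slow():
    pass  # stand-in for a test which legitimately needs more time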

Designing binary/text APIs in a polyglot py2/py3 world

Sunday, April 27, 2014

The general advice for handling text in an application is to use a so-called unicode sandwich: decode bytes to unicode (text) as soon as you receive them, handle everything internally as unicode, and then, right at the boundary, encode it back to bytes. Typically the boundaries where the decoding and encoding happen are when reading from or writing to files, when sending data across the network, etc. So far so good.
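
A minimal sketch of that sandwich (the file name and encoding are chosen for illustration):

with open('greeting.txt', 'rb') as f:
    text = f.read().decode('utf-8')   # decode right at the input boundary

text = text.upper()                   # all internal work happens on unicode

with open('greeting.txt', 'wb') as f:
    f.write(text.encode('utf-8'))     # encode again at the output boundary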

All this is fine in an environment where it is possible to know the encoding to be used and where an encoding failure can simply be treated as a hard failure. However POSIX is notoriously bad at this: for many things the kernel just doesn't care, and any bytes which go in will come back out. This means that for e.g. a filename or command line arguments the kernel does not care about them being valid in the current locale/encoding, or indeed in any encoding. When Python 3.0 was initially released this was a problem, and by Python 3.1 the solution was to introduce the surrogateescape error handler for decoders and encoders. This allows Python 3 to smuggle un-decodable bytes inside unicode strings, and the encoder will put them back when round-tripping. The classic example of why this is useful is listing files using e.g. os.listdir() and later passing them back to the kernel via e.g. open().
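
A small Python 3 round trip (the stray byte is arbitrary) shows the smuggling at work:

raw = b'report-\xe9.txt'  # a name containing a byte which is not valid UTF-8

name = raw.decode('utf-8', 'surrogateescape')
# name is now 'report-\udce9.txt'; the bad byte survives as a lone surrogate

assert name.encode('utf-8', 'surrogateescape') == raw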

The downside of surrogate escapes is that the unicode strings are now no longer valid for many otherwise normal string operations. If you try to write the result of os.listdir() to a file which you want to encode as UTF-8, the encoding step will blow up, so this in a way brings the old Python 2 bytes situation back. Any user of the API therefore needs to be aware that strings may contain surrogate escapes and handle them appropriately. For a detailed description of these cases refer to Armin Ronacher's Unicode guide, which introduces is_surrogate_escaped(s) and remove_surrogate_escaping(s, method='ignore') helper functions which are pretty self-explanatory.

But let's for now accept the surrogate escape solution Python 3 introduces: as long as the API documents this, a user can handle it with the earlier mentioned helper functions. However when designing a polyglot library API it is impossible to use the surrogateescape error handler, since it does not exist in Python 2.7. And since the required groundwork was not backported either, it is impossible to write a surrogateescape handler for Python 2.7, which I consider a glaring omission, certainly given the timeline. So this pretty much rules out surrogateescape as the basis of a 2.7/3.x API.

So what options are left for an API designer? One suggestion is to use native strings: bytes on Python 2.7 and unicode with surrogate escapes on Python 3.x. This means that in either case there is no loss of data. But it also means the user of the API now has a harder time writing polyglot code if they want to use the unicode sandwich. Given the difficulties this creates for the user I'm not sure I'm a fan of this API.

Another correct, but rather unfriendly, option is to have the API expose bytes and also provide the encoding which should be used to decode them. In this case the user can choose the appropriate error handler themselves, be it ignore, replace or, on Python 3, surrogateescape. The advantage is that this behaves exactly the same on Python 2 and Python 3; however it leaves a casual user of the API a bit lost, certainly on Python 3 where receiving bytes from the API is not very friendly and feels like pushing the Python 2 problems back onto them.

Yet another option I've been considering is to provide both APIs: one exposing the bytes, with the attributes possibly prefixed with a b, and one convenience API which decodes the bytes to unicode using the ignore error handler. This does seem to pollute the API, but it might still be the most pragmatic solution: it behaves the same on both Python 2 and Python 3, does not lose any information, allows easy use of the all-unicode-inside text model, yet still allows explicit handling of the decoding.
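
A rough sketch of such a dual API (the class and attribute names are invented for illustration):

class ProcessInfo(object):
    """An example API object exposing both the raw bytes and convenience text."""

    def __init__(self, name_bytes, encoding='utf-8'):
        self.name_bytes = name_bytes  # exact data, nothing is lost
        self.encoding = encoding      # suggested encoding to decode with

    @property
    def name(self):
        # Convenience attribute: undecodable bytes are simply dropped.
        return self.name_bytes.decode(self.encoding, 'ignore')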

So what is the best way to design a polyglot API? I would really like to hear people's opinions on which API would be the nicest to use, or whether there are any other tricks to employ for polyglot APIs.
