devork

E pur si muove

Python optimisation has surprising side effects

Tuesday, April 27, 2010

Here's something that surprised me:

a = None
def f():
    b = a
    return b
def g():
    b = a
    a = 'foo'
    return b

While f() is perfectly fine, g() raises an UnboundLocalError. This is because Python optimises access to local variables using the LOAD_FAST/STORE_FAST opcodes: since g assigns to a, the compiler treats a as local for the entire function body, and never falls back to the global. You can easily see this by looking at the code objects of those functions:

>>> f.__code__.co_names
('a',)
>>> f.__code__.co_varnames
('b',)
>>> g.__code__.co_names
()
>>> g.__code__.co_varnames
('b', 'a')

I actually found out about this difference thanks to finally watching the Optimizations And Micro-Optimizations In CPython talk by Larry Hastings from PyCon 2010. I had never realised that you could create a situation where the surrounding scope would not be consulted at all.
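To see the two opcodes side by side, dis can disassemble both functions (the exact offsets and layout vary by CPython version, but the opcode names are stable):

```python
import dis

a = None

def f():
    b = a
    return b

def g():
    b = a
    a = 'foo'
    return b

# f reads the global `a` with LOAD_GLOBAL; g uses LOAD_FAST for `a`
# because the later assignment makes `a` local for the whole body,
# so calling g() fails before `a` is ever stored.
dis.dis(f)
dis.dis(g)
```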

Judging performance of python code

Saturday, April 03, 2010

Recently I've been messing around with Python bytecode; as a result I now know approximately how the Python virtual machine works at the bytecode level. I've found this quite interesting, but other than satisfying my own curiosity I had no benefit from this knowledge. Until today, that is: I was wondering which code would be more efficient:

# sample 1
a += 1
if b is not None:
    a += 2

# sample 2
if b is None:
    a += 1
else:
    a += 3

From a style point of view I'd prefer to write the first: I find it slightly more readable since it has fewer indented blocks and uses one line less on my screen. But my gut feeling tells me the second is faster, particularly when b is not None (because in that case the first sample does two add operations instead of one). But now I can verify my gut feeling! All I need to do is compile both fragments and use the disassembler to investigate:
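For completeness, the compile step can be reproduced like this (the sample1/sample2 string definitions are my reconstruction; the interpreter session that follows assumes they already exist):

```python
import dis

# The two fragments as module-level strings, reconstructed from the post.
sample1 = """\
a += 1
if b is not None:
    a += 2
"""
sample2 = """\
if b is None:
    a += 1
else:
    a += 3
"""

co1 = compile(sample1, '<sample1>', 'exec')
co2 = compile(sample2, '<sample2>', 'exec')
dis.dis(co1)
dis.dis(co2)
```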

>>> co1 = compile(sample1, '', 'exec')
>>> dis.dis(co1)
  2           0 LOAD_NAME                0 (a) 
              3 LOAD_CONST               0 (1) 
              6 INPLACE_ADD          
              7 STORE_NAME               0 (a) 

  3          10 LOAD_NAME                1 (b) 
             13 LOAD_CONST               2 (None) 
             16 COMPARE_OP               9 (is not) 
             19 POP_JUMP_IF_FALSE       35 

  4          22 LOAD_NAME                0 (a) 
             25 LOAD_CONST               1 (2) 
             28 INPLACE_ADD          
             29 STORE_NAME               0 (a) 
             32 JUMP_FORWARD             0 (to 35) 
        >>   35 LOAD_CONST               2 (None) 
             38 RETURN_VALUE
>>> co2 = compile(sample2, '', 'exec')
>>> dis.dis(co2)
  2           0 LOAD_NAME                0 (b) 
              3 LOAD_CONST               2 (None) 
              6 COMPARE_OP               8 (is) 
              9 POP_JUMP_IF_FALSE       25 

  3          12 LOAD_NAME                2 (a) 
             15 LOAD_CONST               0 (1) 
             18 INPLACE_ADD          
             19 STORE_NAME               2 (a) 
             22 JUMP_FORWARD            10 (to 35) 

  5     >>   25 LOAD_NAME                2 (a) 
             28 LOAD_CONST               1 (3) 
             31 INPLACE_ADD          
             32 STORE_NAME               2 (a) 
        >>   35 LOAD_CONST               2 (None) 
             38 RETURN_VALUE

My analysis of this is pretty simple: count the number of instructions for when b is None and for when b is not None.

            b is None    b is not None
sample1     10           15
sample2     11           10

So ultimately the best performance depends on whether b will be None or not. However, the difference in the best case is only one instruction, while the difference in the worst case is a whole mighty five instructions! This would seem to confirm my gut feeling: the ugly code is better. It also makes me wonder whether this is not the sort of optimisation a compiler should be doing: create the bytecode of sample 2 regardless of the source code (I'm not a compiler guy and do realise it might not be as simple as that, since Python is a dynamic language which allows you to change stuff, including executed code, at runtime, yada yada).

There's one more catch though: I seriously doubt each Python instruction executes in the same amount of time! So let's use timeit to actually verify this. I'm omitting the trivial code, but this is the result:

$ python3 test.py 
sample1, b is None: 0.128307104111
sample2, b is None: 0.128338098526
sample1, b is not None: 0.244062900543
sample2, b is not None: 0.12109208107
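For reference, the omitted timing code might have looked roughly like this; this is my reconstruction, not the original script, so the iteration count and setup strings are guesses:

```python
import timeit

# The two fragments from the post, as timeit statements.
samples = {
    'sample1': "a += 1\nif b is not None:\n    a += 2\n",
    'sample2': "if b is None:\n    a += 1\nelse:\n    a += 3\n",
}

for name, stmt in samples.items():
    for case, setup in [('b is None', 'a = 0; b = None'),
                        ('b is not None', 'a = 0; b = 1')]:
        # number=1000000 is a guess at what produces timings of this size.
        t = timeit.timeit(stmt=stmt, setup=setup, number=1000000)
        print('%s, %s: %s' % (name, case, t))
```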

To be honest, the result is pretty much as speculated: the first sample is slower when b is not None, and all the other cases are roughly the same. The one odd thing is that the time nearly doubles in the bad case; this suggests the Python virtual machine spends most of its time in the INPLACE_ADD instructions while all the others are probably very quick.

Anyway, in conclusion: I guess you can speculate about performance and get an idea by knowing what bytecode will be generated. But at the end of the day you'll get a simpler and better answer by just using timeit. So knowing something about Python bytecode still hasn't gained me any benefit. It was an interesting exercise though.

Hacking mock: Mock.assert_api(...)

Thursday, April 01, 2010

Mock is a great module to use in testing; I use it pretty much all the time. But one thing I have never felt great about is the syntax of its call_args (and call_args_list): it is a 2-tuple of the positional arguments and the keyword arguments, e.g. (('arg1', 'arg2'), {'kw1': None, 'kw2': None}). This does show you exactly how the mock object was called, but the problem I have is that it's more restrictive than the signature in Python:

def func(foo, bar, baz=None):
    pass

func(0, 1, 2)
func(0, 1, baz=2)
func(0, bar=1, baz=2)
func(foo=0, bar=1, baz=2)

In this example all the calls to func() are exactly the same from Python's point of view, but they will all be different in mock's call_args attribute. To me this means my test will be too tightly coupled to the exact implementation of the code under test. This made me wonder how it could be done better.
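The coupling is easy to demonstrate by recording those equivalent calls on a mock and printing what call_args_list remembers (I'm using unittest.mock here, which is where the standalone mock package ended up in Python 3.3):

```python
from unittest import mock

m = mock.Mock()
m(0, 1, 2)
m(0, 1, baz=2)
m(0, bar=1, baz=2)
m(foo=0, bar=1, baz=2)

# Four calls that would be identical for func(foo, bar, baz=None),
# yet four distinct (positional args, keyword args) entries:
for args, kwargs in m.call_args_list:
    print(args, kwargs)
```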

Firstly I started out writing a function which would take the call_args tuple and know the function signature. This obviously isn't very nice, as you need a new function for each signature. But it led me on to adding the .assert_api() method to the mock object itself. I've been using this method for a while and am still not disappointed with it, so I thought I should write about it. Here's how to use it:

mobj = mock.Mock(api='foo,bar,baz')
do_stuff()
mobj.assert_api(foo=0, bar=1, baz=2)

It seems to me this is a fairly good compromise: an extra method (and attribute, not shown) on the mock object in exchange for a nice concise way of asserting that a function was called correctly.

There are a number of side effects, mainly due to my implementation. The major one is that it consumes .call_args_list! Each time you call .assert_api() the first item disappears from .call_args_list. Again, I like this, as it allows me to easily check multiple calls by just making multiple .assert_api() calls.

Another side effect, of less importance, is the new "api" attribute on the mock object; I can imagine people disliking that one. But it's never been in my way, and it means you can just assign to it instead of using the keyword argument when creating the mock. I find this handy in combination with patch decorators, where the syntax to fine-tune the mock is rather heavy for my liking.

The last strange thing is .assert_api_lazy(), which is just a horribly bad name. It ignores any arguments which are present in the call on the mock object but were not passed in as part of the api='...' parameter. It effectively lets you check only a few of the arguments while not caring about the others.

Finally here is the code, for simplicity here implemented (and unchecked) as a subclass of Mock:

class MyMock(Mock):
    def __init__(self, api=None, **kwargs):
        Mock.__init__(self, **kwargs)
        self.api = api

    def _assert_api_base(self, **kwargs):
        """Return call_kwargs dict for assert_api and assert_api_lazy

        WARNING, this consumes self.call_args_list
        """
        if self.call_args_list:
            call_args, call_kwargs = self.call_args_list.pop(0)
        else:
            raise AssertionError('No call_args left over')
        call_args = list(call_args)
        if self.api is None:
            raise AssertionError('self.api is not initialised')
        for p in self.api.split(','):
            if call_args:
                call_kwargs[p] = call_args.pop(0)
        return call_kwargs

    def assert_api(self, **kwargs):
        """WARNING, this consumes self.call_args_list"""
        call_kwargs = self._assert_api_base(**kwargs)
        assert kwargs == call_kwargs, \
            'Expected: %s\nCalled with: %s' % (kwargs, call_kwargs)

    def assert_api_lazy(self, **kwargs):
        """WARNING, this consumes self.call_args_list"""
        call_kwargs = self._assert_api_base(**kwargs)
        for k in list(call_kwargs):
            if k not in kwargs:
                del call_kwargs[k]
        assert kwargs == call_kwargs, \
            'Expected: %s\nCalled with (truncated): %s' % (kwargs, call_kwargs)
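If you want to try the idea without copying the whole class, here is a minimal self-contained sketch: a trimmed version of assert_api with the typos above fixed, using unittest.mock in place of the standalone mock package:

```python
from unittest import mock


class MyMock(mock.Mock):
    """Trimmed, runnable version of the MyMock class above."""

    def __init__(self, api=None, **kwargs):
        mock.Mock.__init__(self, **kwargs)
        self.api = api

    def assert_api(self, **kwargs):
        """WARNING, this consumes self.call_args_list"""
        if not self.call_args_list:
            raise AssertionError('No call_args left over')
        call_args, call_kwargs = self.call_args_list.pop(0)
        call_kwargs = dict(call_kwargs)
        call_args = list(call_args)
        # Map leftover positional arguments onto the declared api names.
        for p in self.api.split(','):
            if call_args:
                call_kwargs[p] = call_args.pop(0)
        assert kwargs == call_kwargs, \
            'Expected: %s\nCalled with: %s' % (kwargs, call_kwargs)


mobj = MyMock(api='foo,bar,baz')
mobj(0, 1, baz=2)
mobj(foo=0, bar=1, baz=2)
mobj.assert_api(foo=0, bar=1, baz=2)  # consumes the first recorded call
mobj.assert_api(foo=0, bar=1, baz=2)  # and then the second
```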

I'd be interested in feedback! It's definitely not perfect, but I do like the syntax of m = Mock(api='...'); ...; m.assert_api(...). One problem I can think of is that it won't yet deal gracefully with default arguments in the api, e.g.:

def func(foo, bar=42):
    pass

func(0, 42)
func(0)

These two calls are identical, but .assert_api() won't see them as the same. This hasn't bothered me yet, which is why I haven't looked into it. But I guess it should be considered for a general-purpose implementation of this idea.
