E pur si muove

Finding memory leaks in Python extension modules

Saturday, December 13, 2008

Valgrind is amazing. Even if you've never used it before, its basic usage is really simple:

$ valgrind --tool=memcheck --leak-check=full /usr/bin/python ./

This is enough to point you at the places in your extension module where you allocated stuff that didn't get freed later. This is a massive timesaver compared with combing through the entire source file again to find out where you made your mistakes.

I must admit that the extension module in question uses malloc(3) and free(3) directly instead of allocating on the Python heap with PyMem_Malloc() and PyMem_Free(), so I don't know whether that would make the leaks harder to find. I can imagine that in that case the "definitely lost" blocks might point to somewhere in Python's source files instead of your own, but I don't know.

gcc, LD_RUN_PATH & Solaris

Thursday, December 11, 2008

I actually started writing a long post about how linking against shared libraries works, but that seemed to get very long. So here's the short version, which assumes some pre-existing knowledge.

If you want an RPATH in your DSOs (dynamic shared objects: executables and shared libraries) you can pass -R/-rpath to the linker (the runtime_library_dirs keyword argument to distutils's Extension class). This is not always very practical though; when building Python, for example, it would be a major pain to modify all the makefiles, and error-prone too.
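For reference, the distutils spelling might look roughly like this (the module name and library path are made up for illustration):

```python
try:
    from setuptools import Extension   # modern installs
except ImportError:
    from distutils.core import Extension

# Hypothetical extension module; runtime_library_dirs is what ends up
# as the -R/-rpath flag at link time, baking the RPATH into the DSO.
ext = Extension(
    'spam',
    sources=['spam.c'],
    runtime_library_dirs=['/opt/csw/lib'],
)
```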

So the linkers (both GNU and Solaris) also accept an environment variable: LD_RUN_PATH. When that is set it is used to populate the RPATH field, but only when no explicit -R/-rpath is specified. So far so good.

On Solaris however your gcc will usually not be installed in the root system, but rather in /usr/sfw (Sun supplied) or /opt/csw/gccX (OpenCSW). So the gcc libs will also not be in the default library search path. But gcc is nice and helps you out: it will implicitly add a -R option to ld pointing at its own library directory (for libgcc). I'm not sure how nice exactly this is, since it clobbers your LD_RUN_PATH environment variable, and you actually need to run gcc traced to see this happening; have fun finding that! It would be nice if gcc extended the environment variable when you had it set but were using no -R flags. Oh well, at least now you know.

Mocking away the .__init__() method

Saturday, December 06, 2008

I wanted to test a method on a class that didn't really depend on a lot of other stuff on the class. Or rather, what it did depend on I had already turned into Mock objects with appropriate .return_values, asserting with .called etc. The problem was that the .__init__() method of the object invoked about half the application framework (option parsing, configuration file loading, setting up logging etc.). Firstly, I don't really feel like testing all of that (hey, these are called unit tests after all, and those functionalities have their own!) and secondly, I would have had to worry about way too much; quite a lot to set up.

That's how the wacky idea of replacing the .__init__() method with a mock occurred to me:

class TestSomething(object):
    def test_method(self):
        module.Klass.__init__ = mock.Mock(return_value=None)
        inst = module.Klass()
        inst.other_method = mock.Mock()
        inst._Klass__log = mock.Mock()
        # more of this
        assert inst.other_method.called

The ugly side effect of deciding to mock away the .__init__() method like this is that I have to create mocks for more internal stuff. The one shown, for example, is normally provided by self.__log = logging.getLogger('foo').

I must admit that I'm still trying to find my way in how to use the mock module effectively, and hence I'm not really sure how sane this approach is. One of my objections is that not only am I meddling with clearly hidden attributes of the class, but I also have to do this again and again for each test method. So the next revision (I'm using py.test as the testing framework here, btw):

class TestSomething(object):
    def setup_class(cls):
        cls._original_init_method = module.Klass.__init__
        module.Klass.__init__ = mock.Mock(return_value=None)

    def teardown_class(cls):
        module.Klass.__init__ = cls._original_init_method

    def setup_method(self, method):
        self.inst = module.Klass()
        self.inst._Klass__log = mock.Mock()
        # more of this

    def test_method(self):
        self.inst.other_method = mock.Mock()
        assert self.inst.other_method.called

This is actually workable and I'm testing what I want to test in a pretty isolated way. I'm still wondering whether I've gone insane or not though. Is it reasonable to replace .__init__() with mock objects? Have other people done this?
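For what it's worth, mock.patch can do the save/restore bookkeeping from setup_class()/teardown_class() for you. A sketch, where Klass is a made-up stand-in for module.Klass:

```python
try:
    from unittest import mock   # stdlib since Python 3.3
except ImportError:
    import mock                 # the standalone mock package

class Klass(object):
    """Hypothetical stand-in for module.Klass."""
    def __init__(self):
        # Imagine half the application framework being invoked here.
        raise RuntimeError("expensive framework setup")

    def other_method(self):
        return 42

with mock.patch.object(Klass, '__init__', mock.Mock(return_value=None)):
    inst = Klass()              # the heavy __init__ never runs

# The patch is undone on leaving the with-block, but inst lives on.
```

mock.patch.object restores the original __init__ automatically, even if the test blows up halfway, which is exactly what the teardown_class() above does by hand.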

Preserving modification times in subversion

Wednesday, November 26, 2008

This is an oft-requested feature apparently. But who knows if and when it will be there. Usually things get done when someone has an itch to scratch, but given that no one has done this yet (though it's been requested for ages) it seems that it must either be really difficult or workarounds are easier. Turns out that in my case a workaround is a lot easier.

I need to build various bits of software on various platforms and they can have local modifications. In the case of Python, for example, we strip out the bit where it looks up the module search path in the registry on Windows (instead of clobbering a hopefully unique string in the resulting binary by a hopefully non-existing registry key like py2exe does IIRC; I'm surprised that hasn't blown up in anyone's face yet, but never mind). Anyway, the point is that the natural format to store all of this in was unpacked form in the revision control system, subversion in our case, so we can diff and merge as we like. The problem is that when we check out the source on a build host all timestamps are lost, and for some projects this messes up the makefile logic and things break down horribly and randomly on the build hosts. I got tired of finding this out every time and adding yet another random touch to the build scripts, so I was looking for something better.

Turns out that this is actually surprisingly easy to do. I just store the timestamps of the unpacked tarball in a file, and call touch on all files first thing in our build script. These shell functions are not bomb-proof yet (they won't cope with spaces in filenames, e.g.) but do the job for now:

create() {
    if [ -f .mtimes ]; then
        echo "E: .mtimes exists already!" >&2
        exit 1
    fi

    for file in $(find . -print); do
        mtime=$(stat --format=%y "$file")
        echo $file $mtime >> .mtimes
    done
}

restore() {
    if [ ! -f .mtimes ]; then
        echo "E: No .mtimes file found!" >&2
        exit 1
    fi

    while read file mtime; do
        # This would be the simple GNU option, POSIX however...
        #touch --date="$mtime" $file
        CCYY=$(echo $mtime | cut -d- -f1)
        MM=$(echo $mtime | cut -d- -f2)
        DD=$(echo $mtime | cut -d- -f3 | cut -d' ' -f1)
        hh=$(echo $mtime | cut -d' ' -f2 | cut -d: -f1)
        mm=$(echo $mtime | cut -d' ' -f2 | cut -d: -f2)
        SS=$(echo $mtime | cut -d' ' -f2 | cut -d: -f3 | cut -d. -f1)
        touch -t $CCYY$MM$DD$hh$mm.$SS $file
    done < .mtimes
}

Note that stat is a GNU tool in the coreutils package, so create() will only work on a GNU-based system. restore() however should work on all POSIX compliant systems (and so far it seems to do so).

The only thing is that touching each and every file is a rather slow process... And before you ask: I actually do the timestamp conversion at creation time, as that happens less often, but presenting it this way seems clearer.
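Incidentally, the same idea fits in a few lines of Python, which also copes with spaces in filenames (though not tabs or newlines in them). A sketch, with the .mtimes-py file name made up to keep it apart from the shell version:

```python
import os

MTIMES_FILE = '.mtimes-py'

def create(root='.'):
    """Record every file's mtime as one "mtime<TAB>path" line."""
    with open(MTIMES_FILE, 'w') as fp:
        for dirpath, _dirs, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.abspath(path) == os.path.abspath(MTIMES_FILE):
                    continue        # don't record our own bookkeeping file
                fp.write('%.6f\t%s\n' % (os.path.getmtime(path), path))

def restore():
    """Re-apply the recorded mtimes with os.utime()."""
    with open(MTIMES_FILE) as fp:
        for line in fp:
            mtime, path = line.rstrip('\n').split('\t', 1)
            os.utime(path, (float(mtime), float(mtime)))
```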

Generative Tests

Tuesday, September 16, 2008

Both nose and py.test allow you to write generative tests. These are basically generators that yield the actual test functions and their arguments. The test runners will then test all the yielded functions with their arguments.

However, I'm left wondering why this is better than just iterating in the test function and doing many asserts. Let me demonstrate with the py.test example:

def test_generative():
    for x in (42,17,49):
        yield check, x

def check(arg):
    assert arg % 7 == 0   # second generated test fails!

Why is that code better than:

def test_something():
    for x in (42,17,49):
        assert x % 7 == 0
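One practical difference I can see: in the single test function the first failing assert aborts the loop, so 49 never gets checked at all, whereas the generative version runs every yielded case. A rough illustration without a test runner:

```python
def check(arg):
    assert arg % 7 == 0

failures = []
for x in (42, 17, 49):
    try:
        check(x)           # run each case independently, like the runner does
    except AssertionError:
        failures.append(x)

print(failures)            # only 17 fails; 49 still got checked
```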

I know that it could possibly tell you slightly more, but in general I keep testing until all of my tests succeed, so I don't really mind if one test includes more than one assert statement. I'll just keep fixing things till the test passes.

What is the motivation for generative tests?

As an aside (and the inspiration for this post, thanks to Holger Krekel talking about py.test), PyCon UK was great. I was considering a post titled "Resolver Systems Ltd is Evil" since they provided the Sunday morning hangovers, but thought that was a bit too sensationalist.

Solaris equivalent of dpkg -S

Tuesday, August 19, 2008

In the past I have spent quite some time trying to figure out which package installed a certain file on Solaris. What I wanted was really just the equivalent of dpkg -S /path/to/file. But those searches were not very fruitful; even talking to more experienced Solaris admins didn't help, they could only tell me how to find out what files were in a certain package.

Today however, mostly by accident (I was trying to remember how to list the files in a package), I finally found it!

neptune:~# pkgchk -l -p /usr/local/bin/wget 
Pathname: /usr/local/bin/wget
Type: regular file
Expected mode: 0755
Expected owner: bin
Expected group: bin
Expected file size (bytes): 392572
Expected sum(1) of contents: 32029
Expected last modification: Nov 25 10:57:55 2006
Referenced by the following packages:
Current status: installed



Bluetooth on a Toshiba Tecra A9-127

Saturday, June 14, 2008

I got a Toshiba Tecra A9-127, answering to the model number PTS52E-06002LEN as written on the back, from work to replace my dying old HP Compaq nx9030. As my laptop OS of choice is Ubuntu, currently in its Hardy release, it's not completely a coincidence that the hardware is almost all Intel based, since that's what Matthew Garrett recommends. And indeed, it all works effortlessly! Apart from bluetooth.

For bluetooth you need a tool called toshset. Once you have that you can enable the internal bluetooth device:

flub@signy:~$ sudo toshset -bluetooth on
bluetooth: attached

And all of a sudden you'll have an hci device, probably hci0, check it with hciconfig -a if you fancy. Magic! It's just like plugging in a USB dongle...

Only toshset is not available for the amd64 flavour of hardy, only in the i386 version. No panic though, Debian has an up-to-date package (Ubuntu intrepid also has the right 1.73 version but doesn't build it for amd64 yet; bug filed).

$ dget
$ dpkg-source -x toshset_1.73-2.dsc

Don't quite rejoice yet, Debian seems to have changed from the pciutils-dev package to libpci-dev. So go and edit debian/control to build depend on pciutils-dev again. Then just build the package, install and enjoy.

Hopefully someone will now spend less time than me figuring this out...

Ripping videos from DVD

Saturday, May 10, 2008

Physical discs are a nuisance; I really just want to play what I want to watch in the room I want to watch it, streaming it over the wireless. This actually works wonderfully well. Unfortunately, while just copying the file structure of a DVD works and gives you DVD-quality video, the size is huge and streaming it over the wireless tends to create some trouble (not to mention that every byte must also be encrypted/decrypted for ssh, so this gets CPU intensive too; ssh offloading onto hardware would be so cool). Hence the need arises to "rip" the DVD and encode it to some smaller format.

After spending a while looking at various options I finally found the great thoggen.

Only problem left is how much to compress. After some very un-scientific tests (Google failed to find me any nice studies/graphs!) I decided on a quality of 35 and no resizing. But if anyone knows of a better study on what settings to prefer it would be greatly appreciated! It would be great to see quality vs file size vs frame size for different types of video. Ideally with a subjective "human quality" level too so you'd know what still looks good.

Time to read standards

Thursday, April 17, 2008

Sometimes I like quotes out of context...

Anyway, I don't think it really is an ambiguity in practice -- only in the minds of those that have too much time to read standards documents.
-- Greg Ewing (on distutils-sig)

@x.setter syntax in Python 2.5

Friday, April 11, 2008

The new property.setter syntax in Python 2.6 and 3.0 looks very clean to me. So I couldn't wait and wanted it in Python 2.5. With the help of comp.lang.python I finally got there (thanks especially to Arnaud Delobelle):

import sys

_property = property

class property(property):
    def __init__(self, fget, *args, **kwargs):
        self.__doc__ = fget.__doc__
        super(property, self).__init__(fget, *args, **kwargs)

    def setter(self, fset):
        cls_ns = sys._getframe(1).f_locals
        for k, v in cls_ns.iteritems():
            if v == self:
                propname = k
        cls_ns[propname] = property(self.fget, fset,
                                    self.fdel, self.__doc__)
        return cls_ns[propname]

The __init__() wrapper is needed to get __doc__ set properly (this surprised me). The whole cls_ns stuff and the loop are required since properties are implemented in C and their fset attribute is read-only, which is why the entire property needs to be replaced. The implementation of deleter() can now be regarded as an exercise for the reader...
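To round it off, a usage sketch (the Celsius class is made up); on 2.6+ this works natively, and with the backport above the 2.5 behaviour is the same:

```python
class Celsius(object):
    def __init__(self):
        self._degrees = 0.0

    @property
    def degrees(self):
        """Temperature in degrees Celsius."""
        return self._degrees

    @degrees.setter
    def degrees(self, value):
        self._degrees = float(value)

c = Celsius()
c.degrees = 21          # goes through the setter
```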

(O)OXML and ISO voting processes

Friday, April 11, 2008

Many have recently complained about ISO's voting processes, mainly how they need to be revised, as currently it seems possible to buy yourself a standard given enough lobbyists and money.

But part of this is ISO's trust in ECMA. ISO allows ECMA to submit standards for the fast-track process because it trusts it to approve good standards. If OXML (as it seems to be called now it's approved; formerly OOXML) was indeed such a bad standard (which I don't doubt personally) then maybe ISO should review its relationship with ECMA too?

This is only a tiny part of the picture obviously, but one I haven't seen mentioned elsewhere.

Shell history

Thursday, April 10, 2008


flub@signy:~$ history | awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn |head
79 nosetests
74 ls
46 cd
36 ssh
28 man
22 apt-cache
19 vi
19 ack
17 svn
17 sudo

It must be noted that most of my editing happens in emacs, but that only gets started a few times a day and then stays there (oh and emacs is started from a shortcut icon, not the shell).

nose and ipython

Sunday, April 06, 2008

I wish there was a simple way to integrate nose and ipython. Something like a command line option not to catch exceptions would be sufficient I think, so that you could do:

In [1]: %run tests/ --do-not-catch-exceptions

Obviously you'd want a shorter option switch...

Seems like in is where this happens, but I tried changing that with no success.

And to be really useful I'd want the default test selection behaviour, in a module where if __name__ == '__main__': nose.main() is used, to be just that module. But maybe that's already supported and I'm just not finding it.

Last random thought: Maybe if nose's -d option did expand dotted names I wouldn't be wishing for any of this. Who knows.

One s3fs to rule them all?

Thursday, April 03, 2008

Sometimes I wish we could fast forward 6 or 12 months or so. Hopefully by then there will be just one s3fs that is mature and maintained. Right now it's hard to tell which one will turn out the best.

Interestingly, of the ones that look like they could have potential (trying to word it carefully here), two are written in Python (all of them use FUSE).

GPL and python modules

Friday, March 28, 2008

I should probably google this, but can't think of good keywords. So instead this is a "dear lazy web" post.

Is importing a python module considered linking for the GPL? I.e. are you allowed to import a python module into a script or module that has a non-GPL-compatible license?


Cheating the KAME turtle

Thursday, March 27, 2008

This is shocking. Even though I have not yet used IPv6 I can see the KAME turtle dance! I won't give you a direct link, but SixXS provide an IPv6->IPv4 service as well as an IPv4->IPv6 service. Feels bad having used the 4->6 bit...

OmniORB 4.1.1 in lenny!

Sunday, March 23, 2008

Something that happened about a week ago I think. But omniORB 4.1.1 finally made it into Debian testing aka lenny! This means that python-omniorb 3.1, which had been waiting for a while now, also made it into lenny!

It had been stalled for ages since we had problems with the ARM port not managing to compile omniORB 4.1.X. Thanks to Thomas Girard for his work on this.

I've already committed the update to 4.1.2 to the svn repo, omniORBpy 3.2 will follow soon...

Compiling libtool examples

Sunday, March 23, 2008

This is more a reminder to myself, but here's how to compile the libtool examples:

$ cd demo
$ aclocal
$ libtoolize --automake
$ automake --add-missing
$ autoconf
$ ./configure
$ make

It's really how to bootstrap any GNU autotools-using application, I suppose.

Another shell trick

Friday, February 22, 2008

Let's start with the solution instead of the problem:

myvar=$(myfunc) || exit $?

Assuming myfunc is a function defined elsewhere in your shell script, this will execute that function and assign its output to myvar. However, when myfunc fails, the entire assignment statement fails with the exit status of myfunc, so you exit with the same exit status by using || exit $?.

It is really important to have this exit statement in there, even if myfunc already has it. You could look at the $(...) construct as a subshell, so the exit in myfunc only exits that subshell. This took me quite a while to figure out!

On a related note, you may know you can execute things in a subshell by using (list). If you do that, any variables etc. won't affect your current shell. However, variables, and hence also functions, from the parent shell are usable in the subshell. Handy (and maybe obvious).

Creating a Debian ARM build host

Thursday, February 21, 2008

If you want/need to compile a program on ARM but are not lucky enough to have the hardware available, QEMU can still help you. Aurélien Jarno has an excellent description of how to do this. It is still valid today, even if you want to install sid instead of etch: first install etch, then upgrade.

The only thing he omits is the need to start a getty on /dev/ttyAMA0 when you use -nographic with a console on ttyAMA0. Other than that everything goes terribly smoothly.

Apparently emulating ARM on a recent machine is supposed to be even faster than a real ARM. I have no idea if this is true, but I still find it slow... Makes me think twice about considering a Thecus N2100 or so as my home server.

Docstring folding

Saturday, February 16, 2008

In my last post I had a little rant about how python docstrings visually separate the function/method/class definition from the body. Mikko Ohtamaa raised a very good point though: the editor should fold it away.

Since my editor of choice is Emacs, said to be more an operating system than an editor, this must be possible. But surely someone more intelligent than me must have implemented this already (for I still don't know lisp, shame on me)! So I read the python-mode description, which mentioned something about support for outline-minor-mode. This led me to play around a little with outline-mode. It's rather nice and seems to do useful folding, once I get used to it.

But I'm not sure I can figure out how to fold away just the docstring in outline-mode! There seems to be a "Hide Entry" function that most of the time does just hide the docstring when the cursor is located over it. But sometimes it will hide a few more lines too... Surely I must be missing something; anyone have some hints?

Documenting functions/methods/classes

Tuesday, February 12, 2008

There are several simple guidelines for writing functions or methods (guidelines that I like that is).

  • Keep them clean, don't intersperse them with comments. If they require more than one or two inline comments they're too complicated and you should restructure.
  • Keep them short, they should fit on one screen. I'm flexible: a screen is an emacs buffer, not vi in a terminal as Linus requires.
  • Write a clear description at the top, explaining parameters, return values, exceptions etc. whenever these things might not be blatantly obvious in a month's time. As soon as you explain one parameter you should describe the whole lot though.

In C this is quite nice, it looks like this:

/** Print an integer number
 * The number is printed to stdout.  Really that's
 * all there is to it.
 * param - The integer to print.
 */
void my_func(int param)
{
    printf("This is an integer: %d", param);
}

While in Python this would look like this:

def my_func(param):
    """Print a number

    The number is printed to stdout.  Really that's
    all there is to it.

    param - The number to print.
    """
    print "This is a number: %d" % param

While python code is generally very readable, this is something that annoys me about it. Suddenly my function definition is separated from the code by the docstring, often causing the definition and the code not to fit on one screen anymore (it is foolish IMHO to require the documentation and code together to fit on one screen, given my documentation conventions).

While I understand every reason for the way things are and acknowledge that it probably won't ever change it still annoys me from time to time so I felt like moaning about it.


Thursday, January 31, 2008

Creating a Facebook group to complain about the UK ID cards because:

[...] An ID card would hold ALL personal information about you including biometrics, hospital records etc etc., a perfect target for ID thieves. [...]

And you're doing this on Facebook?

Mild improvement for movie industry

Monday, January 21, 2008

Until now, every time I bought a DVD I found a few scary and angry leaflets in the box telling me how insanely bad I might be for buying pirated DVDs. They usually convince you that you are the worst person in the world for no particular reason. So just like I feel treated as a criminal when having to watch anti-piracy trailers at the cinema or take off my shoes at the airport, I hate having to open a DVD case and see those silly slogans.

Today however I opened a DVD case and instead got a nice and friendly looking leaflet (almost as relaxing as "Don't Panic" compared to the normal leaflets) congratulating me on buying a genuine DVD. I have to say I still find it disappointing that they feel they have to put in such a leaflet, but it's definitely more customer friendly this way: I don't get to feel bad.

If only they would now make them region free and not content-scramble the disc (as I'm still about to break the law by watching my genuine DVD in a few minutes) I would actually start to believe they have started listening to their customers.

Rock climbing in Costa Daurada

Sunday, January 06, 2008

Just spent a long week sport climbing in Costa Daurada. We stayed in a house (with about 32 students and ex-students) near the village of Cornudella de Montsant which was an excellent location really close to Siurana and Arboli. Absolutely brilliant climbing and although my brain wanted to stay a lot longer my body seems quite glad to have some rest.

If you like sport climbing it is definitely a recommended place. Great times.

Updated omniORB in Debian

Sunday, January 06, 2008

A while ago Thomas Girard started the Debian CORBA Team with the aim of maintaining omniORB and related packages in Debian. After a lot of work (and time!) we managed to get packages of up-to-date versions into a good state, fixing almost all bugs in Debian's BTS. And thanks to Thomas' hard work, he managed to upload working packages of omniORB and omniORBpy (omniorb4 and python-omniorb respectively) just before Christmas (I'm only blogging this now since I was on holiday since then).

The only issue is that they're only available in unstable and not yet in testing. There seems to be an issue with omniidl segfaulting on ARM. Unfortunately I haven't got an ARM platform available to debug on; Thomas however has already had a quick look and he thinks it looks like stack frame corruption. Hopefully we'll find this problem sooner rather than later; it's just frustrating that I can't really do much about it.

Lastly, this means that if you were using the packages that I put up on a (long) while ago, you shouldn't use them anymore; instead you should fetch the packages from Debian's sid.

Subscribe to: Posts (Atom)