devork

E pur si muove

Decorators specific to a class

Wednesday, December 30, 2009

My gut tells me it's horribly wrong but I am failing to formulate a decent argument, so let me show an example of what I mean (somewhat contrived):

import threading

def deco(f):
    def wrapper(self, *args, **kw):
        with self.lock:
            return f(self, *args, **kw)
    return wrapper

class Foo:
    def __init__(self):
        self.lock = threading.Lock()

    @deco
    def method(self):
        pass

Here the decorator knows something about the arguments of the function it will end up calling: the first argument is "self" and it is an object with a "lock" attribute which is a context manager. Somehow I feel that's more knowledge about the wrapped object than a decorator should have. It is just an indirection of logic my brain doesn't cope with.

There are obviously places where I could construct something like this. But I never naturally think of doing it that way; I always end up with some other way which I find more elegant and whose resulting logic I think is easier to follow. It's just that whenever I encounter code like this my brain starts hurting, and I'm not sure I have a decent argument to tell off people who write code like this (you can hardly regard "it's a level of logic indirection that makes my brain hurt" as an argument).
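For contrast, the way I naturally end up writing this is just an explicit context manager in the method body, which keeps the locking visible where it happens. A sketch of the same contrived example (this is my guess at the "more elegant way", not anything definitive):

import threading

class Foo:
    def __init__(self):
        self.lock = threading.Lock()

    def method(self):
        with self.lock:
            pass  # the actual work goes here, visibly under the lock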

Calling COM methods in C

Saturday, December 26, 2009

So Windows has this strange thing called COM. It allows you to share objects between unrelated processes and languages, a bit like CORBA in that sense, only very different. Anyway, if you'd like to get information out of this other Windows thing called WMI (which provides a lot of information you can't get hold of using native APIs, at least not if you don't dig into unpublished internals) then you've got to use COM.

But Microsoft seems to have decided some long time ago that C++ is such an amazing language (because if it's got ++ in the name it must be better than, say, plain C, you know) that it solves all of the world's problems. So obviously that is the language you favour when designing APIs, meaning you end up with very convoluted APIs when using good old plain C. And obviously there is no need to document the way you do things in C because no one would use it. Anyway, my conclusion is that COM is crazy and I can only assume that .NET must somehow make this a bit easier (just like using WMI from Python is easy with Mark Hammond's pythoncom and Tim Golden's wmi module), which is probably the reason that C# is popular among developers who choose Windows as their platform.
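For comparison, the same query the C code below performs takes only a few lines of Python with Tim Golden's wmi module (a sketch from memory, not tested here):

import wmi

c = wmi.WMI()  # connects to ROOT\CIMV2 on the local machine by default
for opsys in c.Win32_OperatingSystem():
    print(opsys.Name)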

Back to the point however: how do you call a method on a COM object using C? Microsoft will always show instance->Method() in their examples, but in C it's not that much different:

instance->lpVtbl->Method(instance)

There is also supposed to be a way to use some macros made from the class name of the object joined up with the method name. For this you need to define COBJMACROS before your #includes and then you can call:

ClassName_Method(instance)

Only I can't get that way to work. No clue why.

For completeness here is the MSDN library's local-computer WMI example ported to C. For extra fun the error handling is done by setting Python exceptions, but that shouldn't confuse things. It also prints the result string in both unicode and ascii; from the unicode string you could easily have made a PyUnicodeObject. There are probably many ugly things in there as I'm really not a win32 developer. It's also example code and not what I'd take into production (e.g. I think the looping over the results is broken, but I'm sticking close to the MSDN example).

static int
example(void)
{
    HRESULT hr;
    IWbemLocator *pLoc;
    IWbemServices *pSvc;
    IEnumWbemClassObject *pEnumerator;
    BSTR bstr, bstr2;

    hr = CoInitializeEx(0, COINIT_APARTMENTTHREADED);
    if (hr == RPC_E_CHANGED_MODE)
        hr = CoInitializeEx(0, COINIT_MULTITHREADED);
    if (hr != S_OK && hr != S_FALSE) {
        PyErr_Format(PyExc_WindowsError,
                     "Failed to initialise COM (HRESULT=0x%x)", hr);
        return -1;
    }
    hr = CoInitializeSecurity(NULL, -1, NULL, NULL,
                              RPC_C_AUTHN_LEVEL_DEFAULT,
                              RPC_C_IMP_LEVEL_IMPERSONATE,
                              NULL, EOAC_NONE, NULL);
    if (FAILED(hr)) {
        PyErr_Format(PyExc_WindowsError,
                     "Failed to initialise COM security (HRESULT=0x%x)", hr);
        CoUninitialize();
        return -1;
    }
    hr = CoCreateInstance(&CLSID_WbemLocator, 0, CLSCTX_INPROC_SERVER,
                          &IID_IWbemLocator, (LPVOID *) &pLoc);
 
    if (FAILED(hr)) {
        PyErr_Format(PyExc_WindowsError,
                     "Failed to get IWBemLocator (HRESULT=0x%x)", hr);
        CoUninitialize();
        return -1;
    }
    bstr = SysAllocString(L"ROOT\\CIMV2");
    if (bstr == NULL) {
        PyErr_SetString(PyExc_WindowsError, "BSTR allocation failed");
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    hr = pLoc->lpVtbl->ConnectServer(pLoc, bstr, NULL, NULL,
                                     NULL, 0, NULL, NULL, &pSvc);
    SysFreeString(bstr);
    if (FAILED(hr)) {
        PyErr_Format(PyExc_WindowsError,
                     "Localhost connection for WMI failed (HRESULT=0x%x)", hr);
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    hr = CoSetProxyBlanket((IUnknown*)pSvc, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE,
                           NULL, RPC_C_AUTHN_LEVEL_CALL,
                           RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_NONE);
    if (FAILED(hr)) {
        PyErr_Format(PyExc_WindowsError,
                     "Failed to set ProxyBlanket (HRESULT=0x%x)", hr);
        pSvc->lpVtbl->Release(pSvc);
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    bstr = SysAllocString(L"WQL");
    if (bstr == NULL) {
        PyErr_SetString(PyExc_WindowsError, "BSTR allocation failed");
        pSvc->lpVtbl->Release(pSvc);
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    bstr2 = SysAllocString(L"SELECT * FROM Win32_OperatingSystem");
    if (bstr2 == NULL) {
        PyErr_SetString(PyExc_WindowsError, "BSTR allocation failed");
        SysFreeString(bstr);
        pSvc->lpVtbl->Release(pSvc);
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    hr = pSvc->lpVtbl->ExecQuery(pSvc, bstr, bstr2,
                                 WBEM_FLAG_FORWARD_ONLY |
                                 WBEM_FLAG_RETURN_IMMEDIATELY, 
                                 NULL, &pEnumerator);
    SysFreeString(bstr);
    SysFreeString(bstr2);
    if (FAILED(hr)) {
        PyErr_Format(PyExc_WindowsError, "WMI query failed (HRESULT=0x%x)", hr);
        pSvc->lpVtbl->Release(pSvc);
        pLoc->lpVtbl->Release(pLoc);
        CoUninitialize();
        return -1;
    }
    {
        IWbemClassObject *pclsObj;
        ULONG uReturn;
        VARIANT vtProp;       
            
        while (pEnumerator) {
            hr = pEnumerator->lpVtbl->Next(pEnumerator, WBEM_INFINITE,
                                           1, &pclsObj, &uReturn);
            if(uReturn == 0)
                break;
            hr = pclsObj->lpVtbl->Get(pclsObj, L"Name", 0, &vtProp, 0, 0);
            wprintf(L"XXX OS Name: %s\n", vtProp.bstrVal);
            {
                /* XXX Need error checking in here */
                /* Allocating the UTF-16 string size since that will be at
                 * least double the ASCII size, which is fine. */
                char *prop;
                int r;

                prop = psi_malloc(SysStringByteLen(vtProp.bstrVal));
                /* if (prop == NULL) */
                r = WideCharToMultiByte(20127, 0, vtProp.bstrVal, -1, prop,
                                        SysStringByteLen(vtProp.bstrVal),
                                        NULL, NULL);
                /* if (!r) */
                printf("XXX OS Name: %s\n", prop);
            }
            VariantClear(&vtProp);
            pclsObj->lpVtbl->Release(pclsObj);
        }
    }
    pEnumerator->lpVtbl->Release(pEnumerator);
    pSvc->lpVtbl->Release(pSvc);
    pLoc->lpVtbl->Release(pLoc);
    CoUninitialize();
    return 0;
}

Setting descriptors on modules

Wednesday, December 23, 2009

This counts as one of the crazier things I'd like to do with Python: insert an instance of a descriptor object into the class dict of a module.

While you can get hold of the module type class by using the __class__ attribute of any module, or by using types.ModuleType, you obviously can't do this since the __dict__ of the module class is actually a DictProxy and hence immutable. Which I think is rather sad this time round.

My use case is to be able to set an attribute on a module that would lazily evaluate a sort of semi-singleton. Suppose you have the instance of your application and to make it available to other modules you place the instance in a module attribute. You want it to be available since it has references to useful global things like the configuration instance in use etc. (and you hate having singletons for all these useful things). To get this application instance from another module, without getting it passed in via some sort of dependency injection (which can result in intangible spaghetti all too easily), you can get hold of it like this:

import app_package.app_module
app_package.app_module.app_instance # the instance

But what you very likely can't do is this:

from app_package.app_module import app_instance

The reason is that import statements are usually at the top of modules, so you will very likely execute this import statement before app_instance exists and you will get None (I will anyway, since I initialised that attribute with None before the app gets instanced). And obviously even once the application is instanced and the attribute gets set, I'm still stuck with the reference to None instead of the application instance.

My reason for wanting to set a descriptor instance on the module class is that the descriptor could then lazily evaluate the application instance when it is accessed. Getting and using the instance would reduce to something like:

from app_package.app_module import app_instance
app_instance # the instance!

Which I think is much cleaner. But for now I'll have to stick with:

from app_package.app_module import get_instance
get_instance() # the instance
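For reference, the descriptor I would want to plant in the module class is trivial; roughly this sketch (LazyAppInstance is a made-up name, and of course it cannot actually be installed because of the DictProxy problem above):

class LazyAppInstance(object):
    """Descriptor that evaluates the application instance on access."""
    def __get__(self, obj, objtype=None):
        from app_package.app_module import get_instance
        return get_instance()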

Or perhaps Python has another trick up its sleeve I haven't found yet?

Skipping slow tests by default in py.test

Saturday, December 19, 2009

If you've got a test suite that runs some very slow tests it might be troublesome to run those all the time. There is of course the "-k" option to py.test which allows you to selectively enable only a few tests. But what I really wanted was a way to have it skip slow tests by default but still allow me to enable them with a command line option.

But this is not impossible; py.test provides a couple of things that make it surprisingly easy to do.

So how do you pull this together? I decided that I want to mark slow tests using the "slow" keyword (@py.test.mark.slow) and skip those tests by default. The option to enable those tests would be called "--slow". The following conftest.py is staggeringly simple:

import py.test

def pytest_addoption(parser):
    parser.addoption('--slow', action='store_true', default=False,
                      help='Also run slow tests')

def pytest_runtest_setup(item):
    """Skip tests if they are marked as slow and --slow is not given"""
    if getattr(item.obj, 'slow', None) and not item.config.getvalue('slow'):
        py.test.skip('slow tests not requested')
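A test marked as slow then looks like the following and is only run when --slow is passed on the command line (a sketch):

import time

import py.test

@py.test.mark.slow
def test_something_slow():
    time.sleep(10)  # stands in for the genuinely slow work
    assert True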

Finding this solution was a little harder, but essentially it involved nothing more than looking at the pytest_skipping and pytest_mark plugins to figure out what APIs the various objects provide. It must be said that extending py.test is staggeringly simple; I tend to expect a much higher learning curve when I decide I want to write a plugin for some tool I use.

setuptools vs distribute

Friday, December 18, 2009

Reinout van Rees:

In case you heard of both setuptools and distribute: distribute fully replaces setuptools. Just use distribute. Setuptools is “maintained” (for various historically dubious values of “maintain”) by one person (whom all should applaud for creating the darn thing in the first place, btw!). Distribute is maintained by a lot of people, so bugs actually get fixed. And “bugs” meaning “it doesn’t break with subversion 1.6 because patches that fix it don’t get applied for half a year”.

Pretty much one of the best descriptions of the differences between setuptools and distribute that I've seen. Not that I use either, but whatever.

update: Obviously this was written many moons ago and a lot has changed in packaging since. And currently (Nov 2013) setuptools is the way to go if you want/need it.

Dictionary entry

Monday, December 07, 2009

innovative company noun 1 a company which can't afford patent lawyers 2 one of the lucky handful of companies which are innovative and have money for patent lawyers HELP Not to be confused with most companies lobbying in patent litigation.

PS: It's also possible to have companies that can't afford patent lawyers but still don't innovate.

How to drive on a motorway

Sunday, December 06, 2009

If you are driving on a motorway where I am driving too, here are some rules you should follow. I think you should even follow them when I'm not around.

  1. Be sensible, this one overrules all the others
  2. Keep left unless overtaking
  3. KEEP LEFT UNLESS OVERTAKING
  4. KEEP LEFT UNLESS OVERTAKING
  5. Always indicate when changing lanes
  6. Pick a speed and stick to it, avoid speeding up or slowing down whenever possible
  7. Be courteous, even when people make mistakes

It's not that hard, try it and you'll realise traffic would be a lot more fluent if everyone stuck to these rules.

PS: Don't forget to substitute left with right depending on your country.

How to make a crosslinked RS-232 cable

Thursday, December 03, 2009

Because it's easier than buying one

When connecting two computers together (e.g. your laptop to a server or router) to get access to a console you need to use a crosslinked RS-232 cable, usually with two female DE-9 connectors these days. This cable is more commonly known as a null-modem cable for historical reasons.

Of course you can easily buy a cable under the name of "null modem cable", but you're almost guaranteed to get a sloppily made one. In fact I haven't been able to find a decently made one. The problem is that according to the standard DTR is supposed to be connected to both DSR and CD, but most cables as sold won't connect the carrier detect line. No idea why; perhaps the bridge is something other cables don't need and is therefore expensive to automate in production?

So to make one yourself you can cut up an existing cable and solder new DE-9 connectors onto it, keeping to the proper wiring scheme. It's really not that hard; all you need is a soldering iron, two D-sub 9 pin connectors and the corresponding hoods. I'm guessing you could even use cheap CAT-5e networking cable instead of cutting up an expensive pre-made null-modem cable, since Sun uses network cables for their serial cables (with an adaptor).

Now most devices actually seem to ignore the carrier detect line anyway. But in case you are lucky enough to be using IBM's AIX or IVM on one of their System p5 servers you will discover that you are able to log into the service processor with a "normal" (i.e. non-standard) serial cable just fine. You will also be able to perform an installation from that serial cable just fine. But when you then need to access the console of that just installed system you're stuck and just see nothing. Until you use a proper cable, that is. (Just for the record, you can most likely telnet or ssh into the system by now, just try it).

And if my love for AIX is not yet clear from the above paragraph let me be a bit more explicit: if you are in a position to choose, get something other than AIX. Sun's hardware as well as Solaris are lovely (they ship with a serial cable for instance, as does Cisco). Obviously GNU/Linux is a great choice for an OS too. This doesn't mean that I approve of cable vendors not making RS-232 crosslink cables properly, of course.

Finding memory leaks in extension modules

Wednesday, November 25, 2009

After reading Ned Batchelder's post on his experience hunting a memory leak (which turned out to be a reference counting error) it occurred to me that even though I have a script to check memory usage I should also really be checking reference counts with sys.gettotalrefcount(). And indeed, after adding this to my script I found one reference count leak. I still have faith in my script as it was before, really, since the reference leak in question was not making me lose memory - subtle bugs eh?

But how do you check an extension module for memory leaks? This seems pretty undocumented so here is my approach:

  • First you really need a debug build of Python; this helps a lot since you get to use sys.gettotalrefcount() and get more predictable memory behaviour. The most complete way to build this is something like this (the MAXFREELIST stuff adapted from this):

    s="_MAXFREELIST=0"
    ./configure --with-pydebug --without-pymalloc --prefix=/opt/pydebug \
    CPPFLAGS="-DPyDict$s -DPyTuple$s -DPyUnicode$s -DPySet$s -DPyCFunction$s -DPyList$s -DPyFrame$s -DPyMethod$s"
    make
    make install
    
  • Now run the test suite under valgrind; this is troublesome but a very useful thing to do. The valgrind memory checker will help you identify problems pretty quickly. It can be confused by Python however, and you only care about your extension module, so you need to filter most of this out. Luckily the Python distribution ships with a valgrind suppression file in Misc/valgrind-python.supp that you can use; it's not perfect but it helps. This is how I invoke valgrind:

    $ /opt/pydebug/bin/python setup.py build
    $ valgrind --tool=memcheck \
        --suppressions=~/python-trunk/Misc/valgrind-python.supp \
        --leak-check=full /opt/pydebug/bin/python -E -tt setup.py test
    ==8599== Memcheck, a memory error detector
    ==8599== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
    ==8599== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
    ==8599== Command: /opt/pydebug/bin/python -E -tt setup.py test
    ==8599== 
    ==8599== Conditional jump or move depends on uninitialised value(s)
    ==8599==    at 0x400A66E: _dl_relocate_object (do-rel.h:65)
    ==8599==    by 0x4012492: dl_open_worker (dl-open.c:402)
    ==8599==    by 0x400E155: _dl_catch_error (dl-error.c:178)
    ==8599==    by 0x4011D0D: _dl_open (dl-open.c:616)
    ==8599==    by 0x405AC0E: dlopen_doit (dlopen.c:67)
    ==8599==    by 0x400E155: _dl_catch_error (dl-error.c:178)
    ==8599==    by 0x405B0DB: _dlerror_run (dlerror.c:164)
    ==8599==    by 0x405AB40: dlopen@@GLIBC_2.1 (dlopen.c:88)
    ==8599==    by 0x8132727: _PyImport_GetDynLoadFunc (dynload_shlib.c:130)
    ==8599==    by 0x81199D9: _PyImport_LoadDynamicModule (importdl.c:42)
    ==8599==    by 0x81161FE: load_module (import.c:1828)
    ==8599==    by 0x8117FAF: import_submodule (import.c:2589)
    ...
    running test
    ...
    FAILED (failures=4, errors=2)
    ==8599== 
    ==8599== HEAP SUMMARY:
    ==8599==     in use at exit: 1,228,588 bytes in 13,293 blocks
    ==8599==   total heap usage: 280,726 allocs, 267,433 frees, 70,473,201 bytes allocated
    ==8599== 
    ==8599== LEAK SUMMARY:
    ==8599==    definitely lost: 0 bytes in 0 blocks
    ==8599==    indirectly lost: 0 bytes in 0 blocks
    ==8599==      possibly lost: 1,201,420 bytes in 13,014 blocks
    ==8599==    still reachable: 27,168 bytes in 279 blocks
    ==8599==         suppressed: 0 bytes in 0 blocks
    ==8599== Rerun with --leak-check=full to see details of leaked memory
    ==8599== 
    ==8599== For counts of detected and suppressed errors, rerun with: -v
    ==8599== Use --track-origins=yes to see where uninitialised values come from
    ==8599== ERROR SUMMARY: 75 errors from 5 contexts (suppressed: 19 from 6)
    

    Note that the output is very verbose; usually I actually start with --leak-check=summary. Firstly, notice that valgrind gives a lot of warnings before your extension module even gets loaded; that's Python's problem and not yours, so skip over that. The output after (and during) the output of the test suite is what interests you. Most importantly, look at the "definitely lost" line: if that's not zero then you have a leak. The "possibly lost" figure is just Python's problem (which sadly might hide problems you created too). When you do have lost blocks valgrind will give you a stack trace to pinpoint them, but you'll have to swim through lots of "possibly lost" stack traces from Python to find it. Best is probably to grep for your source files in the output.

  • Next you should create a function you want to execute in a loop; this should exercise the code you want to test for leaks. If you're really thorough, possibly the entire test suite wrapped up in a function call would be good.

    Wrap it all up in a script that checks the memory usage and reference counts on each loop and compares the start and end values. Getting memory usage might be tricky from Python (or you can use PSI of course) so depending on your situation you might prefer to do this with a script using your operating system's tools.

    For PSI this is the script I currently use. I clearly have it easy since I can be sure PSI will be available :-). The reason I don't automate this script further (you could turn it into a unittest) is that I prefer to manually look at the output. Both memory and reference counting are funny and will most likely grow a little bit anyway. By looking at the output I can easily spot whether it keeps growing or stabilises; there is only a problem if it keeps growing with every iteration (don't be afraid to run with many, many iterations from time to time). When automating this you probably end up allowing some margin and might miss small leaks. A minimal sketch of such a loop script is shown after this list.
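Here is that minimal sketch (exercise() stands in for whatever code you want to hammer; sys.gettotalrefcount() only exists in a debug build of Python):

import gc
import sys

def watch(exercise, iterations=100):
    for i in range(iterations):
        exercise()
        gc.collect()
        # Eyeball the output: a count that keeps growing on every
        # iteration means a leak, one that stabilises is fine.  You
        # could print the process memory usage here too (e.g. via PSI).
        print(i, sys.gettotalrefcount())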

Hopefully some of this was useful for someone.

New Python System Information release!

Saturday, November 21, 2009

I've just released a new version of PSI! PSI is a cross-platform Python package providing real-time access to processes and other miscellaneous system information such as architecture, boottime and filesystems. Among the highlights of this release are:

  • Improved handling of time: We now have our own object to represent time. This may seem silly at first, but it actually makes it easier to use all the normal ways of representing time as well as providing the highest possible accuracy.
  • Improved handling of process names and arguments: There is an entire wiki page dedicated to this, but basically this simplifies presenting sensible process names to users massively since some attributes will always have meaningful values.
  • Restructured exceptions: Whenever accessing attributes you will get a subclass of AttributeError like it should be so now you can happily use getattr().
  • New experimental methods on Process objects: You can now send signals to processes using the .kill() method and find their children by using the .children() method.
  • New experimental mount module: You can get detailed information about all mounted filesystems using this new module. It provides mount information as well as usage.

Another notable improvement is the ability to read the arguments of a 64-bit process while running inside a 32-bit Python process on Solaris. It's small and almost no one will notice it, but it makes things so much more consistent!

Release early, release often: fail

Now for the bad news: all this means the API has changed in a backwards incompatible way.

It was already pretty obvious shortly after the last release that this would happen, which was the reason I was hoping to release a new version soon. But that didn't happen. Although the last version had a "beta" version number on it, its trove classifier still claimed to be "alpha", and in the end we don't promise API stability till we hit 1.0. But it's still not nice. Once we hit 0.3 we will actually try not to introduce changes to the API if possible. We intend to help this by using FutureWarning for APIs we're not sure about yet. In the meantime let's see how the Process API holds out during this release; hopefully it will prove to be good and require no more changes.

Credits

As before, thanks to Chris Miles and Erick Tryzelaar for helping out.

Synchronous classes in Python

Monday, November 16, 2009

What I'd like to build is an object that, when you do anything with it, would first acquire a lock and release it when finished. It's a pattern I use fairly regularly and I am getting bored of always manually defining a lock next to the other object and manually acquiring and releasing it. It's also error prone.

Problem is, I can't find how to do it! The __getattribute__ method is bypassed by implicit special methods (like len() invoking .__len__()). That sucks. And from the description of this by-passing there seems to be no way to get round it. For this one time where I thought I had found a use for metaclasses, they still don't do the trick...
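A minimal demonstration of the bypassing, using a print instead of a lock (Synchronised is just a made-up example class):

class Synchronised(object):
    def __getattribute__(self, name):
        print('looking up %r' % name)
        return object.__getattribute__(self, name)

    def __len__(self):
        return 0

s = Synchronised()
s.__len__()  # goes through __getattribute__ and prints the lookup
len(s)       # bypasses it completely: nothing is printed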

Python modules and the GPL: I still don't get it

Tuesday, November 03, 2009

I've never understood if you can use GPL python modules (or packages) in GPL-incompatibly licensed code. Today I re-read the GPL looking for this and am tempted to think yes, you can. The GPL says:

1. You may copy and distribute verbatim copies of the Program's source code as you receive it [...]

This is simple enough, but what if you need to change it? That gets more interesting:

2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, [...] provided that you also meet all of these conditions:

[...]

b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.

[...]

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

As I understand this it means you can happily ship a GPL Python module next to a GPL-incompatible module, even when the second uses API calls from the first. As long as you are always offering to give out the source of the GPL module, including modifications if you made any, you are fine. The two modules are individually identifiable sections and can be reasonably considered independent and separate works, or so I reckon at least.

But wait, the non-GPL module uses the API of the GPL module; is it still independent and separable then? In my humble opinion the GPL says it must be reasonably considered independent and separate, so yes. It's feasible to take away the GPL module and replace it by another one that offers the same API, hence I would argue they are separable (feasible and easy are not synonyms!).

According to my reasoning it should also be legal to use GPL C libraries in non-GPL programs as long as you use them as shared libraries and not as static libraries. But why does the Lesser General Public License (LGPL) exist then? Quoting the LGPL preamble (emphasis mine):

When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.

Now a shared C library is linked by the dynamic linker before the application gets started. If the library is not present the application will fail horribly; I guess this is why they consider it, legally speaking, a combined work. When using a Python module the Python interpreter will happily start and, depending on how the application is written, can fail gracefully or even provide partial functionality.

The GPL FAQ argues that whether the modules should both be GPL depends on the technical means used to run the final application. The arguments seem to suggest that if code runs in the same address space as GPL-licensed code, then the whole must comply with the GPL. Given that translation of a work is considered a derivative, we can assume Python translates the Python code (interprets it) and then executes the result in the same address space, therefore all of it must comply with the GPL (including Python itself).

I'm still hard pushed to call the non-GPL module a derivative work of the GPL module however. And in a lengthy article (apparently written by real lawyers, definitely worth a read!) the authors argue that legally a lot more than the technicality determines whether a work becomes a derivative work or not (and that's what it all revolves around: if it's a derivative and you distribute it, the GPL applies). This does indeed seem a lot more logical: if you accept the interpretation in the GPL FAQ, all you need to do to distribute the two modules is use Pyro or some other form of inter-process communication (excluding shared memory according to the GPL FAQ, which again I find hard to accept) and use the APIs of the modules over this IPC layer.

The above article describes how some courts have judged this derivation question and gives a couple of pointers itself. Essentially it's common sense, but that's hard to quantify, something even the GPL FAQ fails at. I should really summarise the factors in a paragraph here, but I was formally trained to be an engineer and not a lawyer, so I am not used to summarising lawyer articles (and frankly I'm too lazy to perform the exercise right now). Besides, I'd probably skip a lot of subtle points, so you're best off reading the article yourself.

I do have a conclusion however: if you have a Python module which is not GPL-compatible but uses the API of another Python module covered by the GPL, chances are you are fine if (i) you are not taking away market share from the GPL module, and (ii) you are not deriving from or extending the creative work or copyrightable content of the module. But there's no distinct line; common sense is your friend (and enemy).

Delny 0.4.1

Saturday, October 24, 2009
A little while ago I released a new version of Delny (my Python wrapper around Qhull for Delaunay triangulations), the main purpose being to use numpy instead of Numeric. Impressively enough people actually seemed to care and I got a few bug reports and hints for improvements.

So I just released 0.4.1 with some of these updates:
  • I forgot to change some Python code to numpy, so it was still importing Numeric
  • Use numpy.get_include() to find the numpy header files
At the same time I fixed the 2D square issue by using the Qz option and removing the extra point from all output, finally fixing a unit test that had been failing for ages.

To compensate, however, I added a new test that fails to triangulate a 3x3 2D square (well, it works but doesn't create simplicial facets), thanks to Stephen McQuay's bug report. AFAIK there's no bug for this in Delny's code; Qhull just seems to return the wrong result (though it does work on the command line).

Now if someone would donate Windows binaries, that would save me from answering those questions too...

No permission to see Ubuntu bugs?

Thursday, October 22, 2009

So I'm looking at the release notes for Ubuntu 9.10 and am interested in the hibernation issue. Naturally I follow the link to the bug report.

First I need to log in to launchpad. That's weird, since when do I need to log in simply to view bugs? But after logging in I see this:

Not allowed here
Sorry, you don't have permission to access this page.
You are logged in as Floris Bruynooghe.

Since when is Ubuntu hiding bugs? Sure, security bugs might need to be hidden for a while, but you could be more descriptive about that. And surely you shouldn't be linking to such bugs?

Just weird.

Cross platform shell scripts

Thursday, September 24, 2009

At work we have a rather elaborate collection of shell scripts that builds our software and all its dependencies on about 7 UNIX variants as well as on Windows. Shell seemed like a good choice when we started to write those scripts: it is available on all hosts (using Cygwin on Windows), with some care you can do most tasks in a portable way, ssh and scp are at your fingertips to do things on remote hosts without any extra hassle, etc.

The thing I didn't realise is just how extraordinarily expensive forking is on Windows. And with lots of shell that's what you're doing all the time (cut, tr, grep, awk, ...); on UNIX you generally don't even notice it, but on Windows it just grinds to a halt.

This basically means I'm slowly moving to replacing the scripts with Python. Each time I need to modify something I weigh the cost of rewriting it in Python against keeping it in shell. Some critical parts are already replaced by Python code and the speedup is impressive.

So next time you need cross-platform scripts, think about how much work it will need to do on Windows. Forking is expensive. Sadly.

Battery life

Wednesday, September 23, 2009

Normally I'm quite happy with the 3h of battery life of my laptop, it covers all my disconnected time on trains etc. But it just doesn't cut it on a 10h flight, especially painful if my brain has loads of ideas to try out and things to do.

New Delny release

Sunday, September 20, 2009

A few days ago I got another of those twice-a-year inquiries about Delny, the Python wrapper around Qhull for Delaunay triangulations. Incredibly indirectly, it finally got me to do a new release; it's only been a few years since the last one! Nothing has really changed, the only difference is that it finally uses numpy to compile instead of the deprecated Numeric.

I've also taken the opportunity to move the revision control to Mercurial and host it on Bitbucket. And I uploaded the tarball to PyPI; not sure why I didn't do that before.

Hope this keeps being of use to some people.

Resuming an scp file transfer

Thursday, August 20, 2009

Sometimes you have to transfer data over dodgy links. And when you're transferring a large file this can hurt when it fails. So, as explained elsewhere, rsync can save the day:

$ scp host:remote_file local_file
# now interrupt and resume with
$ rsync --partial --progress --rsh=ssh host:remote_file local_file

You need rsync on the remote server for this however. But usually that's not too much of a hurdle.

Should bare except statements be allowed in the Python stdlib?

Thursday, August 13, 2009

Firstly to clarify the terminology, this is a bare except statement:

try:
    ...
except:
    ...

And this is a non-bare except statement, but bear in mind the type of the exception that is caught can be anything:

try:
    ...
except Exception:
    ...

The point is that both fragments are catch-all exception handlers, only the second is slightly better/more restrictive since it won't catch a SystemExit exception for example (which you rarely want to catch). This has obviously been discussed before and even made it into a (rejected) PEP.

So I'm tempted to say that the stdlib should not use bare except statements. If you need to catch more than Exception you can always catch BaseException. However, grepping the stdlib reveals 384 cases of bare except statements; to be fair many of these are in test cases, but still.

The one that hurt me today was in socketserver.BaseServer.handle_request; now I have to override the .handle_error() method to call sys.exc_info() and check that the exception is a subclass of Exception before handling the error normally. That's not nice.
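The workaround looks roughly like this (a sketch only; MyServer is a made-up subclass and SocketServer is the Python 2 spelling of socketserver):

import sys
import SocketServer  # socketserver in Python 3

class MyServer(SocketServer.TCPServer):
    def handle_error(self, request, client_address):
        # handle_request() calls this from inside its bare except block,
        # so re-raise anything that is not a plain Exception subclass.
        if not issubclass(sys.exc_info()[0], Exception):
            raise
        SocketServer.TCPServer.handle_error(self, request, client_address)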

Pain

Tuesday, August 04, 2009

I've spent the last 2 days trying to get a stack trace from a crashing Python extension module on Windows. And I still haven't figured it out. That's sooo very motivating.

Give me GNU/Linux any day.

cifs why don't you follow unix conventions?

Monday, August 03, 2009

Often it's nice to just have conventions and gentlemen's agreements. But then sometimes someone doesn't follow them, and once the harm is done it's too hard to reverse.

NFS has introduced the de facto standard of using "<host>:<path>" for the device of a remote mount. Yes, this breaks because you can have ":" in a filename, but in practice it works.

<rant>

But somehow cifs (formerly smbfs) decided that adhering to the Windows syntax was more important than adhering to the unix convention, never mind that it's a filesystem implementation for unix-like systems. So the remote mountpoint for a cifs filesystem is "//<host>/<path>". Yes, this breaks too, since you could start any path with "//". But it's just plain annoying when trying to figure out if a filesystem is remote or not.

</rant>

Letters on keyboard stop working

Thursday, July 30, 2009

Dear Lazy Web

I have some Very weird behaviour where the lowercase letters "C" and "V" stop generating the Correct key press event after a short while (which is why I am typing them as upper Case here...). But only when using Compiz as window manager, when switching away from X to the Console they work again but in X they don't. When looking at the keypress in xeV it shows up like this:

FocusOut event, serial 34, synthetic NO, window 0x5000001,
    mode NotifyGrab, detail NotifyAncestor

FocusIn event, serial 34, synthetic NO, window 0x5000001,
    mode NotifyUngrab, detail NotifyAncestor

KeymapNotify event, serial 31, synthetic NO, window 0x0,
    keys:  82  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

For Comparison with a normal keypress:

KeyPress event, serial 31, synthetic NO, window 0x5000001,
    root 0xa9, subw 0x0, time 18685100, (888,414), root:(898,513),
    state 0x10, keycode 53 (keysym 0x78, x), same_screen YES,
    XLookupString gives 1 bytes: (78) "x"
    XmbLookupString gives 1 bytes: (78) "x"
    XFilterEvent returns: False

KeyRelease event, serial 34, synthetic NO, window 0x5000001,
    root 0xa9, subw 0x0, time 18685275, (888,414), root:(898,513),
    state 0x10, keycode 53 (keysym 0x78, x), same_screen YES,
    XLookupString gives 1 bytes: (78) "x"
    XFilterEvent returns: False

I am totally at a loss what the problem Could be. And searching for this is really hard and so far entirely unsuccessful. The most suspicious thing I Can find in Xorg.log is:

(WW) Logitech USB Multimedia Keyboard: unable to handle keycode 267

But I have no idea if this is related.

Any hints?

How to bring a running python program under debugger control

Saturday, July 25, 2009

Of course pdb has already got functions to start a debugger in the middle of your program, most notably pdb.set_trace(). This however requires you to know where you want to start debugging; it also means you can't leave it in production code.

But I've always been envious of what I can do with GDB: just interrupt a running program and start to poke around with a debugger. This can be handy in some situations, e.g. you're stuck in a loop and want to investigate. And today it suddenly occurred to me: just register a signal handler that sets the trace function! Here is the proof of concept code:

import os
import signal
import sys
import time


def handle_pdb(sig, frame):
    import pdb
    pdb.Pdb().set_trace(frame)


def loop():
    while True:
        x = 'foo'
        time.sleep(0.2)


if __name__ == '__main__':
    signal.signal(signal.SIGUSR1, handle_pdb)
    print(os.getpid())
    loop()

Now I can send SIGUSR1 to the running application and get a debugger. Lovely!

I imagine you could spice this up by using Winpdb to allow remote debugging in case your application is no longer attached to a terminal. The other problem the above code has is that it can't seem to resume the program after pdb got invoked: after exiting pdb you just get a traceback and are done (but since this is only bdb raising the bdb.BdbQuit exception I guess it could be solved in a few ways). The last immediate issue is running this on Windows; I don't know much about Windows, but I know it doesn't have signals, so I'm not sure how you could do this there.

devenv.com via cygwin ssh (visual studio 2003)

Monday, July 13, 2009

Integrating an automatic build on a Windows host with the rest of your one-command cross-platform build system can be quite a pain. Luckily there is Cygwin, which makes things like ssh, bash, make etc. all work on Windows. It's great.

The trick to building with Visual Studio from there is to use the devenv.com tool, which is a version of the full-blown Visual Studio (devenv.exe) that does not pop up windows on your screen (ahem, should not, see below) but instead shows all output nicely on your terminal (which you tee to the logfiles of your build system, of course). Life still looks good.

So you set up your build system to do remote, unattended logins to the Windows build slave using ssh public keys. This is trivial as you do this on all your build slaves. But all of a sudden devenv.com just hangs. That's weird. Do some searching on the mighty Internet and it turns out there's a bug somewhere in Microsoft's authentication handling. There is some token that does something weird, and somehow when you use public key authentication Visual Studio 2003 (and 2005) thinks it is the system service (since that is what the ssh service runs as) and doesn't run properly (various errors possible). But everyone reports errors instead of just hanging; how is this possible? So you go to the ssh service and tick the "Allow service to interact with desktop" box, restart ssh and try again. Success! Now you get a window on the screen from devenv.com saying that Visual Studio crashed. Never mind that devenv.com was supposed to be command-line only.

But that's still no closer to the solution. It turns out that Microsoft actually fixed this error for Visual Studio 2005, but you're stuck with 2003 for some reason.

Time for ugly solutions.

Since this Windows box is a build slave anyway, it's not supposed to be used for anything else. So the only account in use on it is the one used by the build system. What if we run the ssh service under this account? This requires some fiddling with the permissions of the ssh files (/etc/ssh_* and /var/log/sshd.log) in Cygwin, but once the ssh service is happy and wants to start, it all works! No more errors from devenv.com.

Ugly, but it finally works.

Importing modules in C extension modules

Sunday, July 05, 2009

If you need another module in a function of your extension module, the way modules in the standard library seem to solve this is like this:

static PyObject *
func(void)
{
    PyObject *foo;

    foo = PyImport_ImportModuleNoBlock("foo");
    if (foo == NULL)
        return NULL;
    /* do stuff with foo */
    Py_DECREF(foo);
    return something;
}

This means that you have to import the module each time you enter the function (yes, it's looked up in the modules dict by PyImport_ImportModuleNoBlock(), but that function is only available since 2.6; before that you have to use PyImport_ImportModule()).

Personally I like storing the module in a static variable so that it only needs to be imported the first time:

static PyObject *
func(void)
{
    static PyObject *foo = NULL;

    if (foo == NULL) {
        foo = PyImport_ImportModuleNoBlock("foo");
        if (foo == NULL)
            return NULL;
    }
    /* do stuff with foo */
    return something;
}

Note here that the Py_DECREF() is gone. This function will effectively leak a reference to the module object. But is this really bad? How often do module objects get deleted in production code? My guess is that they normally stay loaded until the application exits.

Singletons in a Python extension module

Tuesday, June 23, 2009

If you want a singleton in a C extension module for CPython you basically have to do the same as in plain Python: the .__new__() method needs to return the same object each time it is called. Implementing this in C has a few catches though.
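For reference, the plain Python version of that is simply this (a sketch):

class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance

The C version below does the same thing, just with the reference counting made explicit.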

static PyObject *
MyType_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    static MyObject *self = NULL;

    if (self == NULL)
        self = (MyObject *)type->tp_alloc(type, 0);
    Py_XINCREF(self);
    return (PyObject *)self;
}

Then assign this function to the tp_new slot in the type. There are two things of interest in this function:

  • self is declared as static. Normally this would not be the case, but declaring it static makes it stay alive after the function has returned, and it will still be pointing to the first instance of our object, so the next time a new object is asked for this one is simply returned.
  • Before returning the pointer to the static self, the reference count is increased. This may seem odd but is the right thing to do, otherwise the reference count will not go up for subsequent calls to the new function, since the reference count is only increased by PyType_GenericAlloc() (via the call from the pointer in the tp_alloc slot) on the first call. So if you don't do this you end up with negative reference counts, which doesn't make Python very happy. This does mean you never end up deallocating the object since the lowest reference count you have is 1, but you wanted a singleton, right? If you really wanted to get the reference count to drop to 0 then you can always put the Py_INCREF() in an else clause.

Voting and identification in the UK

Sunday, June 07, 2009

Last Thursday I had the pleasure of being able to vote for the local council as well as for the European Parliament. Since I'm a Belgian citizen living in the United Kingdom this involved convincing the officials at the voting office to re-read their instructions (at first they only allowed me to vote for the local council but not for Europe, I think they must have realised how silly that sounded) but otherwise was quite easy. Too easy.

When I went to the polling station I forgot my poll card, but my colleagues at work said that would be fine (not that going home to pick it up was very far) so I tried it anyway. No problem: the only thing they asked was where I lived, then they found my name on their list and let me vote (after the above-mentioned debacle). There was absolutely no verification that I was who I claimed to be. Seriously, it is rather trivial to vote for someone else; I'm sure you could visit 2 or 3 polling stations and vote in someone else's name. Just talk to friends and work together a little bit and you can all cast half a dozen votes if you're a little careful. And they have no way of recovering from this, other than having to let everyone using that polling station re-cast their vote.

After talking to some people, they seemed to think that there is no single means of identifying someone in an official way in the UK. This means that buying alcohol is better controlled than voting, since there you either have a form of ID (if you look under 21) or you don't get to buy it. But for voting? No ID required, because there is none.

I'm sure the ID card scheme as currently proposed in the UK is not any good, but there does appear to be a problem that might have to be fixed somehow. By now I'm pretty sure that if you give me 2 years I can create a fictitious John Smith person, he'll have a passport, driving license, voting rights, bank account and be native British. Seriously, it's easy. You just need some time.

Raising exceptions in threads

Saturday, June 06, 2009

It's not simply raising an exception in a thread that I want; I want to raise an exception in a thread from another thread. It's like sending signals to threads, only signals in Python can only be delivered to the main thread (for portability). But Python has another asynchronous signalling system: exceptions.

I'd like to be able to do something like:

for t in threading.enumerate():
    t.raise(MyAppDoesntWantYouAnymoreError)

Is this possible? Are there other ways to do this sort of thing?

Alternatively I might be happy with a fix for issue1856, but I do think it might be nice to be able to signal threads in an asynchronous way in any case.

New home for PSI

Saturday, May 30, 2009

PSI, the Python System Information package that pulls interesting information from the kernel and makes it available in a nice Python API, has a new home at Bitbucket. This means that the source code now lives in a Mercurial repository instead of Subversion.

This actually happened about a week ago, but better to announce it late than never...

Python System Information 0.3b1^W0.3b1.1

Thursday, May 21, 2009

Short summary: PSI is alive and we've just released the first beta of a much improved upcoming 0.3 release!

Back in 2007 Chris Miles announced PSI - Python System Information, a Python extension module to provide access to some system information not normally available to Python. Most notably it allows you to look at all processes on the system and get details like memory usage, cpu usage, users and many more things the kernel knows about a process. And all this in a pythonic way!

At the time the implementation was not perfect (when will an implementation ever be?), it had many memory leaks and reference counting errors, and sadly no one seemed to have the time and motivation to work on it for a long time. But since the beginning of this year I finally found some time to fix these issues and soon some more people joined in. Chris has been amazing in allowing access to almost anything I asked for and since then there has been steady development improving the code base, tests and API. Right now PSI does not leak any memory, and provides basic system identification and detailed process information on Linux (2.4 & 2.6 kernels), Darwin 10.3 and up, SunOS (5.8-5.11) and AIX (5.3), and it does this for any version of Python greater than 2.3, including 3.0. Not too shabby!

There are many things we would like to do in the future too: more platform support, for one. Ideally PSI should run on all the major platforms supported by Python itself (and all minor ones too). And more information too: getting information about processes is one very useful and common thing, but there is so much more the kernel can tell us: CPU information and statistics, network interfaces, etc. It's a massive and never-ending task, but hopefully we can do the common things on all major platforms.

Update

Version 0.3b1.1 has been released now. It seems the last version only shipped the sources for Linux and not for all platforms. This bugfix release also adds the MANIFEST file so that distutils can build binary distributions from the tarball.

Sorry!

Compiling applications in setup.py

Monday, May 11, 2009

If you need to compile a random application in setup.py this is not that hard:

import distutils.ccompiler
import distutils.sysconfig

cc = distutils.ccompiler.new_compiler()
distutils.sysconfig.customize_compiler(cc)
cc.link_executable(['test.c'], 'a.out')

There is no need to create the object files explicitly with cc.compile() first if you have no need for them; the command line invoked by .link_executable() will do the compilation step in one go for you.

This part of distutils is actually documented, so check it out; you can pass in many optional arguments and modify the compiler object to customise things.
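For example, the compiler object can be tweaked before linking; a sketch (the include directory, macro and library here are made up):

import distutils.ccompiler
import distutils.sysconfig

cc = distutils.ccompiler.new_compiler(verbose=1)
distutils.sysconfig.customize_compiler(cc)
cc.add_include_dir('/usr/local/include')   # made-up include directory
cc.define_macro('USE_FEATURE_X', '1')      # made-up macro
cc.link_executable(['test.c'], 'a.out', libraries=['m'])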

What's new in Python 2.6: logging

Friday, May 01, 2009

The What's New in Python 2.6 document is very good and contains loads of information, congrats for making it. But it misses a nice addition to the logging module: LoggerAdapter objects. These allow you to create a logger object with extra contextual information really easily:

import logging
import sys

log = logging.LoggerAdapter(logging.getLogger("subsystem"),
                            {"argv1": sys.argv[1]})
logging.basicConfig(format="%(argv1)s %(message)s")

Neat

Secondly, the logging module decided to define __all__, which means not all objects are exported anymore. Unfortunately this __all__ is incomplete in 2.6.2; even getLogger is not listed in it! The good news is that this had one of the fastest bug report turnaround times I have seen, so expect it to be fixed in the next stable release.

Sun Fire T1000 not closet friendly

Thursday, April 16, 2009

Obviously the Sun Fire T1000 is a noisy machine; that's why it's sitting in the closet in the first place. But it has more shortcomings that make it a closet-unfriendly server:

  • It shuts down at 50 degrees Celsius. That means wrapping it in a duvet to damp the noise is not a great idea if you want it to do any work.
  • The power button sticks out. That makes it way too easy to press it if you're getting something else from the closet.

Time, floating points and python modules handling time

Saturday, March 28, 2009

Some memory in the back of my mind tells me I once read a rant about storing time that argued you should never store time in floating point variables. My memory seems to think the rant convinced me, and it does indeed seem better to store time in integers so that time doesn't go and change little bits due to rounding etc., and indeed the C library seems to stick to that, mostly (difftime() returns a double, for example).

Looking at Python, the stdlib datetime module seems to do this too. However, people often scoff at the datetime module and recommend the use of mxDateTime instead; it is supposedly a lot better. But looking at how it stores time, it seems to use double values internally.

So I am wondering: if mxDateTime gets away with storing time as floating point, is there really a disadvantage to it? Is there a point to avoiding floating point numbers when handling time?
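One way to make the concern concrete: a C double has 53 bits of mantissa, so once values get big enough consecutive ticks can no longer be told apart (a quick interactive sketch):

>>> 2.0 ** 53        # doubles have 53 bits of mantissa
9007199254740992.0
>>> 2.0 ** 53 + 1    # the next integer up is no longer representable
9007199254740992.0
>>> 2.0 ** 53 + 2
9007199254740994.0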

Reading a 64-bit core from a 32-bit proc on solaris?

Saturday, March 28, 2009

I'm trying to read some pointers from a 64-bit address space file (from /proc/pid/as) while running in a 32-bit process - all this on Solaris. To do this I'm using the transitional large file compilation environment, i.e. where you get both open()/pread() and open64()/pread64() since I still want to be able to read a 32-bit core. However when doing the pread64() call on the address space I keep getting EOVERFLOW, no matter what values I put in - even trying to read the first byte of the core. And reading the first byte of a 64-bit core using simply pread(), which as far as I can tell should work, fails too. EOVERFLOW whatever I do.

Is this simply Solaris stopping me from reading a core from a 64-bit proc? But why wouldn't it stop me at the time I do the open64() call in that case? Any clues would be appreciated.

Replacing the screen of your digital camera

Monday, March 23, 2009

The screen of my Canon Digital Ixus 40 compact camera broke on my last trip (sport climbing in Geyikbayiri near Antalya, Turkey - if you climb: it's an amazing place!) which made me rather sad. But with the help of eBay I found a new screen and have just managed to replace it.

It only took about 1h40 of careful fiddling and figuring out how the camera was stuck together while being nervous of breaking it even more. But the end result was successful!

Virtual memory size (VSZ) on AIX

Saturday, February 28, 2009

If you have ever looked at the ps(1) output on an AIX box and wondered why the virtual memory size (VSZ) is so small, often even smaller than the resident set size (RSS), then here's the answer: it's not the virtual memory size. It is actually the size of the data section of the virtual memory of the process.

If you really do want to know the VSZ of a process you'll have to use svmon -P 123; there you will find it. But do multiply this by the page size (pagesize(1) is a user tool on AIX). If only this were documented somewhere.

Oh, and if someone knows where the source of svmon lives I'd love to know how they actually find that; using the struct procentry64 structure from procinfo.h I cannot figure out a way of finding it. And the VSZ information in /proc/123/psinfo is completely whacky; no idea what that is supposed to be.

For completeness' sake: this is on AIX 5.3.

Closed source frustration

Thursday, February 26, 2009

Developing against closed source libraries is just painful and wastes your time. Function definitions are missing from the header files, so you get to declare them yourself from the manpages. Then you get random error messages even though you do everything just like the manpage tells you. Some dark corners of the internet then suggest one of the parameters is actually the 32-bit variant of the type instead of the generic one. Still no luck.

Just let me look inside the code to figure out why it's not working, thank you very much.

This post was brought to you by the joys of the getprocs64(3) call on AIX 5.3. Any hints on its usage would be appreciated.

Update: It finally works!

ssh magic

Thursday, February 19, 2009

Dear Lazy Web

If I write the following in my ~/.ssh/config:

Host bartunnel
  HostKeyAlias bar
  HostName bar
  LocalForward 8822 foo:22

Host barjump
  HostKeyAlias bar
  HostName localhost
  Port 8822

Then I can connect to host bar via host foo (circumnavigating a firewall that stops me from going to bar directly) just as if I were connecting to it directly. E.g. in two separate shells (in this order):

$ ssh bartunnel # this sets up the tunnel
# different shell (or use -n on the last one)
$ ssh barjump # now I'm connected normally

Now, is there something I could write in my ssh configuration file so that I could just do this in one step? I want to simply do:

$ ssh barjump

and the tunnel should be set up for me in the background. Likewise if I close the connection the tunnel should go. Is this possible?

Compiling 32-bit Python on amd64

Friday, February 13, 2009

If you ever feel the need to compile a 32-bit version of Python on an amd64 machine, this is how you do it on a Debian/Ubuntu system.

Firstly you need the correct compiler stuff: gcc-multilib and the 32-bit development libraries of at least libc. On Debian/Ubuntu installing the gcc-multilib package will pull in most if not all of the required dependencies.

Next is invoking the configure script of Python. Sadly Python is one of those autoconf-using projects which advertise the use of environment variables like CFLAGS in the --help output of ./configure but don't actually respect them; this is all too common for projects that use autoconf but not automake. So the correct way to start building is using the OPT environment variable instead of CFLAGS:

OPT=-m32 LDFLAGS=-m32 ./configure --prefix=/opt/pym32
make

You may want to finetune the OPT value, since it is where -g -O3 etc. would normally appear, so by overriding it you've just got rid of those. I'm not quite convinced of this design, but anyway.
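
If you do want to keep the usual optimisation and debug flags you can simply put them back in yourself, something like this (the exact flags are only a guess at the defaults, adjust to taste):

OPT="-m32 -g -O3" LDFLAGS=-m32 ./configure --prefix=/opt/pym32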

Now you can watch the compilation; depending on your machine you may have time to make a cup of tea or so. Near the end you'll see something like this:

Failed to build these modules:
_hashlib           _sqlite3           _ssl
_tkinter           bz2                gdbm

This is pretty much exactly the same as when normally compiling Python: go find the packages that provide the development libraries needed for these modules. Only this time you need to look for the 32-bit variety of them. On Debian/Ubuntu look out for packages named like lib32foo-dev. After installing all the applicable ones I could find, this is what I got it down to in the end (only using system packages):

Failed to build these modules:
_hashlib           _sqlite3           _ssl
_tkinter           gdbm

Just in case you aren't quite happy with your achievement so far you could now try compiling an extension module against your 32-bit python:

LDFLAGS=-m32 /opt/pym32/bin/python setup.py build
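
And to double-check that the interpreter you just built really is 32-bit you can ask it for the size of a pointer:

$ /opt/pym32/bin/python -c "import struct; print(struct.calcsize('P') * 8)"
32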

Now wasn't this useful? Silly binary-only libraries...

sed for binary files: bbe

Friday, February 13, 2009

While GNU sed takes care not to munge NUL characters, this is not required by the POSIX standard, so it's no surprise that some implementations don't manage to edit binary files properly (not to mention that many implementations will struggle with long lines). Hence the search for a binary sed:

bbe is just that. It allows you to do operations on blocks (you can define the block size in a variety of flexible ways) as well as on bytes. So the simplest substitution command from sed looks exactly the same. It's gorgeous.

It is a C program that you'll need to compile, but it has no build dependencies so you can use it pretty much anywhere.

Specifying the RUNPATH in sofiles

Monday, February 09, 2009

When you create a shared object that depends on another shared object there are multiple ways to tell it where to look for the shared object. Two of these are by encoding (at linking time) a DT_RPATH and DT_RUNPATH entry in the .dynamic section. A third one is to use the LD_LIBRARY_PATH environment variable (at runtime). The order of precedence of these is:

  1. RPATH
  2. LD_LIBRARY_PATH
  3. RUNPATH

The question is how to encode these into your shared object. This is normally done with the --rpath or -R option to the linker. But as the name suggests this will create an RPATH. When using GNU ld(1) you can add the --enable-new-dtags option, which also adds the (newer) RUNPATH, and when both RPATH and RUNPATH are present the runtime linker will ignore the RPATH. On Solaris however the linker adds a RUNPATH by default as soon as you use -R, so you will always end up with LD_LIBRARY_PATH overriding the value you gave with -R. Good to keep that in mind.

Resistance to change

Saturday, January 31, 2009

Why are C developers happy to use config.h to get constants telling them where to look for configuration or data files, for example, while Python developers seem to refuse to even think about how they might support finding these files in a portable way?

FreeBSD on Virtualbox

Saturday, January 31, 2009

Virtualbox is my desktop virtualisation technology of choice most of the time and I wanted to have a play with FreeBSD. Seems they don't get along though and you won't get a network connection.

  • Solution 1: Change the network adaptor in Virtualbox to PCnet-PCI II (instead of PCnet-PCI III). I've tried this and it works.
  • Solution 2: I haven't tried this.

While we're on the subject: when running the Ubuntu 8.10 server jeos edition on Virtualbox don't forget to enable PAE in Virtualbox, or you get to do a kernel dance using the rescue mode of the installer.

datetime.datetime.totimestamp()

Tuesday, January 27, 2009

There exists a datetime.datetime.fromtimestamp() and many a time I've wondered why there is no .totimestamp() equivalent. The answer is that not all datetime objects can be represented by a timestamp. But if you know this restriction and want it anyway, this is the magic:

time.mktime(datetime_object.timetuple())

Would be so cool if the docs actually mentioned this.

PS: Both functions have a utc variant too.
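
For the UTC variant the equivalent magic would be calendar.timegm(), which as far as I know is the inverse of datetime.datetime.utcfromtimestamp() (a small sketch, assuming the datetime object holds a naive UTC time):

import calendar, datetime

dt = datetime.datetime(2009, 1, 27, 12, 0, 0)  # naive, assumed to be UTC
calendar.timegm(dt.utctimetuple())             # seconds since the epoch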

Update:

datetime_object.strftime('%s')

This solution does not give you sub-second resolution, but otherwise it is rather elegant. The funny thing is that the %s specifier is not documented in the stdlib docs, but it seems to exist in the underlying C implementations, at least on UNIX.

And for completeness, these issues are being discussed in the bug tracker. See issue2736 and issue1673409.

Resident Set Size (RSS) from /proc/pid/stat

Thursday, January 22, 2009

Most UNIX-like systems store information about processes in /proc/pid/ files, and so does Linux.

If you would want to get the Resident Set Size (RSS) of a process on Linux you could find this in a number of files:

  • /proc/pid/stat: targeted at scanf(3)
  • /proc/pid/statm: targeted at scanf(3) but just memory information (more detailed)
  • /proc/pid/status: targeted at humans

Oddly enough, if you check these three files for the RSS memory you will get different results! It seems both stat and statm have the wrong RSS information. It is a mystery to me why.
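
If you want to compare the three for yourself, here is a rough sketch. The field positions are the documented ones (field 24 of stat and field 2 of statm are counted in pages, the VmRSS line in status is in kB) and it naively split()s the stat line, which assumes the process name contains no spaces:

import os, re

pid = os.getpid()
page_size = os.sysconf('SC_PAGESIZE')

# /proc/pid/stat: field 24 is the rss, counted in pages
stat_rss = int(open('/proc/%d/stat' % pid).read().split()[23]) * page_size

# /proc/pid/statm: the second field is the resident size, also in pages
statm_rss = int(open('/proc/%d/statm' % pid).read().split()[1]) * page_size

# /proc/pid/status: the VmRSS line, in kB
status = open('/proc/%d/status' % pid).read()
status_rss = int(re.search(r'VmRSS:\s+(\d+) kB', status).group(1)) * 1024

print('stat: %d  statm: %d  status: %d' % (stat_rss, statm_rss, status_rss))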

Porting a C extension module to py3k

Thursday, January 15, 2009

This was not that hard! First look at what everyone else has to say about this, nicely aggregated on the python wiki. It covers a lot.

Now for what I discovered on top of this:

  • When you're defining a new type self->ob_type doesn't exist anymore. This is a problem as you need it in the deallocator for example. The solution is to use Py_TYPE(self). So the deallocator becomes: Py_TYPE(self)->tp_free((PyObject*)self);
  • When you're paranoid and use -Werror -Wredundant-decls you'll notice a duplicate declaration in pyerror.h. Bug filed.
  • The module initialisation is a lot easier than the official docs suggest. You seem perfectly fine without module state, so all you need to fill out in your struct PyModuleDef is m_base, m_name and m_size. Of course m_doc and m_methods are pretty useful too, but not strictly required. Copy the tutorial here.
    And if you use PyMODINIT_FUNC to declare it, all you need to #ifdef is PyInit_mymodule(void), PyModule_Create() and the return value.

Generating source files in setup.py

Friday, January 09, 2009

I have a Python extension module written in C and one of the weirder things it needs to do is have a number of module constants with their values which are defined (e.g. #define PREFIX_FOO 0x01 etc.) in an external C header file. All the defined names in that header file start with a common prefix so it's easy enough to write a python function that will read the file and spit out the correct C source code that enables me to expose these in my python module. The tricky part however is where to hook this up in the setup.py script.

At first I tried to extend the distutils.command.build_ext.build_ext class to generate this file. That doesn't work however as the distribution is not very happy about having a file listed as required (Extension(sources=['generated_file.c', ...], ...)) which isn't actually there at the time the distutils.dist.Distribution instance is created.

So the two (AFAIK) remaining options are: subclass distutils.dist.Distribution and pass it into the setup method as setup(distclass=MyDistribution, ...); or secondly, don't add the generated source file to the list of required files and create it in the extended build_ext command.

For now I've gone for the last option as it seems the more appropriate place to do things (I've overridden the .initialize_options() method). But I wonder if other, more elegant solutions exist?
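
For reference, a minimal sketch of that last option looks something like this. The generate_constants() helper, the header name and the generated file name are all made up for illustration, and appending the generated file to the extension's sources inside initialize_options() is just one way of wiring it up:

import re
from distutils.core import setup, Extension
from distutils.command.build_ext import build_ext

def generate_constants(header, target):
    # Hypothetical helper: turn '#define PREFIX_FOO 0x01' style lines from
    # the external header into a C function that adds module constants.
    defines = re.findall(r'^#define\s+(PREFIX_\w+)\s+(\S+)', open(header).read(), re.M)
    out = open(target, 'w')
    out.write('#include <Python.h>\n\n')
    out.write('void add_constants(PyObject *mod)\n{\n')
    for name, value in defines:
        out.write('    PyModule_AddIntConstant(mod, "%s", %s);\n' % (name, value))
    out.write('}\n')
    out.close()

class my_build_ext(build_ext):
    def initialize_options(self):
        build_ext.initialize_options(self)
        # Generate the file now, and only now append it to the sources,
        # so the Distribution never sees it listed while it is missing.
        generate_constants('external.h', 'generated_file.c')
        for ext in self.distribution.ext_modules:
            ext.sources.append('generated_file.c')

setup(name='mymodule',
      ext_modules=[Extension('mymodule', sources=['mymodule.c'])],
      cmdclass={'build_ext': my_build_ext})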
