devork

E pur si muove

Seeing your browser's headers using python

Saturday, December 15, 2007

There's a very annoying website that won't send its CSS to my normal web browser (Epiphany), which makes it rather ugly. However, when I use Iceweasel the CSS gets applied. Since both browsers use exactly the same rendering engine, Gecko, on my machine as far as I know, I figured the site must be sniffing the headers sent by my browser. So I needed to check the headers; Python to the rescue:

import BaseHTTPServer

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
        self.wfile.write('<html><body><pre>')
        self.wfile.write(self.headers)
        self.wfile.write('</pre></body></html>')
        return

def main():
    try:
        server = BaseHTTPServer.HTTPServer(('', 80), MyHandler)
        print 'serving...'
        server.serve_forever()
    except KeyboardInterrupt:
        print 'ttfn'
        server.socket.close()

if __name__ == '__main__':
    main()

Running this as root (80 is a privileged port) will show you the headers sent by your browser to the server. It's so simple that it took me longer to write this post than to write that code.

Only a pity that it didn't help me solve my problem...

Making Debian packages on an SMP machine

Monday, November 26, 2007

When you have an SMP machine (including dual core CPUs - but I'm sure everyone knows that by now), you quickly learn to use the -jN flag to GNU make(1). N is the number of CPUs you have and it lets make(1) run that many jobs in parallel whenever possible, thus using all your CPUs and giving you a nice speed benefit.

However, when creating a Debian package you can't always specify this directly when you use a tool like dpkg-buildpackage(1), debuild(1) or svn-buildpackage(1). A simple trick is to just invoke these tools with the environment variable MAKE set to make -jN. Now whenever make(1) invokes a sub-make it will use the -jN parameter, and since debian/rules is the first make invocation, all your actual compilation will happen with multiple jobs in parallel.
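
For example, on a hypothetical dual-core machine a build might be started like this (the package directory is made up):

$ cd mypackage-1.0
$ MAKE="make -j2" debuild -us -uc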

This is not perfect however: all sub-makes will now be called with -jN too, so you'll get more jobs than CPUs. I tried using MAKEFLAGS but that didn't quite work out; if someone knows of a better solution let me know. But this one works anyway.

How far does encryption get you?

Wednesday, November 21, 2007

Finally, since a short while ago, all my hard disks are set up as encrypted disks using LUKS. It works like a charm and you don't really notice any slowdown. Unfortunately I am now left wondering what the point is, given that I currently live in the UK. Not that there is anything exciting on my hard disks, but I'm one of these people who care about principles regarding freedom.

One possible thing to do is create a key on a USB stick instead of using a password. It would mean I could destroy it when they ask for it, as it seems they have to ask you by letter first. Although I can't imagine a situation where I'd rather lose all my data myself than have the police poke their nose in it.

At some level I can't help but compare them (the UK government) with the music and video industry: create new (broken) laws as they are unwilling to adapt to innovations.

Simple music players

Monday, November 12, 2007

A while ago I bragged about my own music player. Basically I couldn't find a player I liked, so I started creating my own. It was the first time I wrote anything related to media or a GUI (good thing my basic requirement was simplicity) and it was interesting to do, but I never found the code amazing.

A little later, after using my own app with pride and also implementing a few features incredibly inefficiently, I discovered Decibel Audio Player. The code is way more advanced than mine and a lot better. It already has most of the things I eventually wanted in my player and yet is still very simple. Definitely recommended; I'm not going to bother improving my own creation anymore.

PS: did I mention that, just like mine was, Decibel is also written in Python?

There's a first time for everything

Thursday, September 27, 2007
flub@signy:~/src$ tar -xzf storm-0.10.tar.gz 

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
flub@signy:~/src$ file storm-0.10.tar.gz 
storm-0.10.tar.gz: POSIX tar archive (GNU)
flub@signy:~/src$ mv storm-0.10.tar.gz storm-0.10.tar
flub@signy:~/src$ tar -xf storm-0.10.tar
flub@signy:~/src$

I don't think I've ever had an incorrectly compressed tarball before. Storm offers the download as .tar.bz2 and .tar.gz, but the .tar.gz at least is not compressed at all (not that compression is really necessary at 804 kB); I didn't try the other one.

It is just a fairly bad first impression...

Update: It seems the download file is actually correct, as pointed out in the comments. It's not just a simple human error; it's a most interesting little bug in Launchpad.

EU in censor madness

Monday, September 17, 2007

So the EU seems to think that the Internet should be filtered. The first step is blocking searches for bomb recipes.

To any rational person it is obvious that this is wrong on more levels than I care to explain (others have already done so better than I can), and yet the madness is able to spread this far. And here I was hoping the EU would be better than this.

What worries me most is that I have no idea how my vote could influence this sort of thing. Which parties or persons will actively try to stop brain-dead stuff like this? I just don't know the answer. Even if there is an answer, I'm pretty sure it will be some minority party that will have a hard time making it into a decent coalition. I can understand why people get fed up with our implementation of democracy and do things like voting no.

Capable Impressive System Crippled OS

Friday, September 14, 2007

That's my new acronym for cisco. Seriously, I did expect a learning curve for IOS when I was ordering a cisco 877W, but this is just disappointing.

No doubt IOS was once very amazing and had a very nice and clear concept and interface for the first few products it was made for. But right now it just looks like something that has been patched too many times to keep up with current technology, and it seems like it lost its concept/religion/design/architecture a while ago. And to top it off, the documentation has exactly the same problems.

It just makes me sad, as I know the thing is very capable and is an amazing piece of hardware. Hasn't anyone ported Linux or Minix to it? That would be cool. Maybe I should have bought something that could run OpenWrt or so instead; oh well, too late.

Source code to my music player

Saturday, August 25, 2007

Recently I said I created an incredibly stupid, dumb and boring GTK music player. I also offered to make the source code available in the very unlikely case someone else would find this thing interesting. I should have known to just do that right away, so here it is.

My very own music player

Thursday, August 23, 2007

Yes, I've contributed to the masses of music players out there. It is pleasantly easy using Python and the pygtk and pygst modules.
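
As a rough illustration of how little pygst code it takes to play a single file (a sketch against the GStreamer 0.10 bindings of the time; the file name is made up and all error handling is left out):

import gobject
import pygst
pygst.require('0.10')
import gst

# playbin does all the decoding and output work for us.
player = gst.element_factory_make('playbin', 'player')
player.set_property('uri', 'file:///home/flub/music/song.ogg')
player.set_state(gst.STATE_PLAYING)

# A main loop keeps the pipeline running.
gobject.MainLoop().run()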

But why yet another music player? Well, it is, intentionally, incredibly simple.

  • No library. Just a simple list of files, played sequentially.
  • No metadata. If you just want to play a few random files, why would you care about the metadata?
  • Simple interface. It's very boring but fits in with GTK or GNOME apps quite nicely.
  • No bells or whistles. Seriously, I want to listen to a few files, not have a visualisation, equaliser, unreadable skin or OMG-such-a-cool-gadget etc.

Basically, anything that the player doesn't do is done better by other applications (for me that is Ex Falso / Quod Libet).

Here is the obligatory screenshot, so you can see its boringness in all its glory. Really, there is nothing more to it than you can see.

This doesn't mean there is no room for improvement obviously. Mostly usability. It could do with a slider, pause functionality, a column with file duration, a context menu, delete-key binding, ... There is an endless list (writing GUI apps is so troublesome!). Just functionality wise it will stay very limited.

On the off chance that someone is also interested in this thing, just tell me. I'll gladly make the source available under some free license; it's rather tiny right now, with one glade file and one python file of 270 lines of code. I just don't expect anyone else will be interested in this ;-).

PS: Yes, that screenshot contains a typo, some things you just don't notice until you create a screenshot...

Encrypted hard disk: LVM on LUKS

Sunday, August 19, 2007

There are of course about a million other guides on how to use encrypted disks out there. However I did run into some trouble when trying this, so here is mine. Specifically I address the issue of getting an encrypted root disk, with the root on Logical Volume Management (LVM), as most other guides only seem to describe how to set up some random disk or partition encrypted. I'm not going to duplicate the other guides too much, so read a reasonable one (like this one) first. The last special case is that I copy all the data across once the disk is ready; otherwise I could have just used the debian-installer, which does a great job.

This entire operation was done while booted from a grml live CD, with the backups of the data on a USB disk.

Firstly you need to partition your disk. Create two partitions: a small one for /boot, which will stay unencrypted, and another as large as you fancy. The boot partition should be a normal Linux partition (0x83) while the other one I set to Linux LVM (0x8e), but I don't think that matters. The boot partition is simple: format it (e.g. mkfs -t ext3 /dev/hda1) and copy the data onto it. The other partition is going to be a LUKS volume, on which we will create an LVM Physical Volume (PV) holding a Volume Group (VG) with several Logical Volumes (LVs), say, / and /home. Let's do this:

~# cryptsetup luksFormat /dev/hda2
<asks for password>
~# cryptsetup luksOpen /dev/hda2 luksvolume
<asks for password>

The luksvolume part is the name of the volume for the device mapper, the disk will now appear in /dev/mapper/luksvolume. Great! Let's create our LVM setup on it:

~# pvcreate /dev/mapper/luksvolume
~# vgcreate mygroup /dev/mapper/luksvolume
~# lvcreate -L 10G -n root mygroup
~# lvcreate -L 10G -n home mygroup

The volumes are now available as /dev/mapper/mygroup-root and /dev/mapper/mygroup-home, or via the symlinks /dev/mygroup/root and /dev/mygroup/home. Again, create your favourite filesystems on them and copy the data across.
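
For instance (ext3 is just my choice here):

~# mkfs -t ext3 /dev/mygroup/root
~# mkfs -t ext3 /dev/mygroup/home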

We're almost there, but not quite. The disk needs to be bootable, so mount the root partition somewhere, mount the boot partition inside it, and then install grub on it with grub-install --root-directory=/mnt/newroot. This is also the time to double-check /mnt/newroot/boot/grub/menu.lst and make sure all is fine in there.
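
Spelled out, that could look something like this (the mount point and the target disk /dev/hda are my assumptions):

~# mkdir /mnt/newroot
~# mount /dev/mygroup/root /mnt/newroot
~# mount /dev/hda1 /mnt/newroot/boot
~# grub-install --root-directory=/mnt/newroot /dev/hda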

Now make sure the encrypted disk will work when booting. For the following it is easiest to chroot /mnt/newroot as the command doesn't deal with alternative roots yet. So in the chroot write the /etc/crypttab:

# <target name> <source device> <key file> <options>
luksvolume      /dev/hda2       none       luks

Hopefully one day that will be enough; currently it was completely irrelevant in this setup however (this file is only relevant for non-root encrypted disks at the moment). So you need to create another file, /etc/initramfs-tools/conf.d/cryptroot:

target=luksvolume,source=/dev/hda2,lvm=mygroup-root

Now recreate the initrd using update-initramfs -u and you should be all set. Get out of the chroot and boot the disk.

This should work on both Debian and Ubuntu, however when you're using Ubuntu you may get some funny results when it needs the password while usplash is running. It will quit usplash but not tell you it is waiting for a password, check out this bug report for some possible solutions.

Writing entropy

Saturday, August 18, 2007

Q: How long does it take to write a 120 GB disk full with random data?

A: 15 hours, 2 minutes and 27 seconds.

Obviously this depends on the machine. For me it was going at just under 8 minutes per GB; others report around 5 minutes per GB. Also this was using /dev/urandom as input for dd, which is obviously not really random. I don't even want to think about how long it would take using /dev/random.
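
For the record, the sort of dd invocation meant here (the device name is made up):

~# dd if=/dev/urandom of=/dev/hdb bs=1M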

Debian is 14

Thursday, August 16, 2007

Happy birthday!

And thanks for everyone involved all those years.

Gazpacho and libglade

Monday, August 13, 2007

If you read my last post about using Gazpacho you should read the comment by Johan Dahlin too. He's one of the authors of Gazpacho and explains the libglade and save format issues^Wthings in Gazpacho nicely.

Making GUI applications with Gazpacho

Sunday, August 12, 2007

Earlier I made some simple GUI applications using PyGTK and Glade, which is surprisingly easy. Now I have another small itch to scratch, so I had another go at a GUI app. Only this time I decided that the coolness of Gazpacho looked slightly nicer to me, so I gave that a try.

Creating the UI is easy enough and after some messing around I had something that would do. Gazpacho claims to create glade files compatible with libglade, so I just did what I did last time:

import gtk
import gtk.glade


class MainWindow:
    def __init__(self):
        self.wtree = gtk.glade.XML('ddmp.glade')
        self.wtree.signal_autoconnect(self)

    def on_mainwindow_destroy(self, *args):
        gtk.main_quit()

    def main(self):
        gtk.main()


if __name__ == '__main__':
    app = MainWindow()
    app.main()

However this didn't quite work: I got libglade warnings about an unexpected element <ui> and an unknown attribute constructor. Furthermore gtk.glade gave tracebacks about GTK_IS_WIDGET assertions. After a quick search on the great internet that didn't turn up anything (I was wondering if my libglade was too old or so), I had a look at the supplied examples (stupid me, why didn't I look there first?) and sure enough, they don't use gtk.glade. So the above code changes into:

import gtk
from gazpacho.loader.loader import ObjectBuilder


class MainWindow:
    def __init__(self):
        self.wtree = ObjectBuilder('ddmp.glade')
        self.wtree.signal_autoconnect(self)

    def on_mainwindow_destroy(self, *args):
        gtk.main_quit()

    def main(self):
        mainwindow = self.wtree.get_widget('mainwindow')
        mainwindow.show()
        gtk.main()


if __name__ == '__main__':
    app = MainWindow()
    app.main()

So Gazpacho needs a different loader for the XML. The returned object appears to behave like the gtk.glade.XML widget tree, which is nice (since Gazpacho documentation seems to be non-existent). I suppose libglade doesn't cope with the gtk.UIManager code created by Gazpacho yet (the FAQ seems to suggest there are patches pending) and that their custom loader translates it to something libglade understands. This does make me wonder whether you can use Gazpacho from any language other than Python; the examples only contained Python code. Surely they'll want to support any language that has libglade?

Lastly, it seems to hide windows by default, which I actually quite like. I remember that in Glade you had to explicitly hide dialog windows or they would show up at startup; this seems slightly more logical.

Overall I do quite like gazpacho so far, I'm glad I chose it and would recommend it. It still has some rough edges but is very nice already.

Optimising python code by AST fiddling

Monday, August 06, 2007

Obviously no real optimisation, just optimisation exactly like python -O [-O] would create. But in "normal" byte compiled files, i.e. in .pyc files instead of .pyo files.

Why would I want to do that? Well, we really don't feel like shipping code with assert statements or if __debug__: ... bits in. They are only useful during development and should not appear in shipped code. And while we're at it, stripping the docstrings can't hurt either. Still no reason not to just use .pyo files, you'd think, but there is one if the code also needs to run as a Windows service. The Python for Windows Extensions provide an excellent framework for making your Python code behave like a service, but unfortunately it does not seem to support optimised code. I only found an old (but interesting) email thread discussing this; other than that no one seems to talk about these issues. So I started thinking: if we could just modify our code during the build to strip out all the things we don't want in it, we would effectively have .pyo code inside a .pyc. Try it for yourself:

$ echo pass > test.py
$ python -m py_compile test.py
$ python -OO -m py_compile test.py
$ cmp test.pyc test.pyo && echo equal || echo unequal
equal

It appears to me that modifying the code would be as sane as trying to get some of the things suggested in that thread to work, and in my eyes it seems cleaner for the moment. Python 2.5 comes with a parser module that allows you to parse Python source code into Abstract Syntax Trees (ASTs); once you have the AST objects you can convert them to lists and tuples and then convert them back to an AST (here you get the opportunity to change the list form of the AST). Lastly it provides functions to compile these ASTs into a code object just like the built-in compile() function does. From there it is not far to creating a .pyc file; the py_compile module shows us that with the help of the imp and marshal modules this is only a few lines of code.
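
To give an idea of those few lines, here is a minimal sketch of writing a code object out as a .pyc, modelled on what py_compile does (Python 2.5, and without any AST fiddling yet; the function name is made up):

import imp
import marshal
import os
import struct

def write_pyc(source_path, pyc_path):
    f = open(source_path, 'rU')
    source = f.read()
    f.close()
    # This is where a transformed AST would be compiled instead.
    code = compile(source, source_path, 'exec')
    mtime = int(os.stat(source_path).st_mtime)
    pyc = open(pyc_path, 'wb')
    pyc.write(imp.get_magic())             # 4-byte magic number
    pyc.write(struct.pack('<l', mtime))    # 4-byte source mtime
    marshal.dump(code, pyc)                # the marshalled code object
    pyc.close()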

Writing all of this down and looking at the py_compile code made me realise that the same might actually be achieved by simply renaming the .pyo files to .pyc files! Or there could be something in the marshal module that behaves differently when running optimised. I'll have to try that out.

All of this, however, raises the point mentioned in that python-dev thread linked above: why bother indicating how optimised the compiled code is in the file extension? Guido's idea of storing which optimisation has been done inside the .pyc is not bad (although I personally don't like the automatic writing of .pyc files, but that's another discussion). I'm not sure the extra bytecode that Brett and Philip propose later is that great though; personally I'd get rid of -O completely and just run the .pyc if it's the same age as the .py, and if it's not, not bother re-creating it. Modules are mostly compiled on installation anyway (except for developers).

So, am I going insane? Or is there really no need for the behaviour with -O and .pyo? If the optimisations can be performed by some AST transformations anyway, then I think the (mostly annoying) -O behaviour is obsolete.

(And not writing .pyc files by default is another story; in the meantime the .pyc files can be re-created using the same optimisations the old one was created with, or with the "current settings" of the Python interpreter (which would mean none, as I want to drop -O). I don't think that's too important right now.)

Update: Obviously the magic number in .pyc files differs from the one in .pyo files, so just renaming the files won't work. I should have known...

How to choose a web framework?

Wednesday, August 01, 2007

So I have a bit of organically growing information that is nice to represent on internal web pages in a semi-organised fashion. The first iteration just runs a few Python scripts from some cron jobs that utilise SimpleTAL to create statically served pages. SimpleTAL as the templating language simply because we also use roundup as our issue tracker, and keeping the templating language the same seems like a sensible thing to do for a multitude of reasons. But as the number of pages grows, and more and more features wanting CGI creep up (e.g. triggering a software build instantaneously instead of waiting for the next nightly cron job), this seems like the right point to move to a proper framework that will make maintenance, organisation and code reuse a lot easier.

Only, how to choose? It seems to me that the current Python web framework world changes drastically every 6 months. And really, that's just plain annoying. I don't want to worry about an upgrade of the codebase in a few months' time, whether because there's a newer version of the framework or because it's simply not the hip way to do stuff in the flashy web 2.0 world anymore and hence is left to die.

Django has been in the 0.9x releases for as long as I can remember; every time I looked at some of the docs they seemed to say "Don't use the latest release, the svn repo is a lot cooler." Not very inspiring. It also seems a pretty steep jump from the very lightweight infrastructure currently in use; it is a big (but seemingly beautiful, I admit) beast.

The turbogears approach of taking best-of-breed applications has always attracted me. Alas, that does not seem to go hand in hand with the requirement to remain stable, as the 2.0 announcements seem to prove. Although they do seem to promise 1.x continuity and an easy 2.0 upgrade path.

I have formed no thoughts yet about pylons other than that it can't be that bad, as turbogears 2 is going to use it. But I can say that paste has a nice paragraph right on the first page:

There's really no advantage to putting new development or major rewrites in Paste, as opposed to putting them in new packages. Because of this it is planned that major new development will happen outside of Paste. This makes Paste a very stable and conservative piece of infrastructure for building on. This is the intention.

Maybe I should check out their do-it-yourself framework...

Finally, did I mention zope just has way too much overhead? (I considered using a short script to compare all the dependencies of these packages in Debian, but it's too late at night...)

So, who in this web2.0-happy world is a stable building block for your web requirements? Seems a scary world.

__slots__: worth the trouble?

Wednesday, May 30, 2007

I find __slots__ annoying. They seem to have a lot of rules about their use and make things like a borg rather awkward to use (I even prefer to use a singleton instead of a borg in new-style classes because of this).

And all it does is save some space in memory (AFAIK). Makes me wonder if it's worth it. Google claims there are about 2k hits for __slots__; compare that to about 300k hits when doing the same search for __init__.
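
For anyone who hasn't run into them, the whole feature boils down to something like this (a trivial illustration):

class Point(object):
    # Instances get fixed storage for exactly these attributes and no
    # per-instance __dict__, which is where the memory saving comes from.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 3      # fine
# p.z = 4    # would raise AttributeError, no arbitrary attributes allowed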

Maybe I'm just overreacting though. Maybe having all class data in one place is not that important. Or maybe there are just massive gains in using __slots__ for some people that do make it worth it. But I'm not one of them, hence my rant.

IPC in the stdlib(?)

Thursday, May 17, 2007

Many interesting comments on my last musings about IPC. Each and every one of them looks very nice and has its application area; go and have a look! These found a weak spot in my heart though:

  • Candygram is Erlang-style message passing. Very stunning, but a little bit too Erlang-like for my taste. Nevertheless I was impressed.
  • The delegate module was probably the closest to what I was describing. And I'm surprised it's not better known.
  • Lastly I can't keep my mind off pylinda. From the moment I started to read about it I fell in love with it. It is Erlang-like enough to satisfy me and also has huge potential, even to talk to other languages. I'm not sure yet how its API looks, as I'd want to do things like running the server as part of the main process (and have it be a private server then) etc. It got me excited all right.

All this is well and good, but I'm not sure I made my main point very clear last time (and I fear losing it here in the noise too; I must be a bad writer). Namely, none of this is in the stdlib (yes, xmlrpc is).

I do think there is a need for a good and yet simple pythonesque IPC module in the stdlib. When building cluster or grid systems it seems reasonable to me to stray outside of the stdlib (for now at least). But for simple SMP a module should be making its way into the stdlib by now IMHO.

Given what I've seen so far (but beware, I haven't used any of these yet) delegate seems like the best-suited candidate. It may need some updating (I can imagine it benefiting from the subprocess module, for example), but I certainly think something like it should live in the stdlib.

I wonder what the chances of that happening are.

IPC and the GIL

Wednesday, May 09, 2007

As recently and excellently described, threads are ugly beasts, waiting to get you when you let your guard down (and eventually you will). Yes, that means I should really get my head round the asynchat module and stop using ThreadingMixIns with the SocketServer, but that's not the point of this post.

Inter-process communication, aka IPC, is way safer and more scalable when you want to distribute work between many processors. But what bothers me is the complete lack of nice support for this in the stdlib, while there is an excellent threading module.

So what options are there?

You could stay inside the stdlib and construct something out of pickle and sockets, but that isn't amazing as you'll have to write a lot of boilerplate code that will have its own bugs. Leaving the stdlib you could probably replace pickle with JSON, but apart from being able to talk with non-Python code you're not really better off.
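
Just to illustrate the sort of boilerplate I mean, merely getting one picklable object across a socket already needs something like this (a sketch only, no error handling):

import pickle
import struct

def send_obj(sock, obj):
    data = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    # Length-prefix the pickle so the other end knows how much to read.
    sock.sendall(struct.pack('!I', len(data)))
    sock.sendall(data)

def recv_obj(sock):
    (length,) = struct.unpack('!I', sock.recv(4))
    data = ''
    while len(data) < length:
        data += sock.recv(length - len(data))
    return pickle.loads(data)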

There is of course the excellent omniORBpy if you grok CORBA. But the Python mapping of CORBA was written by C++ programmers, so despite being the best CORBA mapping available it is still not the most lovely thing to work with as a Python developer. And CORBA is a rather heavyweight approach that is far from applicable in all situations. XML-RPC is another internet-aware IPC option that even has support in the stdlib, but once more the overhead when you're just on a local machine and want to use 2 CPUs is silly. Twisted must surely have some solution too, but I don't really know it and again I think it will be rather heavyweight. Lastly I feel obliged to mention SOAP in this paragraph too, but I can't help shuddering when thinking about it.

I guess what I'm looking for is some simple, lightweight and scalable module that does something similar to how Erlang passes messages around. Maybe all I'm asking for is a module that ties together subprocess and pickle and lives under the IPC section in the stdlib. But it would be nice if it could also transparently stream the data across a socket so you can run on multiple hosts if you want (maybe in a separate module though). Having a module like that around would remove the need for boilerplate code as well as establish some sort of "best practice" IPC.

Anyways, I think this is more like me wondering what people use or would want to have for their IPC in python. Anyone?

What's in a number?

Wednesday, May 02, 2007

Ok, I couldn't resist this. Someone told me (as much as writing a blog counts as telling) this is my lucky number: 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0. Someone else says it's magic. Lucky and/or magic numbers... hmm, sounds like a fairytale.

More roundup love

Tuesday, May 01, 2007

As I've said before, roundup is an amazing issue tracker. Last time I wrote I had just implemented our internal bug tracker with all the weird requirements of a few developers. By now the incredibly flexible system, thanks to python, has been moulded into our complete project management tracker.

This second instance tracks our development effort from customer wishes through to specifications, development and the passing of acceptance criteria. All with the help of a few (three actually) IssueClasses and some funky relationships between them, with the help of Python functions as detectors. The last part of the equation is our good old wiki (MoinMoin if you were wondering).

PS: No, it didn't take as long as these two posts about roundup are apart. For one, I did lots of other stuff in between. And secondly, I've been meaning to tell people how flexible roundup is for a while now.

Campaigning for -delete

Tuesday, May 01, 2007

Whenever find(1) is used to delete some files there are always a few variants that can be seen. They are generally combinations of -exec and -print piped to xargs. While demonstrating the incredible power of UNIX shells and pipes etc, they don't appear very practical.

Most implementations struggle with embedded spaces, newlines, quotes etc., so even more exciting things get produced to cope with all these weird cases. So why not simply use -delete at the end of the desired expression? Is it because it is a GNU extension? I swear that most of the expressions that actually try to cope with weird filenames depend on GNU extensions anyway. Not to mention that most of the scripts know they will have GNU find and don't behave strictly POSIX compliant anyway.
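
To illustrate, the usual variants next to the one I am campaigning for:

$ find . -name '*~' -exec rm {} \;           # one rm per file, slow
$ find . -name '*~' -print0 | xargs -0 rm    # copes with weird filenames
$ find . -name '*~' -delete                  # GNU extension, simplest of all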

So why is it that no example uses -delete? There's no good reason for avoiding it as far as I can tell.

In light of that, this post is my campaign for the more frequent use of -delete in find(1) expressions. I'm sure you won't care. Neither will I; by tomorrow I'll have forgotten about this campaign. But at least I'll be using -delete myself.

Debian Developers working together

Sunday, March 25, 2007

My quote of the day (from Josselin Mouette):
BTW, who said flamewars are a nuisance to the Debian project? I've never seen developers as united as when Mr Schilling is around. With a few more people like him we would probably be always working together.

Backup script in 43 lines

Saturday, March 24, 2007

This is no more than my personal backup preference at this moment. It is full of personal preferences and might not be what other people like. It also isn't written with portability in mind. But I think it is rather elegant.

My use case: on Friday evening I get home, switch on my laptop and boot into single user mode. I attach an external hard drive to it and power that on too. Once logged in (as root, we're still in single user mode) I type mkbackup.sh && shutdown -h +5. When I get up in the morning all should be fine, i.e. the laptop is switched off; if it's not, the laptop will still be running and I can investigate what went wrong.

Here is the script:

#! /bin/bash

set -e -x


#### CONFIGURATION ITEMS ####
# The UUID of the filesystem the backup lives on.
UUID=5ffcbedc-f49d-4e17-8b07-88f6b62247f6
# Subdirectory on the backup media to store backup in.
SUBDIR=backups
#### END CONFIGURATION ITEMS ####


# Read which filesystems to backup from /etc/fstab.
SFS=''
while read filesystem mountpoint type options dump pass; do
        grep -qs '^$' <<<$filesystem && continue
        grep -qs '^ *#' <<<$filesystem && continue
        test $dump -eq 1 && SFS="$SFS $mountpoint"
done < /etc/fstab

# Mount backup media when needed.
do_unmount=false
if ! grep -qs /dev/disk/by-uuid/$UUID /proc/mounts; then
        mount -U $UUID
        do_unmount=true
fi

# Prepare file and directory variables.
dir_prefix=$(grep -s /dev/disk/by-uuid/$UUID /proc/mounts | cut -d ' ' -f 2)
ddir=$dir_prefix/$SUBDIR
file_prefix=$(date --utc --rfc-3339=date)

# Do the backup.
for fs in $SFS; do
        name=$(tr -d '/' <<<$fs)
        find $fs -xdev | afio -o -Z $ddir/${file_prefix}_$name.afio.Z
done

# Unmount backup media when we did mount it.
if $do_unmount; then
        umount $dir_prefix
fi

For completeness I should show my /etc/fstab too:

# /etc/fstab: static file system information.
#
# <file system>  <mount point> <type> <options>                    <dump> <pass>
## Virtual filesystems:
proc             /proc         proc   defaults                         0      0

## Local filesystems:
/dev/hda1        /boot         ext3   defaults                         1      2
/dev/signy/swap0 none          swap   none                             0      0
/dev/signy/root  /             ext3   defaults,errors=remount-ro       1      1
/dev/signy/home  /home         ext3   defaults                         1      2

## Removable media:
/dev/hdc         /media/cdrom0 udf,iso9660 user,noauto                 0      0
UUID=5ffcbedc-f49d-4e17-8b07-88f6b62247f6 /media/icybox ext3 noauto    0      0

## Chroot stuff
#proc         /chroots/sid/proc        proc   none                     0      0
#/home/flub   /chroots/sid/home/flub   ext3   bind                     0      0

The only thing that I'd like to figure out is how to switch off that external hard drive too; some sort of syscall, I assume.

Timers for profilers

Sunday, February 18, 2007

After some oh... and ah... experiences I have decided that the only timers that will give you correct results in a portable manner (across POSIX platforms, that is) are resource.getrusage(resource.RUSAGE_SELF) and os.times(). This is rather shocking, certainly after I then went on to discover that the only profiler doing this is the original profile module, and even then only since revision 38547 (before that it defaulted to time.clock(), which is not too bad; see below). So what's wrong with the other timers used:

  • time.clock() [profile]: This should do The Right Thing(tm) according to POSIX but unfortunately some systems decide to include the time of the children and won't tell you they're doing this (see the GNU manpage for clock(3)).
  • gettimeofday() [hotshot, _lsprof/cProfile]: This is the system call gettimeofday(2) and not available from within Python. The problem here is that we're using multitasking systems and the OS can decide to run another process at any time while you are profiling.
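
For reference, the kind of timer meant above is trivial to write and can be passed to the profile module's Profile class (a small sketch):

import resource

def cpu_time():
    # CPU time used by this process only: user plus system time.
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

# e.g. profile.Profile(cpu_time) instead of relying on the default timer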

I'm sincerely hoping that someone is going to point out to me how wrong I am. If no one does that I will feel morally obliged to create some patches.

Profiling and Threading

The reason I found this out is that I would like to profile some code that is running threads. As it stands I cannot find any useful code for that. There are a few interesting bits around, however, that I could hack into something very ugly that might just work.

To start there is the threading.setprofile() call as well as the sys.setprofile() call (I know this may sound obvious, but I didn't know about the former before). However this doesn't help you at all, as the interpreter might switch in and out of the thread between two of your profiler calls (which, if you were using hotshot or _lsprof/cProfile, wouldn't matter as you'd have that problem anyway ;-P).

Randomly trying to find out more about threading I searched for sys.setcheckinterval() to find out what exactly it does. Instead of finding a nice description of what it does (apart from apparently switching threads) I found a crude hack for making sure that some code gets executed atomically (i.e. no thread switching happens during it): call sys.setcheckinterval(sys.maxint) just before it and restore it afterwards.

So my vague plan for a terrible hack is to change the profile module to do exactly that just before the profiler callback returns (actually before the profiler takes the time just before it returns). Then when the profiler callback is entered (and just after it has taken the time) do a sys.setcheckinterval(-1) so that I can be certain Python will switch to other threads when needed.

If this does end up working it will be terribly inefficient, but it's worth a shot I reckon. At least it would be possible to profile multithreaded code in some meaningful way.

Schizophrenic thoughts

Thursday, February 15, 2007

On one hand I use tab-completion on the command line all the time. I even go as far as creating my own (terribly incomplete) completion functions in bash for commands that I use often.

On the other hand, when programming I don't use tab-completion (maybe mostly because it's not so easy to get in my preferred editor). But in this case my mind is going to argue that if you need tab-completion to know your variables you're screwed anyway and should redesign. Although somewhere my mind seems to acknowledge that tab-completion for an API would be useful. Indeed, I always have an ipython session nearby to play with the API and read docstrings.

Maybe I should take the time once to sort out my emacs so it can do all of that too.

Confusing

Roundup praise

Wednesday, February 14, 2007

Roundup is just an amazing bug tracking system. It really is way more general than that; their own words for it are not bad: an issue tracking system for knowledge workers.

Why I love it:

  • It's written in Python.
  • It has a powerful and flexible concept/abstraction of a database, very well suited to its purpose.
  • It has a very flexible page rendering and templating system. With clever URL crafting you can modify how and what the page displays.
  • Its configuration is done by creating Python objects (modifying the database layout) and writing Python functions (automating and policing changes to the issues).

And lots lots more...

Obviously nothing is perfect and it still has a few minor bugs and annoyances. But this is just so much nicer and more flexible than, say, bugzilla.

Writing applications as modules

Wednesday, February 14, 2007

The Problem

I recently had to write a few command line applications of the form "command [options] args" that did some stuff, maybe printed a few things on screen and exited with a certain exit code. Nothing weird here.

These apps were part of a larger server system however, and needed to use some of the modules from those servers for some of their work (in the name of code reuse, obviously). A little later these apps would look nicer separated out into their own modules as well (all hail code reuse again: the apps can share code), and now it is really a short step to wanting to use some of the more general modules of the apps in the server.

I'm not sure that last step was very important; I think it all started when the app was split up into modules. But the last one made it very obvious: you can't just print random stuff to the user and decide to sys.exit() the thing anywhere you want. You want the code to behave like real modules: throw exceptions and not print anything on the terminal. That's not all: you also want to write unit tests for every bit of code. Ultimately you need one main routine and you want to test that too, so even that can't exit the program.

The Solution

Executable Wrapper

The untestable code needs to be kept to an absolute minimum. Code is untestable (ok, there are workarounds) when it sys.exit()s, so I raise exceptions instead. I defined the exceptions like this:

class Exit(Exception):
    def __init__(self, status):
        self.status = status

    def __str__(self):
        return 'Exit with status: %d' % self.status

class ExitSuccess(Exit):
    def __init__(self):
        Exit.__init__(self, 0)

class ExitFailure(Exit):
    def __init__(self):
        Exit.__init__(self, 1)

This allows for a very small executable wrapper:

#!/usr/bin/env python

import sys
from mypackage.apps import myapp

try:
        myapp.main()
except myapp.Exit, e:
        sys.exit(e.status)
except Exception, e:
        sys.stderr.write('INTERNAL ERROR: ' + str(e) + '\n')
        sys.exit(1)

The last detail is having main() in mypackage.apps.myapp defined as def main(args=sys.argv) for testability, but that's really natural.

Messages for the user

These fall broadly into two categories: (1) short warning messages and (2) printing output. The second type is easily limited to a few very simple functions that do little more than a few print statements; help() is an obvious example. For the first there is the logging module. In our case the logging module is used almost everywhere in the server code anyway, but even if it weren't, it is a convenient way to be able to silence the logging. Its default behaviour is actually rather useful for an application; all that's needed is something like:

import logging

logging.basicConfig(format='%(levelname)s: %(message)s')

The lovely thing about this is that you get --verbose or --quiet almost for free.
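
Something along these lines is all it takes (a sketch with hypothetical option names):

import logging

def setup_logging(verbose=False, quiet=False):
    logging.basicConfig(format='%(levelname)s: %(message)s')
    if verbose:
        logging.getLogger().setLevel(logging.DEBUG)
    elif quiet:
        logging.getLogger().setLevel(logging.ERROR)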

Mixing it together

This one handles fatal problems the program detects. You could just do a logging.error(msg) followed by a raise ExitFailure. But that just doesn't look very nice, certainly not outside the main app module (mypackage.apps.myapp in this case). A second option is to do something like

 raise MyFatalError, 'message to user'

And have another big try...except block inside main():

try:
        workhorse(args)
except FatalError, e:
        sys.stderr.write('ERROR: ' + str(e) + '\n')
        raise ExitFailure

Just make sure FatalError is the superclass of all your fatal exceptions and that they all have a decent __str__() method. The reason I like this is that it helps keep fatal error messages consistent wherever you use them in the app, as all the work is done inside the __str__() methods.
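
For example, such an exception could look like this (a made-up example, the names are hypothetical):

class FatalError(Exception):
    """Base class of all errors that should abort the application."""

class ConfigurationError(FatalError):
    def __init__(self, filename, problem):
        FatalError.__init__(self)
        self.filename = filename
        self.problem = problem

    def __str__(self):
        return 'bad configuration in %s: %s' % (self.filename, self.problem)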

One final note: when using the optparse module you can take two stances: (1) "optparse does the right thing and I don't need to debug it or write tests for it" or (2) "I'm a control freak". In the second case you can subclass OptionParser and override its error() and exit() methods to conform to your conventions.
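
For the control freaks, the subclass stays small (a sketch, reusing the Exit exceptions defined above):

import sys
import optparse

class MyOptionParser(optparse.OptionParser):
    def error(self, msg):
        # Report the problem but leave the exiting to the wrapper.
        sys.stderr.write('ERROR: %s\n' % msg)
        raise ExitFailure()

    def exit(self, status=0, msg=None):
        if msg:
            sys.stderr.write(msg)
        raise Exit(status)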

When web pages check web browser compatibility

Saturday, January 27, 2007

I don't trust banks and definitely not online banking, since their log-in procedures are laughable, at least in the UK (no complaints about my Belgian bank, although not all Belgian banks are good either). But somehow I managed to get an account which only allows transactions to my own accounts, so the possible damage is rather limited. It also limits its usefulness, but hey, it's safer.

So I could happily use it to check my balance and move money between my accounts, until a few weeks ago. They suddenly decided to show me a screen that tells me my web browser is unsupported. Since I usually use Epiphany I decided to just try it with a more common browser and, surprise, using Firefox running on Ubuntu Edgy it works. Next I used the Iceweasel from my Debian Etch, but no, they don't like that either.

Then for the really dumb stuff. I go to about:config in Iceweasel, find the general.useragent.firefox.extra key and change it from "Iceweasel/2.0.0.1" to "Firefox/2.0.0" and sure enough, it works. What a joke.

The only thing left was to make it work with my Epiphany, so I went to about:config, found general.useragent.epiphany.extra and changed it to "Firefox/2.0.0". That doesn't help. Adding the key general.useragent.firefox.extra and setting it to "Firefox/2.0.0" did work perfectly fine though.

Someone go and beat that silly webmaster with a stick please.

It made me think though. Why can't web browsers just say "I support HTML 4.01 and XHTML 1.0" or so? Then all a website needs to do is say "I need HTML 4.01 or ...". But I guess it's too easy to create web pages and too easy to make broken web browsers. And to top it off, no one seems to have the attitude to just ignore people who can't read standards; instead they try to understand their garbage anyway. I know that's probably the only reason the Web managed to become what it is now, but it's still annoying.
