devork

E pur si muove

Timers for profilers

Sunday, February 18, 2007

After some oh... and ah... experiences I have decided that the only timers that will give you correct results in a portable manner (across POSIX platforms, that is) are resource.getrusage(resource.RUSAGE_SELF) and os.times(). This is rather shocking, certainly after I then went on to discover that the only profiler doing this is the original profile module, and even then only since revision 38547 (before that it defaulted to time.clock(), which is not too bad; see below). So what's wrong with the other timers used:

  • time.clock() [profile]: This should do The Right Thing(tm) according to POSIX, but unfortunately some systems decide to include the time of the children and won't tell you they're doing this (see the GNU manpage for clock(3)).
  • gettimeofday() [hotshot, _lsprof/cProfile]: This is the system call gettimeofday(2), which measures wall-clock time and is not directly available from within python. The problem here is that we're using multitasking systems and the OS can decide to run another process at any time while you are profiling.
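For the record, a minimal sketch of a child-excluding CPU timer built on the two calls mentioned above (the function names are mine):

```python
import os
import resource

def cpu_time():
    # User + system CPU time of this process only; time spent in children
    # is reported separately under RUSAGE_CHILDREN, so it can never leak in.
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime

def cpu_time_via_os():
    # os.times() gives the same user/system split (fields 0 and 1);
    # the children's times are in fields 2 and 3.
    t = os.times()
    return t[0] + t[1]

before = cpu_time()
total = sum(i * i for i in range(200000))  # burn a little CPU
spent = cpu_time() - before
```

Note that the resource module is POSIX-only, which is exactly the portability this post is after.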

I'm sincerely hoping that someone is going to point out to me how wrong I am. If no one does, I will feel morally obliged to create some patches.

Profiling and Threading

The reason that I found this out is that I would like to profile some code that is running threads. As it stands I cannot find any useful code for that. There are a few interesting bits around, however, that I could hack into something very ugly that might just work.

To start there is the threading.setprofile() call as well as the sys.setprofile() call (I know this may sound obvious, but I didn't know about the former before). However this doesn't help you at all, as the interpreter might switch in and out of the thread between two of your profiler calls (which, if you were using hotshot or _lsprof/cProfile, wouldn't matter as you'd have that problem anyway ;-P).
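A quick illustration of threading.setprofile() (the helper names are mine): it only installs the profile function in threads started after the call, and recording the thread name with each event makes the interleaving visible:

```python
import threading

events = []

def prof(frame, event, arg):
    # Record which thread generated each profile event; CPython disables
    # profiling while this callback itself runs, so it is safe to call
    # other python functions in here.
    if event in ('call', 'return'):
        events.append((threading.current_thread().name, event,
                       frame.f_code.co_name))

def worker():
    sum(range(10))

threading.setprofile(prof)  # affects threads started *after* this call
t = threading.Thread(target=worker, name='worker-1')
t.start()
t.join()
threading.setprofile(None)

seen_threads = set(name for name, _, _ in events)
```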

Randomly trying to find out more about threading I searched for sys.setcheckinterval() to find out what exactly it does. Instead of finding a nice description of what it does (apart from apparently controlling thread switching) I found a crude hack for making sure that some code gets executed atomically (i.e. no thread switching happens during it): call sys.setcheckinterval(sys.maxint) just before it and restore the old value afterwards.
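A sketch of that hack (the function name is mine, and I've used sys.setswitchinterval(), which is what replaced sys.setcheckinterval() in later pythons):

```python
import sys

def run_unswitched(fn, *args):
    # Crude: raise the switch interval so high that the interpreter will
    # (almost) never preempt us while fn() runs, then restore it.  This
    # only stops preemptive switches; anything that blocks or otherwise
    # releases the GIL will still let other threads run.
    old = sys.getswitchinterval()
    sys.setswitchinterval(1e6)  # seconds; effectively "never"
    try:
        return fn(*args)
    finally:
        sys.setswitchinterval(old)

result = run_unswitched(sum, range(100))
```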

So my vague plan for a terrible hack is to change the profile module to do exactly that just before the profiler callback returns (actually just before the profiler takes the time before returning). Then when the profiler callback is entered (and just after it has taken the time) do a sys.setcheckinterval(-1) so that I can be certain python will switch to other threads when needed.

If this does end up working it will be terribly inefficient, but it's worth a shot I reckon. At least it would then be possible to profile multithreaded code in some meaningful way.

Schizophrenic thoughts

Thursday, February 15, 2007

On the one hand I use tab-completion on the command line all the time. I even go as far as creating my own - terribly incomplete - completion functions in bash for commands that I use often.

On the other hand, when programming I don't use tab-completion - maybe mostly because it's not so easy to get in my preferred editor. But in this case my mind is going to argue that if you need tab-completion to know your variables you're screwed anyway and should redesign. Although, somewhere my mind seems to acknowledge that tab-completion for an API would be useful. Indeed, I always have an ipython session nearby to play with the API and read docstrings.

Maybe I should take the time once to sort out my emacs so it can do all of that too.


Roundup praise

Wednesday, February 14, 2007

Roundup is just an amazing bug tracking system. It really is way more general than that; their own words for it are not bad: an issue tracking system for knowledge workers.

Why I love it:

  • It's written in python
  • It has a powerful and flexible concept/abstraction of a database, very well suited for its purpose.
  • It has a very flexible page rendering and templating system. With clever URL crafting you can modify how and what the page displays.
  • Its configuration is done by creating python objects (modifying the database layout) and writing python functions (automating and policing changes to the issues).
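As a taste of that last point, here is a hypothetical fragment (not from any real tracker) of the sort of thing that goes in a Roundup tracker's schema.py and detectors:

```python
# schema.py fragment: the database layout is declared as python objects
issue = IssueClass(db, "issue",
                   assignedto=Link("user"),
                   priority=Link("priority"),
                   topics=Multilink("keyword"))

# A detector: a plain python function that polices changes to issues
def require_title(db, cl, nodeid, newvalues):
    if not newvalues.get('title'):
        raise Reject('an issue must have a title')

def init(db):
    db.issue.audit('create', require_title)
```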

And lots lots more...

Obviously nothing is perfect and it still has a few minor bugs and annoyances. But this is just so much nicer and more flexible than, say, bugzilla.

Writing applications as modules

Wednesday, February 14, 2007

The Problem

I recently had to write a few command line applications of the form "command [options] args" that did some stuff, maybe printed a few things on screen and exited with a certain exit code. Nothing weird here.

These apps were part of a larger server system, however, and needed to use some of the modules from those servers for some of their work (in the name of code reuse, obviously). A little later these apps started to look nicer when separated out into their own modules as well (all hail code reuse again: the apps can share code), and from there it is really a short step to wanting to use some of the more general modules of the apps in the server.

I'm not sure that last step was very important; I think it all started when the app was split up in modules. But the last one made it very obvious: you can't just print random stuff to the user and decide to sys.exit() the thing anywhere you want. You want the code to behave like real modules: throw exceptions and not print anything on the terminal. That's not all, you also want to write unit tests for every bit of code. Ultimately you need one main routine and you want to test that too, so even that can't exit the program.

The Solution

Executable Wrapper

The untestable code needs to be kept to an absolute minimum. Code is untestable (ok, there are workarounds) when it sys.exit()s, so I raise exceptions instead. I defined the exceptions as such:

class Exit(Exception):
    def __init__(self, status):
        self.status = status

    def __str__(self):
        return 'Exit with status: %d' % self.status

class ExitSuccess(Exit):
    def __init__(self):
        Exit.__init__(self, 0)

class ExitFailure(Exit):
    def __init__(self):
        Exit.__init__(self, 1)

This allows for a very small executable wrapper:

#!/usr/bin/env python

import sys
from mypackage.apps import myapp

try:
        myapp.main()
except myapp.Exit, e:
        sys.exit(e.status)
except Exception, e:
        sys.stderr.write('INTERNAL ERROR: ' + str(e) + '\n')
        sys.exit(1)

The last detail is having main() defined as main(args=sys.argv) in mypackage.apps.myapp for testability, but that's really natural.
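A sketch of what that looks like (the argument handling here is made up):

```python
import sys

class Exit(Exception):
    def __init__(self, status):
        self.status = status

def main(argv=None):
    # Taking argv as a parameter is what makes main() testable: tests
    # pass in a crafted list, the wrapper script passes nothing.
    if argv is None:
        argv = sys.argv
    if len(argv) < 2:
        raise Exit(1)   # hypothetical: this app requires one argument
    raise Exit(0)

# A unit test can now exercise main() without the process exiting:
try:
    main(['myapp', 'some-arg'])
except Exit as e:
    status = e.status
```

Using the None default rather than args=sys.argv also avoids freezing sys.argv into the signature at import time.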

Messages for the user

These fall broadly into two categories: (1) short warning messages and (2) printing output. The second type is easily limited to a few very simple functions that do little more than just a few print statements; help() is an obvious example. For the first there is the logging module. In our case the logging module is used almost everywhere in the server code anyway, but even if it isn't, it is a convenient way to be able to silence the logging. Its default behaviour is actually rather useful for an application; all that's needed is something like:

import logging

logging.basicConfig(format='%(levelname)s: %(message)s')

The lovely thing about this is that you get --verbose or --quiet almost for free.
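For instance (the flag-to-level mapping below is my own choice):

```python
import logging

def setup_logging(verbosity=0):
    # verbosity: -1 for --quiet, 0 for the default, 1 for --verbose,
    # 2 for -vv; anything higher also maps to DEBUG.
    levels = {-1: logging.ERROR, 0: logging.WARNING,
              1: logging.INFO, 2: logging.DEBUG}
    level = levels.get(verbosity, logging.DEBUG)
    logging.basicConfig(format='%(levelname)s: %(message)s', level=level)
    return level

chosen = setup_logging(1)
```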

Mixing it together

This one handles fatal problems the program detects. You could just do a logging.error(msg) followed by a raise ExitFailure. But this just doesn't look very nice, certainly not outside the main app module (mypackage.apps.myapp in this case). A second option is to do something like

 raise MyFatalError, 'message to user'

And have inside the main() another big try...except block:

try:
        workhorse(args)
except FatalError, e:
        sys.stderr.write('ERROR: ' + str(e) + '\n')
        raise ExitFailure

Just make sure FatalError is the superclass of all your fatal exceptions and that they all have a decent __str__() method. The reason I like this is that it helps keep fatal error messages consistent wherever you use them in the app, as all the work is done inside the __str__() methods.
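Putting it together, a sketch (ConfigMissingError is a hypothetical fatal error, and ExitFailure is simplified to keep the example self-contained):

```python
import sys

class ExitFailure(Exception):
    status = 1

class FatalError(Exception):
    """Superclass of every error that should abort the program."""

class ConfigMissingError(FatalError):
    # Hypothetical fatal error; the message lives in __str__() so it is
    # worded consistently wherever the error is raised.
    def __init__(self, path):
        self.path = path

    def __str__(self):
        return 'configuration file not found: %s' % self.path

def workhorse(args):
    raise ConfigMissingError('/etc/myapp.conf')

def main(args):
    try:
        workhorse(args)
    except FatalError as e:
        sys.stderr.write('ERROR: %s\n' % e)
        raise ExitFailure()

try:
    main([])
except ExitFailure as e:
    status = e.status
```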

One final note: when using the optparse module you can take two stances: (1) "optparse does the right thing and I don't need to debug it or write tests for it" or (2) "I'm a control freak". In the second case you can subclass OptionParser and override its error() and exit() methods to conform to your conventions.
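For the control freaks, a sketch (the Exit exception mirrors the convention above; message handling is simplified):

```python
import optparse

class Exit(Exception):
    def __init__(self, status, msg=None):
        self.status = status
        self.msg = msg

class AppOptionParser(optparse.OptionParser):
    # Raise instead of calling sys.exit(), so option handling is as
    # testable as the rest of the application.
    def exit(self, status=0, msg=None):
        raise Exit(status, msg)

    def error(self, msg):
        # optparse's default error() prints usage and exits with status 2;
        # here it just funnels into our exception.
        self.exit(2, msg)

parser = AppOptionParser()
parser.add_option('-v', '--verbose', action='store_true')
try:
    parser.parse_args(['--no-such-option'])
except Exit as e:
    status = e.status
```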
