devork

E pur si muove

Timers for profilers

Sunday, February 18, 2007

After some oh... and ah... experiences I have decided that the only timer that will give you correct results in a portable manner (accross POSIX platforms that is) is resource.getrusage(resource.RUSAGE_SELF) and os.times(). This is rather shocking, certainly after I then went on to discover that on the only profiler doing this is the original profile module, and even then only since revision 38547 (before it defaulted to time.clock() which is not too bad --see below). So what's wrong with the other timers used:

  • time.clock() [profile]: This should do The Right Thing(tm) according to POSIX but unfortunately some systems decide to include the time of the children and won't tell you they're doing this (see the GNU manpage for clock(3)).
  • gettimeofday() [hotshot, _lsprof/cProfile]: This is the system call gettimeofday(2) and not available from within python. Problem here is that we're using multitasking systems and the OS can decide to run an other process at any time while you are profiling.

I'm sincerely hoping that someone is going to point out to me how wrong I am. If no one does that I will feel morally obliged to create some patches.

Profiling and Threading

The reason that I found this out is that I would like to profile some code that is running threads. As it stands I can not find any useful code for that. There are a few interesting bits around however that I could hack into something very ugly that might just work.

To start there is the threading.setprofile() call as well as the sys.setprofile() call (I know this may sound obvious, but I didn't know about the former before). However this doesn't help you at all as the profiler might get in and out of the thread between two of your profiler calls (which, if you where using hotshot or _lsprof/cProfile wouldn't matter as you'd have that problem anyway ;-P).

Randomly trying to find out more about threading I search for the sys.setcheckinterval() to find what exactly it does. Intead of finding a nice discription about what it does (appart from apparently switching threads) I find a crude hack for making sure that some code gets executed atomicaly (i.e. no thread switching happens during it): call sys.setcheckinterval(sys.maxint) just before it and restore it afterwards.

So my vague plan for a terrible hack is change the profile module to do exactly that just before the profiler callback returns (actually before the profiler takes the time just before it returns). Then when the profiler callback is entered (and just after this has taken the time) do a sys.setcheckinterval(-1) so that I can be certain python will switch to other threads when needed.

If this does end up working it will be terribly inefficient, but it's worth a shot I recon. At least would be possible to profile multithreaded code in some meaningful way.

Sunday, February 18, 2007 | Labels: |

1 comments:

Jean-Paul Calderone said...

Unfortunately there seem to be some SMP problems with both getrusage() and times() (where they'll report CPU usage going /backwards/, urgg). Another possibility is clock_gettime(3), using CLOCK_PROCESS_CPUTIME_ID or CLOCK_PROCESS_THREAD_CPUTIME_ID. These are present on Linux, but I'm not sure about their availability elsewhere. Alas, they aren't exposed by Python, so I've ended up using dl or ctypes when I want to call them.

New comments are not allowed.

Subscribe to: Post Comments (Atom)