devork

E pur si muove

No feedback?

Friday, August 26, 2005

At the beginning of this week I emailed my mentor to ask what he thought of my project, whether he was happy with its state, etc. I do know from his blog that he's fairly busy these days, but I would still enjoy some feedback. Also, just in case my email got lost somewhere, I'm sure he'll read this. So this was basically just a ping to him, sorry to have bothered everyone else!

One FIXME down...

Tuesday, August 23, 2005

Profile.calibrate() now uses the correct timer instead of just the default timer (time.time()). In fact it still uses the default timer when the user did not specify one (and why really would anyone want this? Apart from making me work one or two weeks more in the last two months.), since both _hotshot and time.time() use gettimeofday() for this. Only when the user requested a timer is that timer used now. Which is fair enough.

The last FIXME is the one where the line events from _hotshot are not used when reading the log file in hstats. Since this functionality is not needed in hpstats to emulate pstats I skipped it. And I still plan on skipping it (in favour of that damn uni work). I still removed the FIXME though. I agree that's quite cheeky, but I made a line event raise a NotImplementedError and documented this as such in the docstring. Hope that covers it.
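To make the "raise instead of silently ignore" idea concrete, here is a minimal sketch. It is not the hstats code: the event names ENTER, EXIT and LINE and the handle_event() function are made up for illustration, and the real hotshot log format defines its own constants.

```python
# Hypothetical event kinds, chosen only for this illustration.
ENTER, EXIT, LINE = "enter", "exit", "line"


def handle_event(kind, stats, func, tdelta):
    """Fold one log event into `stats`.

    Line events are deliberately unsupported and raise NotImplementedError,
    so a log containing them fails loudly instead of giving wrong numbers.
    """
    entry = stats.setdefault(func, {"calls": 0, "time": 0})
    if kind == ENTER:
        entry["calls"] += 1
    elif kind == EXIT:
        entry["time"] += tdelta
    elif kind == LINE:
        raise NotImplementedError("line events are not supported")
    else:
        raise ValueError("unknown event kind: %r" % (kind,))
```

The point of the design is that an unsupported feature is visible to the caller and documented, rather than hidden behind a FIXME.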

Magic

Monday, August 22, 2005

As promised I worked on a proper installable package of hprof. That went quite smoothly. I had some trouble figuring out how to create a .pth file with distutils, but after some googling I found some setup keyword that worked. Not that it is documented in the official documentation. The result is now that some magic makes sure that profile and pstats are used when available, but when not, hprofile and hpstats will be loaded as profile and pstats. Also cared for is that the _hotshot supplied with hprof is used when using hprofile so that the user supplied timer works.
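The post does not show how the .pth file does its magic, so the following is only a sketch of the general idea of "use the preferred module if it exists, otherwise register a replacement under its name". The function import_with_fallback() is a name invented here, not something from hprof.

```python
import importlib
import sys


def import_with_fallback(preferred, fallback):
    """Return module `preferred`; if it is missing, load `fallback` instead
    and register it under the preferred name."""
    try:
        return importlib.import_module(preferred)
    except ImportError:
        module = importlib.import_module(fallback)
        # A later "import <preferred>" will now find the fallback module.
        sys.modules[preferred] = module
        return module
```

In the situation described above the call would look something like import_with_fallback("profile", "hprofile"), assuming hprofile is installed.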

Since I made a release you can grab it. Unfortunately not at Savannah since I am currently parted from my GPG key for a while.

So my current TODO list:

  • Make a proper patch for _hotshot.c and try to see if upstream wants it. It would be nice and make life easier.
  • Look if it is doable to get rid of these last two FIXMEs
  • Maybe make a Debian package for the module, I'll need to think about this a bit first though.
  • Maybe something else I forgot here. I'm still a bit anxious that I didn't find some obvious large thing I missed... ;-)
  • Do the uni work that has a deadline for 2 September. :-(

    Small update

    Monday, August 22, 2005

    This weekend I've been helping at home again building the patio outside. It's quite impressive to see things changing from a big pit of dirt in the morning to a nice surfaced area in the evening. If I had a digital camera I'd show you nice before and after pictures, however your imagination will have to do.

    Today I created a small script that shows how much faster hstats is compared to hotshot.stats. It simply automates what I've bragged about earlier this summer. I lost a little bit of speed (last time I bragged it didn't collect the callers of every function, but that was needed for pstats emulation) but not significantly. Interestingly enough it seems that the speed gain is smaller on the 500MHz Pentium III Coppermine based laptop than on the 450MHz UltraSparc IIi Sabre Sun box. But it's still significant: on the last tests it was about a 30% gain on the Sparc and a 25% gain on the i386. Also curious is the Intel outperforming the Sun in absolute figures, 228s to 96s and 157s to 72s. But I don't dare to generalise anything here!
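For the curious, the arithmetic behind those percentages can be sketched as below. This is not the actual comparison script; time_call() and speed_gain() are names invented here. Reading the figures as (hotshot.stats, hstats) pairs of 228s/157s on the Sparc and 96s/72s on the i386 is an interpretation, but it is the one that matches the quoted gains.

```python
import time


def time_call(func, *args):
    """Return the wall-clock seconds one call of func(*args) takes."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def speed_gain(old_seconds, new_seconds):
    """Fraction of the old running time that the new code saves."""
    return (old_seconds - new_seconds) / old_seconds


# Figures quoted above, read as (old loader, new loader) in seconds.
print("Sparc: {:.0%}".format(speed_gain(228, 157)))  # prints "Sparc: 31%"
print("i386:  {:.0%}".format(speed_gain(96, 72)))    # prints "i386:  25%"
```

A real comparison would pass the two load functions and a profile file name to time_call() and feed the results to speed_gain().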

    Tomorrow I'll look at making a proper installable package from my work. A setup.py exists already, but I only used the build target to test the _hotshot module. A MANIFEST.in file still needs to be created, for example, as well as some other administrative files.

    That's it for tonight!

    (Almost) Done!

    Thursday, August 18, 2005

    Just completed the hpstats module, it's all in CVS. This means that theoretically the project is finished! Although I expect to code another few days on it before the end of August, nothing can be perfect... let alone the first time round!

    On the list of TODOs is making a little script that shows that my stats analysis is faster than the old hotshot one. Other than that I'll have to go and dig up the email in which my mentor mentions what the requirements were for the project. But I'm sure I'll have a handful of things to do. Also I guess I'll figure out tomorrow that I forgot something obvious, that's the way things go.

    But for now some euphoria. Some random statistics: I wrote 112 unit tests, all of which pass. A grep FIXME *.c *.py only returns two hits, one of which is non-essential, the other also not really ;-). SLOCCount tells me I wrote about 1833 lines of Python, did this in about 4.53 months and the total estimated cost to develop is apparently $51,046. And this does not include my modifications to _hotshot.c since I did not write a significant amount of the source there.

    That makes me feel good :-)

    Sleep is good

    Thursday, August 18, 2005

    Last night I got some problems with a function call. As far as I could see it was right but its unit test kept failing. After a while I called it a day and went to sleep. Woke up this morning and had a look. Only to discover it was a trivial error in the test! Aaarch.

    And for today hopefully the last stretch of road to complete. Let's get started!

    Nice progress

    Wednesday, August 17, 2005

    Work has been progressing nicely the last few days. All the hacking has been going smoothly, which does not mean bug free - far from it! But I seem to have finally reached a good cycle of writing unit tests, implementing the features and refining that all. The scaffolding is in place, the design proven to be good enough (I did have to go and modify other bits that I wrote earlier this summer to achieve new functionality), it's almost a joy to work!

    Last night I thought I had printing finished, a bit premature as I discovered this morning. I completely forgot about the filtering of the output. But that's corrected now. Also the modifications needed in hstats to support hpstats.Stats.print_callers() and hpstats.Stats.print_callees() are in place so I can now go and knock those down. This should be finished today so I can work on the multiple profiles support tomorrow. Future looks bright (for the moment at least)!

    Ambiguous documentation

    Tuesday, August 16, 2005

    pstats does not behave as the documentation says it does. I got to the stage that you can look through the output of Lib/test/test_profile.py (in the Python distribution) and the output of that very same file but with an import hprofile as profile at the top instead of the import profile. My output is good, very good I'd even dare to say. However it is not sorted in the same way!

    The output is supposed to be sorted by "stdname". The description of this output in the documentation is as follows:

    The subtle distinction between 'nfl' and 'stdname' is that the standard name is a sort of the name as printed, which means that the embedded line numbers get compared in an odd way. For example, lines 3, 20, and 40 would (if the file names were the same) appear in the string order 20, 3 and 40. In contrast, 'nfl' does a numeric compare of the line numbers. In fact, sort_stats('nfl') is the same as sort_stats('name', 'file', 'line').
    But it appears they actually sort the data with the criteria in a different order than explained above! They seem to sort on 'name', 'line', 'file' when using "stdname".

    Notice however how they also say that "the standard name is a sort of the name as printed". And this would make more sense, that is what they actually do. Don't know why they explain it as the same as "nfl" though. Got me confused for a while (I admit, until halfway through this post! Which is why it can be useful to blog about your problems!).
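The difference between the two sort keys is easy to reproduce without pstats at all. A small sketch, using the documentation's own example of lines 3, 20 and 40 in one file; the function name f and file name mod.py are made up, and as_printed() only mimics the "file:line(name)" form of the printed name:

```python
# (function name, file name, line number) for three made-up entries.
entries = [("f", "mod.py", 3), ("f", "mod.py", 20), ("f", "mod.py", 40)]


def as_printed(entry):
    """Build the name the way it is printed: file:line(function)."""
    name, filename, line = entry
    return "{}:{}({})".format(filename, line, name)


# "stdname" style: compare the printed strings, so "20" sorts before "3".
by_stdname = sorted(entries, key=as_printed)

# "nfl" style: name, file, line, with the line compared as a number.
by_nfl = sorted(entries, key=lambda e: (e[0], e[1], e[2]))

print([line for _, _, line in by_stdname])  # prints [20, 3, 40]
print([line for _, _, line in by_nfl])      # prints [3, 20, 40]
```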

    On another note, work is progressing nicely. I only need to implement a couple more Stats methods before I'm done. They are a bit harder again though:

  • The .print_callers() and .print_callees() methods. I'll need to add data into the hstats module before I can do this. But hopefully that shouldn't become too difficult.
  • Support for loading more than one profiling file. The hard part is not the merging of the data, the problem is that there is also a .dump_stats() method which can save all the data. Since I can not join two hotshot files (not without considerable hacking in _hotshot and I try to keep the delta on that file as small as possible, besides, I'm running out of time, need to do uni work next week) I am currently thinking of just pickling the data. Then to load I can just try one of the formats (the pickle or the hotshot file) and if it fails try the second.
  • After that there are just bits and bobs to do left and right. Like writing some quick comparison script that looks at my speed increase etc and generally making sure I meet all requirements. ;-)
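The "try one format, then the other" idea from the list above could look roughly like this. It is a sketch only: load_stats() and its load_profile_log parameter are invented names, and the second loader stands in for whatever reads a hotshot log. Keep in mind that unpickling is only safe for files you wrote yourself.

```python
import pickle

# Unpickling arbitrary bytes can fail in several ways; catch the usual ones.
_PICKLE_ERRORS = (pickle.UnpicklingError, EOFError, AttributeError,
                  ImportError, IndexError, KeyError, TypeError, ValueError)


def load_stats(path, load_profile_log):
    """Try `path` as a pickle first; if that fails, hand the path to
    `load_profile_log`, a caller-supplied reader for the other format."""
    try:
        with open(path, "rb") as fh:
            return pickle.load(fh)
    except _PICKLE_ERRORS:
        return load_profile_log(path)
```

Writing a small marker at the start of the pickled file would be a more explicit way to tell the formats apart, at the cost of a slightly custom file layout.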

    Clarifying the hstats module

    Monday, August 15, 2005

    Some confusion seems to exist about the stats modules I'm writing. About a month and a bit ago I wrote hstats as a module to analyse hotshot profiling data. It was never my intention to make it compatible with pstats at all. It aimed at being usable to read profiling statistics from hotshot in an efficient way.

    Currently I am working on the hpstats module, which will be API compatible with pstats. This module does use hstats so it only needs to handle higher level stuff. As a side effect I'm now sometimes putting new functionality into hstats, but I hope this is a good thing.

    So to say the essence again: hstats is not API compatible with pstats and is not meant to be.

    Why one should have a separate /boot partition: lessons learned

    Monday, August 15, 2005
    Background

    The box (a SPARC box, to make matters more interesting) has two disks, /dev/hda and /dev/hdb. They each have the same partition table apart from some free space at the end, since two disks are never of the same size. All partitions are of type "RAID autodetect" and indeed run as RAID1 mirrors. Several partitions exist:

  • /dev/md0 -> /
  • /dev/md1 -> swap
  • /dev/md2 -> /home
  • /dev/md3 -> /var
  • /dev/md4 -> /home
    As disks are still disks we started having serious trouble with /dev/hda on the /var partition. Lots of errors: it started with I/O errors on /dev/hda that resulted in a broken mirror. After investigation with the SMART monitoring tools it seemed we first had 40, then 200-something unrecoverable errors. Trying to force SMART to repair these errors failed miserably, not sure why, maybe the spare sectors were all used (although only 8 are reported to be used!). Anyway the disk is not in a healthy condition.

    Being cautious we decide to play it safe, / had enough space so we move /var there. Next we try to build a new filesystem with a bad block scan on /dev/md3. No way, Linux software RAID just doesn't like this and fails the /dev/hda partition. At that point we decide we'll leave /var just on the root filesystem for now until we get round to buying a new hard drive.

    Updating /etc/fstab and stopping the /dev/md3 RAID device, plus zeroing the superblocks so mdadm doesn't try to assemble it at boot time, is the next step. Now a simple reboot so we can be 100% sure everything is still fine.

    The problem

    After rebooting we only get SI from the SILO boot loader. WTF?? Boot from a Debian installation CD. Boot: rescue root=/dev/hda0 No luck. Whatever we try, no rescue boot works. So we get out the disk, attach it to another box, and run silo -r on it. Put it back and everything is fine.

    What happened

    When copying /var (a large chunk) to the root partition the filesystem driver or SMART will have decided that it is more efficient to move some of the already existing files. So it very funnily moved our /boot/second.b file around. That wasted about 4 hours of my time.

    So the lesson learned is to always make a separate partition for /boot so problems like this don't occur. Now we only need to fiddle with the usable sectors of /dev/md3 to make a smaller partition with no errors on it and move /boot there. But that won't be for today!

    Cross platform developing

    Sunday, August 14, 2005

    Even when using Python this is easier said than done. One should be very careful, as the devil lies in tiny details. Which I discovered to my shame.

    Normally I work on a desktop. This box listens to the name of Ultra 10, so it is an UltraSparc architecture as developed by Sun Microsystems. Sometimes however, like the last few days, I do use a Compaq Armada M700, which is an Intel i386 architecture. So today I started to work again on the U10 and, looking where I am, I ran the unit test suite test_hpstats.py. Normally I'll have one test failing, which is the feature I was working on. Two tests failed however.

    So why did that test I'd forgotten about two days ago suddenly fail? After jotting in some print statements it became apparent. In the test I was comparing two values in a dictionary to be equal. The value compared I extracted as follows: d.values()[-1][-1]. Very fine, but in one case d.values() looked like [[10, 11, 12], [0, 1, 2]] and in the second it was [[0, 1, 2]]. However as it seemed, when I changed architecture, the order in which d.values() returns its items changed!

    Cheeky. One should be very careful not to run into traps like this. On reflection I say, "Of course, dictionaries are not sorted!" But the error is made so easily!
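A test that does not depend on dictionary order can pick its value through the keys instead. A minimal sketch, with made-up keys "first" and "second" and an invented helper values_by_key(). (Modern Python dictionaries keep insertion order, but that still is not the same as being sorted, and d.values() can no longer be indexed directly.)

```python
def values_by_key(d):
    """Return the values of `d` ordered by their sorted keys."""
    return [d[key] for key in sorted(d)]


# The same data as in the post, inserted in the "wrong" order on purpose.
d = {"second": [10, 11, 12], "first": [0, 1, 2]}

# Deterministic on every platform: last value of the highest key.
print(values_by_key(d)[-1][-1])  # prints 12
```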

    Cleaning up?

    Thursday, August 11, 2005

    So last time I said I wasn't happy with hstats. Maybe that was a bit impulsive, but still valid. Kent Johnson kindly pointed out to me that this was the wrong attitude. Maybe he's right (read: yeah, of course he's right), but it is still painful for me to follow his advice.

    But again, I can't decide. However I have some text file somewhere (under control of an application I don't really like, gjots2) which does have some ideas about what should maybe change in hstats. This should stop me from forgetting what I was thinking needed doing. It is only so hard to create The Right Design(tm). That was my motivation to postpone it, I hope to get a better idea of what it should look like while I'm using it. However now I'm confronted with Kent's statement, which I fear is true:

    Fix it now. There's never time to go back and clean it up later.

    The argument that nothing depends on it yet weighs in too of course. I guess I'll have to get my mind round it and get on with fixing it. Only I fear getting into the same situation again next month and thinking it's a bad solution again.

    Or maybe all this ranting of mine is because I'm having a bad day. The work on hpstats hasn't really been moving along the last few days due to various reasons, that doesn't help either. Guess I'll see tomorrow what I'll do.

    I wrote crap

    Wednesday, August 10, 2005

    hstats is crap. And I only wrote it a month ago...

    The good part of that observation is that it shows that I'm learning. ;-)

    So why really? Basically just its design. It has one single monolithic class. And within this class everything just happens like one would write procedural or functional old-style C. Quite useless really.

    So for building the pstats wrapper hpstats, I'll use it anyway. I guess it's more important to move on and use it. Later on I can go back, and improve hstats. Which in turn will break hpstats. That's what writing software is for you.

    Refactor mercilessly.

    Never ending quest

    Wednesday, August 10, 2005

    On my never ending quest to find the best testing strategies I stumbled across this article. It is not too long and well worth reading. Explains some stuff about using mock objects and related testing strategies. All of this is of course XP influenced.

    hotshot accepts user timer function

    Friday, August 05, 2005

    After a long day it all works. It's checked in to CVS in the "hotshot_timer-branch". The way it works is that the object returned by _hotshot.profiler(filename) now has an additional method settimer(timer). The timer is any callable object that returns a number or a sequence of numbers whose sum is the time as per the library reference text. It has more overhead than the built-in timer, certainly when a sequence is returned. But that's the price you pay for flexibility.
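In Python terms the timer contract described above (a single number, or a sequence of numbers whose sum is the time) boils down to something like the following. The real work happens in C inside _hotshot; read_timer() is a name invented for this sketch and is not part of that module.

```python
import numbers
import time


def read_timer(timer):
    """Call `timer` and reduce its result to one number.

    A plain number is returned as is; a sequence of numbers is summed,
    which is the extra overhead mentioned above.
    """
    value = timer()
    if isinstance(value, numbers.Number):
        return value
    return sum(value)


print(read_timer(time.time) > 0)          # a timer returning one number
print(read_timer(lambda: (1.0, 0.5)))     # a timer returning a sequence; prints 1.5
```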

    When used from the hprofile module (which is the replacement for profile) this is not a problem however, since that is what calibrate() is for. The overhead is still supposed to be reasonably fixed.

    Once more the writing of unit tests did help to fix a lot of initial bugs. Nice.

    Just some tweaking and testing is left to do with hprofile and its tests now. Then the wrapper will be (hopefully) complete. So next week I should, with a bit of luck, be able to start on the pstats wrapper which would, if all is ok, also be the last part of my project to finish. Apart from cleaning up of course...all the fixme's etc need to disappear!

    Pointer to function

    Thursday, August 04, 2005

    Great fun these things. Certainly when the pointer to the function is stored inside a struct. The result is that my version of _hotshot now uses a void pointer in ProfilerObject as a function to get the time difference since the last time it was called. For now it only goes and runs the old (built-in) timer function but it opens up the possibility to set the timer function by a user.

    The setting of the timer function is scheduled for tomorrow. I guess it will be another day until another timer function can be used in reality but it doesn't look bad. This is after all the last bit that will finish off the profiler module wrapper. So I have the feeling I'm making (good) progress.

    Also to mention is that I figured out how to use branches with CVS. I decided not to add the above things in HEAD as it could have gone very wrong and things could (and can) change drastically. I also wasn't too sure about the change in package layout with this stuff added. Although I do have the impression I got it the first time right (the package layout that is).

    Programming can be simple

    Thursday, August 04, 2005

    It really is clear that it has been over 10 months since I last wrote C. As I mentioned in my last post I got segmentation faults when I changed the version string in _hotshot.c. Today I started by figuring out that the version string gets indeed used at import time. That wasn't too hard. Then I looked at my modification. But wait...I'm trying to strcat() on a string that isn't terminated yet! Move the strcat() one line down and joy all around.

    And that while I thought I'd need a debugger...

    Double Ouch

    Tuesday, August 02, 2005
    One

    To get a user supplied timer function in hotshot I need to start hacking on _hotshot.c. Being cautious I pull the file from python CVS, put it in my package tree, write a setup script, build the stuff before modifying everything -just to test build environment- and finally run all tests with the fresh compiled _hotshot.so.

    So far so good. Yesterday I scribbled down roughly what changes I need to do to _hotshot.c for my needs. Still being cautious I decided to start with something trivial. Modifying the version number so I recognise profiles made by my modified _hotshot module. A simple unit test for this was written a few minutes later so I could get along. There is a function char * get_version_string(void); that parses the CVS revision keyword. I simply modify the function to concatenate the string "-hprof" to the string returned by that function. Merely one line modified and one added.

    So let's try it then! Build goes fine. Then running the test. Segmentation fault. eh???? Really? Python segfaults at module import time! Let's make this clear, before even the get_version_string() function gets called!

    Ouch

    Pretty unproductive day. Coz that's where I'm stuck. I can think of what I want. I'm absolutely clueless about what how and when. Guess tomorrow(*) I'll have to dig into gdb.

    Two

    I started writing this blog text a few hours ago on my laptop. Then I got interrupted by dinner and then I had to go and play driving instructor for my sister. Thing was I was supposed to be able to have some free time at the place we were going to as my sister had some things to do there but I didn't, so I suspended the laptop (with this blog text half finished). Suspend normally works. But me saying normally in the last sentence says it all. It never woke up again. It did partially, then got a blank screen.

    Ouch

    Nothing would help. Once I got over the fact that I'd lose it all and I reset the box I started trying to find what went wrong. Maybe it was the fact that it normally wakes up at the lid open event but I also pressed the sleep button? No. After trying out all combinations I still don't know. It works perfectly in every situation I can think of. But it didn't when I needed it to work. Guess that's life for me.

    (*) Tomorrow I'll actually be helping with the building of the house due to logistic reasons. So it will really be the day after.

    Seconds & nanoseconds

    Friday, July 29, 2005

    Last night I struggled a lot with the calibration. In the end I fainted and gave up, went to bed and read a crap book for way too long -books are no good for me, even if they're crap I can't stop reading them. Today, after waking up way too late, I looked again at the calibration issue. Simple, after half an hour it all made sense and an hour later it all worked. Now I even finished the unit tests, they even all work and it's in CVS.

    In the process I realised that the output given currently is in the nanoseconds of hotshot instead of in the seconds that profile uses. The bias returned by calibrate uses seconds now, but next is the modification of the print_stats() method to modify everything to seconds.

    For this I'll need to subclass hstats.Stats which is good since the next thing I need to do is support the use of the bias too. It's so nice when it all fits together...

    Now just hope I don't screw up like I did yesterday with the calibration stuff!

    The easy bits

    Wednesday, July 27, 2005

    So I've written the easy methods and functions of the profile module wrapper hprofile today. Including their unittests.

    Tomorrow starts the harder work, the Profile.calibrate code. Maybe I shouldn't be too optimistic about how far I'll get with it. For what it's worth, here are my current ideas of how to do this:

  • calibrate() does roughly the same as the old one. Returns the bias, i.e. delay of function call overhead.
  • Whenever the bias is set save this in the profile file as "extra info"
  • When reading the stats, modify the values by this bias
    The biggest question is at the last point. Should I subclass the hstats.Stats class to do this or make it a feature in the class itself? The question is really if the bias will be significant or not when using hotshot. If it is I should probably move this calibration stuff to the hstats and hotshot modules. But I'm sort of trying to postpone changes to hotshot as much as possible.
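The last step, modifying the values by the bias when reading the stats, might be sketched as below. This assumes the stats are a dictionary mapping a function to a (number of calls, total seconds) pair and that the bias is removed once per recorded call; both are assumptions made for this illustration and apply_bias() is an invented name, not hstats code.

```python
def apply_bias(stats, bias):
    """Return a copy of `stats` with `bias` seconds removed per call.

    `stats` maps a function name to (calls, total_seconds). Totals are
    clamped at zero so an overestimated bias cannot give negative times.
    """
    return {
        func: (calls, max(total - calls * bias, 0.0))
        for func, (calls, total) in stats.items()
    }


raw = {"f": (10, 1.0), "g": (1, 0.005)}
corrected = apply_bias(raw, 0.01)  # f loses 10 * 0.01s, g is clamped to 0.0
```

Whether this lives in a subclass or in hstats.Stats itself does not change the arithmetic, only where the bias gets stored and looked up.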

    I don't really want to change too much to hotshot since I'm not sure how good it is to change the standard library and, most importantly, if the changes will be accepted in the stdlib. It will be a bit annoying to have to distribute a separate hotshot copy, also a maintenance pain since python doesn't use distributed version control...

    To hibernate or not to hibernate...

    Monday, July 25, 2005

    That's the question

    Old days

    For years and years I always envied my father who never had to log out of his window manager at his work, he only had to lock his screen. This meant he could leave all his terminals and text editors open at night. The next day they'd still just be there! (Later on I heard that his sysadmin asked him to log out from time to time anyway, I guess that's sensible coz there will always be memory leaks -certainly in X, but anyway.)

    Needless to say I always wanted my work to be there the next time I used it too! But at home there was no way that the PC would stay on day and night.
    Later when at university I quickly found out that a computer in the same room as where you sleep is not compatible with a computer that is switched on.

    Session Managers

    Maybe I misunderstood them. But when I first heard of them (probably xsm) I thought "Wow! This is the future, I will have all my stuff still there next time I logon!"
    Not.
    I never used them. They're rubbish. What good is it to have my xterm still at the same position and size if my shell was not in the same directory and not running my pager anymore?

    I panicked. Toyed with the idea that all these programs (bash, less, etc.) should be session aware. Luckily I didn't have the guts to look at it myself, I couldn't code in those days. Maybe I'd have wasted many weeks of my life if I could. But I was disappointed, disillusioned in modern technology.

    Hibernation

    Under a month ago I was lent an old laptop, which I'm using right now. Finally a great opportunity to try out Ubuntu, I thought. Very neat, it did hibernation by default. So now the big surprise. If I hibernate I get everything back exactly like I left it! Isn't this what I've always dreamed of?

    Clean slate

    Now my experience with this is not as good as I hoped. It turns out that forcing myself to close every application makes for things not being left behind in a mess. Ok, I will open that emacs again to edit the same file, but all its other buffers will be closed for example. Now I'm reaching the conclusion that closing as much as you can at the end of the day will result in a more efficient start up the next day. During the day I'll again accumulate lots of windows, related and unrelated, on my 6 or so virtual desks. But closing them at the end of the day makes me forget them the next day -and that is good, they're mostly not very important anyway but would make me spend precious time on needless messing around.

    Another lesson learned in life.

    Refactor Mercilessly

    Monday, July 25, 2005

    Ok, I admit. Maybe I'm just using this as an excuse. I haven't really impressed myself with the amount of work done the last few days. But hey! This was my holiday after all.

    The result however is that I have revised the test module for the hstats module (instead of having kicked off with the profile wrapper). Not nearing a nice test module yet, but at least something. Big plus is that I read up on the standard Python test documentation. Now I (hopefully!) follow the rules properly.

    So now I'm back home I've got no excuse! Tomorrow lots of work is waiting!

    Native hotshot statistics!

    Wednesday, July 20, 2005

    Finally, after some refactoring, having cursed unittests and XP but in the end blessing them anyway. The result is there: the hstats module!

    It loads hotshot data 35% faster than hotshot.stats.load() as I reported earlier. It sorts the data, and prints it out nicely. Moreover the sorting and printing seems very fast. I did this for pystones in 0.0091 seconds! So that's one goal accomplished.

    So please go and try it out. Just drop the hstats.py file in your current directory, import hstats in the interpreter, read the docstrings and go! In fact I must beg you to try it out and report at least problems, but preferably wishlists too! The current interface is very basic, namely. On purpose (XP again). So I'm hoping to hear about a lot of functionality that is currently missing but is essential for you people to work the way you like it! And if you're really convinced of your feature you could even attach the appropriate unittests to verify your feature! That would be very kind.

    Hope to get reports coming in soon!

    Weekend

    That said I may not respond too soon. Tomorrow afternoon I'm flying to Newcastle, UK to finally see my girlfriend again (no it's no fun living in different countries). This will be a long weekend until Monday night. I will do some work on Friday and Monday though as I'll be taking a laptop and I've discovered unison... But I don't know if I'll be having an internet connection.

    Progress...

    Tuesday, July 19, 2005

    So I tried this XP approach of first creating unittests the last couple of days. Writing the tests goes soooo slow! I was a bit surprised at how easily I could write the real functions afterwards tho. Maybe it is good. But it requires lots of thinking in advance and that's painful and makes you wonder what you're wasting your time on since you're not actually writing code.

    Apparently I haven't thought enough yet. I'm currently struggling with the decision whether I should allow separating the number of calls from the number of recursive calls. Practically it means this: would anyone ever want only the total number of calls or only the recursive number of calls in the report? I'm more and more thinking not.

    So apart from that issue, which will force me to modify quite a bit, I've done not too bad. I can sort data and everything. Am now working on printing. But I get confused with the above issue and technicalities involving it. Maybe I just need to look at it with a fresh mind. Maybe I'll work without the horrible changes.

    Stats unittests

    Friday, July 15, 2005

    That's what I've mostly done, been making and polishing unittests for the hstats module (or what's written of it). hstats btw, is the module that will create the statistics for hotshot, natively reading hotshot data. In the process one extra omission (not really a bug as such ;-)) got into the hstats.Stats.__init__() code.

    Also have worked out -on paper- how I'll be sorting and showing this data. For this I'll give the true XP a go and try to start with the unittests first. I'll see how it goes, most likely I'll end up mixing the creation of unittests and real code.

    Weekend

    This weekend we're going to visit an old friend in Bonn. Not too much coding thus, and definitely no internet. I did get an old laptop from my brother a couple of weeks ago though, so I hope to do some coding anyway (if we're not working on his house all day). So with a bit of luck I'll have the sorting working by Monday.

    Project on Savannah

    Thursday, July 14, 2005
    So I got my project accepted on Savannah. After some time of not being sure where it would be hosted, on Savannah on its own or on SourceForge in the Python project under the nondist/sandbox thing. Guess I'll stay with Savannah now. It's under the project name pyprof. Not many files up there yet and definitely nothing useful yet. But hopefully that will change soon!

    Paperwork rant

    Thursday, July 14, 2005

    This is going to be a rant.

    I faxed Google's paperwork really early after we got it (27th of June IIRC). Now we've been told that if we haven't received confirmation yet we should re-submit! The problem is that in the mean time I've moved houses in Southampton, UK where I'm going to uni. Just after I moved houses I left all my stuff there -including my certification that proves I'm a full time student and that I had faxed the day before- and took the train to Belgium, where I officially live. So now I'm here in Belgium staying with my parents for the next two months and Google asks me to resend my application!

    I guess they'll have to be happy with just my student card then. Nothing else I can offer them now. It can't be really my fault that they've lost my first fax can it?

    Upgrading a P100

    Since I'm ranting I may as well continue. We have here a Pentium I 100MHz box that serves as a print server and sometimes as a thin client. Last week I upgraded it from Debian Woody to Sarge, but it had some trouble generating the xfont caches. So after letting it churn away for a couple of days I interrupted it. I removed all the non-essential xfonts from the machine (it can get its fonts from a remote font server, I decided) and gave it another go. Same story. So I looked into the font directory: about 2000 fonts. dpkg -S /path/to/font: no one owns the fonts! Ok, rm /path/to/fonts/*. Rerun upgrade. Yay! Test print server, all works. Shut box down.

    Next day I want to print something. Boot box: LI and that's it! Yay, interrupting the installation will have messed up the configure scripts of lilo. No problem, make boot floppy (or make 5 of them, them unreliable old floppies), boot... or not. Floppy drive was not used in years and seems dead. (Oh, did I mention the box is so old that it doesn't boot from cdrom?) Right, so where do I find another floppy drive? Found a Sun Ultra 10 in the attic that's not being used and got the floppy drive out, installed it in the P100. Boot: rescue root=/dev/hdb1 Yay! Run lilo.

    Hurray! After 5 days of upgrading the box does again what it should do...

    First success!

    Wednesday, July 13, 2005
    Loading a profiling file dumped by hotshot goes 35% faster than with hotshot's old method! Specifically, on a Sun Ultra 10 with a TI UltraSparc IIi (Sabre) processor -872 BogoMips, if that tells you anything- I loaded the profile dump of pystones in 139.8 seconds with my new code (hstats.Stats("stones.prof")). Compare this to hotshot.stats.load("stones.prof"), which takes 214.5 seconds. Not that I can do anything yet with this loaded data though... But it does fill a nice dictionary which I only have to sort and show. No calculations to do anymore.

    Deep or shallow frame stack?

    Wednesday, July 13, 2005

    So I'm working on parsing the file created by hotshot. The question in the title relates to how the cumulative time should be stored. The problem is that every frame has to add its time to the cumulative time of all its ancestors.

    Shallow frame stack

    Here it is feasible to just look up the name of every ancestor and then go and add the time to the cumulative time of every ancestor.

    Deep frame stack

    When doing the same as described above this will get very expensive. So I came up with an alternative. The plan is to keep a second stack with just the cumulative times. When a frame gets entered it adds the time delta (time since the profiler callback was last called) to the time of its parent and to the cumulative time of the parent in the separate stack.

    Now when a frame exits it adds its time delta to its own time and to the top of the cumulative time stack (its own cumtime). Then it adds this value (from the top of the cumulative time stack) to the parent (second to last position on the stack) and to its own (real) cumulative time, after which it pops the cumulative time stack.
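
    The two-stack bookkeeping described above could look roughly like the sketch below. It is only an illustration, not the actual hstats code: the event tuples, the ENTER/EXIT constants and the function name accumulate() are all made up for the example, and recursion (where a function would be its own ancestor) is not handled.

```python
from collections import defaultdict

# Hypothetical event kinds; the real hotshot log format is different.
ENTER, EXIT = "enter", "exit"


def accumulate(events):
    """Compute own and cumulative time per function name.

    events is an iterable of (kind, name, delta) tuples, where delta is the
    time since the previous event. name is ignored for EXIT events.
    """
    own = defaultdict(float)   # time spent in the function itself
    cum = defaultdict(float)   # time spent in the function and its callees
    names = []                 # the frame stack
    cum_stack = []             # running cumulative time, one slot per frame

    for kind, name, delta in events:
        if kind == ENTER:
            if names:
                # The delta belongs to the parent that was running until now.
                own[names[-1]] += delta
                cum_stack[-1] += delta
            names.append(name)
            cum_stack.append(0.0)
        else:
            current = names.pop()
            own[current] += delta
            cum_stack[-1] += delta
            total = cum_stack.pop()
            cum[current] += total
            if cum_stack:
                # Hand the whole subtree time to the parent in one step.
                cum_stack[-1] += total
    return dict(own), dict(cum)


own, cum = accumulate([
    (ENTER, "main", 0),
    (ENTER, "f", 1),     # main ran for 1 before calling f
    (EXIT, None, 2),     # f ran for 2
    (EXIT, None, 3),     # main ran for another 3
])
```

    Each event touches only the top one or two stack slots, so the cost per event stays constant however deep the frame stack gets.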

    That's what I've just been implementing now. So the rest of today will be spent trying this out, writing unit tests for it and debugging it. Maybe I'll be able to start handling the line events too today. But that probably won't be tested till tomorrow.

    Profiling code accounted to user code!?

    Tuesday, July 12, 2005

    It appears hotshot does only take one time sample in the profiler callback. This means that all the time spent in hotshot's code is accounted to the user code!
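
    To make the effect concrete, here is a toy model with a fake clock. It does not measure hotshot itself; the function name charged_time() and all the tick counts are invented for the illustration. With a single sample per callback the profiler's own bookkeeping ends up in the next delta, while a second sample on the way out keeps it out.

```python
def charged_time(samples_per_callback, events=4, user_work=10, overhead=3):
    """Return the ticks a toy profiler charges to user code.

    All numbers are made-up clock ticks on a fake clock.
    """
    now = 0        # the fake clock
    last = now     # clock value the next delta is measured from
    charged = 0
    for _ in range(events):
        now += user_work         # user code runs until the next event
        charged += now - last    # sample taken on entering the callback
        last = now
        now += overhead          # profiler bookkeeping inside the callback
        if samples_per_callback == 2:
            last = now           # second sample on leaving the callback
    return charged


one_sample = charged_time(1)
two_samples = charged_time(2)
```

    In this model the user code really ran for 4 * 10 ticks; anything charged beyond that is profiler overhead billed to the user.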

    Mental note to self: remember this when I'm trying to increase accuracy later this summer!

    Iterative learning

    Monday, July 11, 2005
    Understanding Code

    It's amazing how little you (well, at least I!) understand the first time you read some (complex) code. Only after a few reads do you start to get the big picture and see how it all fits together. It's mostly a matter of understanding the underlying principles I guess. Anyway, the upshot of that is that by now I mostly understand how hotshot and profile do their work! And nothing stops me anymore from finding(!) and checking out that little detail.

    hprofile.py

    That's the name that I gave to the wrapper module for profile. I've written the basic outline of this file. That means I've got the class skeleton (full of raise NotImplementedError statements) and the little functions that provide an alternative entry point to this class. No unittests yet though. I also wrote the part that lets the module run as a script as well as a module. This led me to another question: How do you write unittests for that? Any suggestions?

    When writing the skeleton of the class I had to decide which methods to wrap. I decided (and please let me know if I'm wrong!) that wrapping all of it is not useful (and actually impossible). Basically I think they didn't use enough leading underscores in the choice of the method names. There's no point in having all internals exposed to users, is there? Only a hack like hotshot.stats.load() uses them...

    So the methods I decided to keep, after looking at what's documented and what's used in other software, are these:

  • __init__(self, timer=None, bias=None)
  • print_stats(self)
  • dump_stats(self, file)
  • run(self, cmd)
  • runctx(self, cmd, globals, locals)
  • runcall(self, func, *args, **kw)
  • calibrate(self, m, verbose=0)

    Additionally I'll be keeping the bias attribute. This means I'm dropping quite a lot, but it looks like the most sensible way. Any feedback on this is welcome!
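
    As a rough picture of what such a skeleton could look like, here is a sketch built only from the method list above. The class name Profile and the stored timer attribute are assumptions for the example; the real hprofile.py may differ.

```python
class Profile:
    """Skeleton of a profile-compatible wrapper; every method is a stub."""

    def __init__(self, timer=None, bias=None):
        # Assumption: the arguments are simply remembered for later use.
        self.timer = timer
        self.bias = bias

    def print_stats(self):
        raise NotImplementedError

    def dump_stats(self, file):
        raise NotImplementedError

    def run(self, cmd):
        raise NotImplementedError

    def runctx(self, cmd, globals, locals):
        raise NotImplementedError

    def runcall(self, func, *args, **kw):
        raise NotImplementedError

    def calibrate(self, m, verbose=0):
        raise NotImplementedError
```

    Stubs like these let unittests be written against the public interface before any of the real work exists.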

    Calibration

    Calibration is not the same in hotshot as in profile. In hotshot it tries to find the smallest possible time step. This then gets stored in the file, but is entirely unused after that. In profile it tries to find the delay in executing the code under the profiler, to compensate for the invisible time spent between the calling of the profiler callback and the taking of the time. So I'll have to mimic the latter. Guess I'll do so in the wrapper and then just save it to the file with hotshot's .addinfo() method. It will then be the duty of the stats module to compensate for this.
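
    For reference, the profile-style calibration that has to be mimicked can be tried with the standard library alone. The sketch below uses the profile module as shipped with current Python (hotshot itself is no longer included there); the helper name estimate_bias() and the small m are choices made for the example, and storing the value in a hotshot file is not shown.

```python
import profile


def estimate_bias(m=1000):
    """Return the per-event overhead estimated by profile's calibrate().

    m is the number of calibration iterations; larger values take longer
    but give a steadier estimate.
    """
    return profile.Profile().calibrate(m)


bias = estimate_bias()

# A profiler constructed with this bias subtracts it from every event.
compensated = profile.Profile(bias=bias)
```

    The estimate varies from run to run and from machine to machine, so it is worth repeating the measurement a few times before trusting it.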

    Changing development round?

    While devising concrete plans of how to write these wrappers I'm more and more thinking of starting with the statistics module. That module would only have to depend on hotshot. When trying to implement hprofile first it'll just never work as it will be missing crucial bits. I think it's rather inconvenient to build on something when you don't yet know what it will look like. Hence my urge to swap them round.

    Coding started

    Sunday, July 10, 2005
    Profile wrapper

    So the plan is to wrap hotshot so it can behave exactly as the profile module. It will be a bit tricky as some things of the old profile module don't make sense anymore in hotshot.

    Hotshot - Memory allocation

    So I have been learning a lot the last couple of days about how profilers in Python work. And now I have a clue of how the _hotshot module -the C module of hotshot which does all the work- works. In the process I noticed at least one unchecked malloc() use! And that while the Python documentation actually asks you to use the Python heap (by using PyMem_Malloc()), as that is managed by the Python memory manager, which can keep track of how much memory is used.

    Missing Docs rant

    While figuring out how hotshot does its work I stumbled across some PyFrame_* calls. Of course there is no documentation to be found about these. Sigh. And that while Python is always so well documented! I also saw some PyEval_* calls which are undocumented. At least some of the PyEval_* calls are in the documentation, just not the ones used. Great.

    Software patents rejected!

    Wednesday, July 06, 2005

    Finally, the European Parliament managed to see the light and rejected the software patents directive. There is still hope.

    Also last night I found the Self-certifying Filesystem. I never even considered using plain NFS, but this looks very promising. Maybe tonight I'll end up trying it on my LAN. That is if I don't spend all my time playing around with the 2.6 kernel on a Sun Ultra 10 box. Last time I tried that I mangled the filesystem of our production system, but now we've got two spare Ultra 10s to play on. So hopefully I'll finally figure out why I never got that working before.

    hotshot

    Tuesday, July 05, 2005

    So I spent today looking at the old profile module and hotshot. Learned a lot - but not enough yet.

    I quite like the idea of Brett to try and stick with hotshot and try to improve that. The main job would then be to write a wrapper to emulate profile and pstats. But I still have to look at the patch in this bug report.

    Project for the next two months...

    Friday, July 01, 2005

    So this is my blog for the SoC (Google Summer of Code), as recommended by the email I got from the Python Software Foundation.

    For those who don't know what I'm doing: it is listed on Python's Summer of Code wiki under "Profile Replacement".

    I haven't done very much yet. Since I had to move house this week I've been offline for 4 days and when I got back I didn't have my normal box to read my email as that ended up in a house without power or network. So attempting to set up my fetchmail-procmail-exim-spamassassin chain I created a great mail loop that resulted in a big mess. Shame on me, I should've remembered that I had a .forward file on this box to the address I was trying to fetch my email from... Anyhow, I think I didn't lose any mail in the end. Not 100% sure though.

    Also I haven't heard from my mentor yet. Not sure if he's just having lots to do or if I lost the mail. I'll wait till after the weekend before I'll try to contact him. For now I'm still sorting out my tax stuff. Working my way through the W-7 form now to get an ITIN.
