devork

E pur si muove

Seconds & nanoseconds

Friday, July 29, 2005

Last night I struggled a lot with the calibration. In the end I gave up, went to bed and read a crap book for way too long (books are no good for me: even if they're crap I can't stop reading them). Today, after waking up way too late, I looked at the calibration issue again. Simple: after half an hour it all made sense and an hour later it all worked. I've even finished the unit tests now; they all pass and everything is in CVS.

In the process I realised that the output is currently given in hotshot's nanoseconds instead of the seconds that profile uses. The bias returned by calibrate() is in seconds now; next up is changing the print_stats() method to convert everything to seconds.
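As a rough illustration of that unit conversion, here is a minimal sketch. It takes the post's word that the raw values are nanoseconds; the function name ns_to_seconds is made up for this example and is not part of hotshot or hprofile.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def ns_to_seconds(nanoseconds):
    """Convert a raw timing in nanoseconds to seconds (hypothetical helper)."""
    return nanoseconds / NANOSECONDS_PER_SECOND


# A raw value of 1.5 billion nanoseconds is reported as 1.5 seconds.
print(ns_to_seconds(1_500_000_000))
```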

For this I'll need to subclass hstats.Stats which is good since the next thing I need to do is support the use of the bias too. It's so nice when it all fits together...

Now just hope I don't screw up like I did yesterday with the calibration stuff!

The easy bits

Wednesday, July 27, 2005

So today I've written the easy methods and functions of hprofile, the wrapper for the profile module, including their unittests.

Tomorrow the harder work starts: the Profile.calibrate code. Maybe I shouldn't be too optimistic about how far I'll get with it. For what it's worth, here are my current ideas of how to do this:

  • calibrate() does roughly the same as the old one: it returns the bias, i.e. the delay caused by function call overhead.
  • Whenever the bias is set, save it in the profile file as "extra info".
  • When reading the stats, modify the values by this bias.
  • The biggest question concerns the last point. Should I subclass the hstats.Stats class to do this or make it a feature of the class itself? The question is really whether the bias will be significant when using hotshot. If it is, I should probably move this calibration stuff to the hstats and hotshot modules. But I'm sort of trying to postpone changes to hotshot as much as possible.

    I don't really want to change too much in hotshot since I'm not sure how good an idea it is to change the standard library and, most importantly, whether the changes will be accepted into the stdlib. It would be a bit annoying to have to distribute a separate hotshot copy, and also a maintenance pain since Python doesn't use distributed version control...
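To make the "modify the values by this bias" idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the apply_bias name, the shape of the timings dictionary, and the choice to subtract the bias once per recorded call. It is not how hstats actually does it.

```python
def apply_bias(timings, bias):
    """Return corrected timings, subtracting the bias once per call.

    timings maps a function name to (ncalls, tottime in seconds).
    This layout is hypothetical, chosen only for the example.
    """
    corrected = {}
    for name, (ncalls, tottime) in timings.items():
        # Never report a negative time if the bias overcompensates.
        corrected[name] = (ncalls, max(tottime - ncalls * bias, 0.0))
    return corrected


raw = {"parse": (4, 4.0), "tiny": (8, 1.0)}
print(apply_bias(raw, 0.25))
```

Whether this lives in a subclass of hstats.Stats or in the class itself does not change the arithmetic, only where the bias is read from.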

    To hibernate or not to hibernate...

    Monday, July 25, 2005

    That's the question

    Old days

    For years and years I envied my father, who never had to log out of his window manager at work: he only had to lock his screen. This meant he could leave all his terminals and text editors open at night. The next day they'd still just be there! (Later on I heard that his sysadmin asked him to log out from time to time anyway. I guess that's sensible because there will always be memory leaks, certainly in X, but anyway.)

    Needless to say I always wanted my work to be there the next time I used it too! But at home there was no way that the PC would stay on day and night.
    Later when at university I quickly found out that a computer in the same room as where you sleep is not compatible with a computer that is switched on.

    Session Managers

    Maybe I misunderstood them. But when I first heard of them (probably xsm) I thought "Wow! This is the future, I will have all my stuff still there next time I log on!"
    Not.
    I never used them. They're rubbish. What good is it to have my xterm still at the same position and size if my shell was not in the same directory and not running my pager anymore?

    I panicked. Toyed with the idea that all these programs (bash, less, etc.) should be session aware. Luckily I didn't have the guts to look at it myself; I couldn't code in those days. Maybe I'd have wasted many weeks of my life if I could. But I was disappointed, disillusioned with modern technology.

    Hibernation

    Under a month ago I was lent an old laptop, which I'm using right now. Finally a great opportunity to try out Ubuntu, I thought. Very neat: it did hibernation by default. So now the big surprise: if I hibernate I get everything back exactly like I left it! Isn't this what I've always dreamed of?

    Clean slate

    Now my experience with this is not as good as I had hoped. It turns out that forcing myself to close every application keeps things from being left behind in a mess. OK, I will open that emacs again to edit the same file, but all its other buffers will be closed, for example. I'm now reaching the conclusion that closing as much as you can at the end of the day results in a more efficient start the next day. During the day I'll again accumulate lots of windows, related and unrelated, on my 6 or so virtual desktops. But closing them at the end of the day makes me forget them the next day, and that is good: they're mostly not very important anyway and would only make me spend precious time on needless messing around.

    Another lesson learned in life.

    Refactor Mercilessly

    Monday, July 25, 2005

    OK, I admit it. Maybe I'm just using this as an excuse. I haven't really impressed myself with the amount of work done in the last few days. But hey! This was my holiday after all.

    The result however is that I have revised the test module for the hstats module (instead of having kicked off with the profile wrapper). It's not a nice test module yet, but at least it's something. A big plus is that I read up on the standard Python test documentation. Now I (hopefully!) follow the rules properly.

    So now I'm back home I've got no excuse! Tomorrow lots of work is waiting!

    Native hotshot statistics!

    Wednesday, July 20, 2005

    Finally, after some refactoring, and after having cursed unittests and XP but in the end blessing them anyway, the result is there: the hstats module!

    It loads hotshot data 35% faster than hotshot.stats.load(), as I reported earlier. It sorts the data and prints it out nicely. Moreover the sorting and printing seem very fast: I did this for pystones in 0.0091 seconds! So that's one goal accomplished.

    So please go and try it out. Just drop the hstats.py file in your current directory, import hstats in the interpreter, read the docstrings and go! In fact I must beg you to try it out and report at least problems, but preferably wishlists too! The current interface is very basic, on purpose (XP again). So I'm hoping to hear about a lot of functionality that is currently missing but is essential for you people to work the way you like! And if you're really convinced of your feature you could even attach the appropriate unittests to verify it! That would be very kind.

    Hope to get reports coming in soon!

    Weekend

    That said, I may not respond too soon. Tomorrow afternoon I'm flying to Newcastle, UK to finally see my girlfriend again (no, it's no fun living in different countries). This will be a long weekend, until Monday night. I will do some work on Friday and Monday though, as I'll be taking a laptop and I've discovered unison... But I don't know if I'll have an internet connection.

    Progress...

    Tuesday, July 19, 2005

    So for the last couple of days I tried this XP approach of creating the unittests first. Writing the tests goes soooo slow! I was a bit surprised at how easily I could write the real functions afterwards though. Maybe it is good. But it requires lots of thinking in advance, and that's painful and makes you wonder what you're wasting your time on since you're not actually writing code.

    Apparently I haven't thought enough yet. I'm currently struggling to decide whether I should allow the number of calls and the number of recursive calls to be separated. Practically it means this: would anyone ever want only the total number of calls or only the number of recursive calls in the report? I'm more and more thinking not.

    So apart from that issue, which will force me to modify quite a bit, I've not done too badly. I can sort data and everything. I'm now working on printing. But I get confused by the above issue and the technicalities involved. Maybe I just need to look at it with a fresh mind. Maybe I'll work without the horrible changes.

    Stats unittests

    Friday, July 15, 2005

    That's what I've mostly done: making and polishing unittests for the hstats module (or what's written of it). hstats, by the way, is the module that will create the statistics for hotshot, natively reading hotshot data. In the process one extra omission (not really a bug as such ;-)) got into the hstats.Stats.__init__() code.

    Also have worked out -on paper- how I'll be sorting and showing this data. For this I'll give the true XP a go and try to start with the unittests first. I'll see how it goes, most likely I'll end up mixing the creation of unittests and real code.

    Weekend

    This weekend we're going to visit an old friend in Bonn. So not too much coding, and definitely no internet. I did get an old laptop from my brother a couple of weeks ago though, so I hope to do some coding anyway (if we're not working on his house all day). So with a bit of luck I'll have the sorting working by Monday.

    Project on Savannah

    Thursday, July 14, 2005

    So I got my project accepted on Savannah. For some time I wasn't sure where it would be hosted: on Savannah on its own, or on SourceForge in the Python project under the nondist/sandbox thing. Guess I'll stay with Savannah now. It's under the project name pyprof. Not many files up there yet and definitely nothing useful yet. But hopefully that will change soon!

    Paperwork rant

    Thursday, July 14, 2005

    This is going to be a rant.

    I faxed Google's paperwork really soon after we got it (27th of June IIRC). Now we've been told that if we haven't received confirmation yet we should re-submit! The problem is that in the meantime I've moved house in Southampton, UK, where I'm going to uni. Just after I moved I left all my stuff there (including the certificate that proves I'm a full-time student, which I had faxed the day before) and took the train to Belgium, where I officially live. So now I'm here in Belgium staying with my parents for the next two months and Google asks me to resend my application!

    I guess they'll have to be happy with just my student card then. Nothing else I can offer them now. It can't really be my fault that they've lost my first fax, can it?

    Upgrading a P100

    Since I'm ranting I may as well continue. We have a Pentium I 100MHz box here that serves as a print server and sometimes as a thin client. Last week I upgraded it from Debian Woody to Sarge, but it had some trouble generating the xfont caches. So after letting it churn away for a couple of days I interrupted it. Having removed all the non-essential xfonts from the machine (it can get its fonts from a remote font server, I decided), I gave it another go. Same story. So I looked into the font directory: about 2000 fonts. dpkg -S /path/to/font: no one owns the fonts! OK, rm /path/to/fonts/*. Rerun upgrade. Yay! Test print server, all works. Shut box down.

    Next day I want to print something. Boot box: LI and that's it! Yay, interrupting the installation must have messed up lilo's configure scripts. No problem, make boot floppy (or make 5 of them, them unreliable old floppies), boot... or not. The floppy drive had not been used in years and seems dead. (Oh, did I mention the box is so old that it doesn't boot from CD-ROM?) Right, so where do I find another floppy drive? Found a Sun Ultra 10 in the attic that's not being used, got its floppy drive out and installed it in the P100. Boot: rescue root=/dev/hdb1. Yay! Run lilo.

    Hurray! After 5 days of upgrading the box does again what it should do...

    First success!

    Wednesday, July 13, 2005

    Loading a profiling file dumped by hotshot goes 35% faster than with hotshot's old method! Specifically, on a Sun Ultra 10 with a TI UltraSparc IIi (Sabre) processor (872 BogoMips, if that tells you anything) I loaded the profile dump of pystones in 139.8 seconds with my new code (hstats.Stats("stones.prof")). Compare this to hotshot.stats.load("stones.prof"), which takes 214.5 seconds. Not that I can do anything with this loaded data yet... But it does fill a nice dictionary which I only have to sort and show. No more calculations to do.
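For anyone who wants to check the arithmetic behind the 35%, here is a small sketch using the two load times quoted above. The function names are made up for the example.

```python
def time_saved_fraction(old_seconds, new_seconds):
    """Fraction of the old running time that the new code saves."""
    return (old_seconds - new_seconds) / old_seconds


def speedup_factor(old_seconds, new_seconds):
    """How many times as fast the new code is compared to the old."""
    return old_seconds / new_seconds


old_load = 214.5  # hotshot.stats.load("stones.prof"), in seconds
new_load = 139.8  # hstats.Stats("stones.prof"), in seconds

print(f"{time_saved_fraction(old_load, new_load):.1%} less time")  # 34.8% less time
print(f"{speedup_factor(old_load, new_load):.2f}x as fast")        # 1.53x as fast
```

So "35% faster" here means the load takes about 35% less time, which is the same as being roughly 1.5 times as fast.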

    Deep or shallow frame stack?

    Wednesday, July 13, 2005

    So I'm working on parsing the file created by hotshot. The question in the title relates to how the cumulative time should be stored. The problem is that every frame has to add its time to the cumulative time of all its ancestors.

    Shallow frame stack

    Here it is feasible to just look up the name of every ancestor and add the time to each ancestor's cumulative time.

    Deep frame stack

    When doing the same as described above this will get very expensive. So I came up with an alternative. The plan is to keep a second stack with just the cumulative times. When a frame gets entered it adds the time delta (the time since the profiler callback was last called) to the time of its parent and to the cumulative time of the parent in the separate stack.

    Now when a frame exits it adds its time delta to its own time and to the top of the cumulative time stack (its own cumtime). Then it adds this value (from the top of the cumulative time stack) to the parent (second to last position on the stack) and to its own (real) cumulative time, after which it pops the cumulative time stack.
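The two-stack idea can be sketched roughly like this. It is a simplified illustration, not the real hstats code: the (kind, name, delta) event tuples are an assumed stand-in for whatever hotshot's log actually contains, and recursion is not handled specially, so a recursive function would have its cumulative time counted more than once.

```python
def accumulate(events):
    """Compute own and cumulative time per function from enter/exit events.

    events is a sequence of (kind, name, delta) tuples, where kind is
    "enter" or "exit" and delta is the time since the previous event.
    This event format is hypothetical.
    """
    stats = {}    # name -> {"tottime": own time, "cumtime": cumulative time}
    frames = []   # stack of the names of the active frames
    cumtimes = [] # parallel stack: running cumulative time of each frame

    for kind, name, delta in events:
        if kind == "enter":
            if frames:
                # The time since the last event was spent in the parent.
                stats[frames[-1]]["tottime"] += delta
                cumtimes[-1] += delta
            stats.setdefault(name, {"tottime": 0, "cumtime": 0})
            frames.append(name)
            cumtimes.append(0)
        else:  # "exit"
            current = frames.pop()
            stats[current]["tottime"] += delta
            total = cumtimes.pop() + delta
            stats[current]["cumtime"] += total
            if cumtimes:
                # Hand the finished frame's cumulative time to its parent only.
                cumtimes[-1] += total
    return stats


events = [
    ("enter", "main", 0),
    ("enter", "f", 2),   # main ran for 2 ticks before calling f
    ("exit", "f", 3),    # f ran for 3 ticks
    ("exit", "main", 1), # main ran 1 more tick after f returned
]
print(accumulate(events))
```

The point of the second stack is that each exit touches only the parent's entry, so the cost per event stays constant no matter how deep the frame stack gets.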

    That's what I've just been implementing now. So the rest of today will be spent trying this out, writing unit tests for it and debugging it. Maybe I'll be able to start handling the line events too today. But that probably won't be tested till tomorrow.

    Profiling code accounted to user code!?

    Tuesday, July 12, 2005

    It appears hotshot takes only one time sample in the profiler callback. This means that all the time spent in hotshot's code is attributed to the user code!

    Mental note to self: remember this when I'm trying to increase accuracy later this summer!

    Iterative learning

    Monday, July 11, 2005

    Understanding Code

    It's amazing how little you (well, at least I!) understand the first time you read some (complex) code. Only after a few readings do you start to get the big picture and see how it all fits together. It's mostly a matter of understanding the underlying principles, I guess. Anyway, the upshot is that by now I mostly understand how hotshot and profile do their work! And nothing stops me anymore from finding(!) and checking out that little detail.

    hprofile.py

    That's the name I gave to the wrapper module for profile. I've written the basic outline of this file. That means I've got the class skeleton (full of raise NotImplementedError statements) and the little functions that provide an alternative entry point to this class. No unittests yet though. I also wrote the part that lets the module run as a script as well as a module. This led me to another question: how do you write unittests for that? Any suggestions?

    When writing the skeleton of the class I had to decide which methods to wrap. I decided (and please let me know if I'm wrong!) that wrapping all of them is not useful (and actually impossible). Basically I think they didn't use enough leading underscores when choosing the method names. There's no point in having all internals exposed to users, is there? Only a hack like hotshot.stats.load() uses them...

    So the methods I decided to keep, after looking at what's documented and what's used in other software, are these:

  • __init__(self, timer=None, bias=None)
  • print_stats(self)
  • dump_stats(self, file)
  • run(self, cmd)
  • runctx(self, cmd, globals, locals)
  • runcall(self, func, *args, **kw)
  • calibrate(self, m, verbose=0)
  • Additionally I'll be keeping the bias attribute. This means I'm dropping quite a lot, but it looks like the most sensible way. Any feedback on this is welcome!
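Put together, the skeleton described above would look roughly like the sketch below. The method signatures are the ones from the list; storing timer as an attribute and the exact layout are guesses made for the example, not a copy of the real hprofile.py.

```python
class Profile:
    """Skeleton of a profile-compatible wrapper; nothing is implemented yet."""

    def __init__(self, timer=None, bias=None):
        self.timer = timer  # kept only so the argument is not lost (assumption)
        self.bias = bias    # the one public attribute that is being kept

    def print_stats(self):
        raise NotImplementedError

    def dump_stats(self, file):
        raise NotImplementedError

    def run(self, cmd):
        raise NotImplementedError

    def runctx(self, cmd, globals, locals):
        raise NotImplementedError

    def runcall(self, func, *args, **kw):
        raise NotImplementedError

    def calibrate(self, m, verbose=0):
        raise NotImplementedError
```

Starting from a skeleton like this means unittests can be written against the public interface first, with each NotImplementedError replaced as the corresponding test goes in.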

    Calibration

    Calibration is not the same in hotshot as in profile. In hotshot it tries to find the smallest possible time step. This then gets stored in the file, but is entirely unused after that. In profile it tries to find the delay in executing the code under the profiler, to compensate for the invisible time spent between the profiler callback being called and the time being taken. So I'll have to mimic the latter. Guess I'll do so in the wrapper and then just save it to the file with hotshot's .addinfo() method. It will then be the duty of the stats module to compensate for this.
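For reference, this is what profile-style calibration looks like with the standard library's own profile module, which is the behaviour the wrapper has to mimic. The sketch uses only profile.Profile.calibrate(); the extra_info dictionary is a made-up stand-in for the "extra info" stored in the profile file, since hotshot itself does not exist in Python 3.

```python
import profile


def measure_bias(m=2000):
    """Return the per-event overhead estimated by the stdlib profiler.

    m is the number of calibration iterations; larger values take
    longer but give a steadier estimate.
    """
    return profile.Profile().calibrate(m)


bias = measure_bias()

# Hypothetical stand-in for saving the value alongside the profile data,
# so that the stats module can compensate for it later.
extra_info = {"bias": repr(bias)}
print(extra_info)
```

The value varies from run to run and from machine to machine, which is exactly why it has to be saved with the data it belongs to.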

    Changing development round?

    While devising concrete plans of how to write these wrappers I'm more and more thinking of starting with the statistics module. That module would only have to depend on hotshot. When trying to implement hprofile first it'll just never work as it will be missing crucial bits. I think it's rather inconvenient to work with something you don't know yet how it will look. Hence my urge to swap them round.

    Coding started

    Sunday, July 10, 2005

    Profile wrapper

    So the plan is to wrap hotshot so it behaves exactly like the profile module. It will be a bit tricky as some things in the old profile module don't make sense anymore in hotshot.

    Hotshot - Memory allocation

    So I have been learning a lot over the last couple of days about how profilers in Python work. And now I have a clue how the _hotshot module (the C module of hotshot, which does all the work) works. In the process I noticed at least one unchecked malloc() use! And that while the Python documentation actually asks you to use the Python heap (via PyMem_Malloc()), as that is managed by the Python memory manager, which can keep track of how much memory is used.

    Missing Docs rant

    While figuring out how hotshot does its work I stumbled across some PyFrame_* calls. Of course there is no documentation to be found about these. Sigh. And that while Python is always so well documented! I also saw some PyEval_* calls which are undocumented. At least some of the PyEval_* calls are in the documentation, just not the ones used. Great.

    Software patents rejected!

    Wednesday, July 06, 2005

    Finally, the European Parliament managed to see the light and rejected the software patents directive. There is still hope.

    Also, last night I found the Self-certifying Filesystem. I never even considered using plain NFS, but this looks very promising. Maybe tonight I'll end up trying it on my LAN. That is, if I don't spend all my time playing around with the 2.6 kernel on a Sun Ultra 10 box. Last time I tried that I mangled the filesystem of our production system, but now we've got two spare Ultra 10s to play on. So hopefully I'll finally figure out why I never got that working before.

    hotshot

    Tuesday, July 05, 2005

    So I spent today looking at the old profile module and hotshot. Learned a lot - but not enough yet.

    I quite like the idea of Brett to try and stick with hotshot and try to improve that. The main job would then be to write a wrapper to emulate profile and pstats. But I still have to look at the patch in this bug report.

    Project for the next two months...

    Friday, July 01, 2005

    So this is my blog for the SoC (Google Summer of Code), as recommended by the email I got from the Python Software Foundation.

    For those who don't know what I'm doing: it is listed on Python's Summer of Code wiki under "Profile Replacement".

    I haven't done very much yet. Since I had to move house this week I've been offline for 4 days and when I got back I didn't have my normal box to read my email as that ended up in a house without power or network. So attempting to set up my fetchmail-procmail-exim-spamassassin chain I created a great mail loop that resulted in a big mess. Shame on me, I should've remembered that I had a .forward file on this box to the address I was trying to fetch my email from... Anyhow, I think I didn't lose any mail in the end. Not 100% sure though.

    Also I haven't heard from my mentor yet. Not sure if he's just having lots to do or if I lost the mail. I'll wait till after the weekend before I'll try to contact him. For now I'm still sorting out my tax stuff. Working my way through the W-7 form now to get an ITIN.
