Optimising python code by AST fiddling

Obviously no real optimisation, just optimisation exactly like python -O [-O] would create. But in "normal" byte compiled files, i.e. in .pyc files instead of .pyo files.

Why would I want to do that? Well, we really don't feel like shipping code with assert statements or if __debug__: ... bits in. They are only useful during development and should not appear in shipped code. And while we're at it stripping the docstrings can't hurt either. Still no reason not to just use .pyo files though, but there are if the code also needs to run as a windows service. The Python for Windows Extensions provide an excellent framework for making your python code behave like a service, unfortunately it does not seem to support optimised code. And I only found an old (but interesting) email thread discussing this, but other then that no one seems to talk about these issues. So I started thinking if we could just modify our code during the build to strip out all the things we don't want in it, we effectively have .pyo code inside a .pyc. Try it for yourself:

$ echo pass > test.py
$ python -m py_compile test.py
$ python -OO -m py_compile test.py
$ cmp test.pyc test.pyo && echo equal || echo unequal
equal

It appears to me that modifying the code would be as sane as trying to get some of the things to work suggested in that thread, and in my eyes it seems cleaner for the moment. Python 2.5 comes with a parser module that allows you to parse python source code into Abstract Syntax Trees (ASTs), once you have the AST objects you can convert them to lists and tuples and then convert them back to an AST (here you get the opportunity to change the list form of the AST). Lastly it provides functions to compile these ASTs into a code object just like the builtin compile() function does. From there it is not far to creating a .pyc file, the py_compile module shows us that with the help of imp and marshal modules this is only a few lines of code.

Writing all of this down and looking at the py_compile code made me realise that the same might actually be achieved by simply renaming the .pyo files to .pyc files! Or there could be something in the marshal module that behaves differently when running optimised. I'll have to try that out.

All of this, however, raises the point mentioned in that python-dev thread linked above, why bother telling how optimised the compiled code is in the file extension? Guido's idea of storing which optimisation has been done in the .pyc is not bad - although I personally don't like the automatic writing of .pyc files, but that's another discussion. I'm not sure if the extra bytecode that Brett and Philip propose later is that great though, personally I'd get rid of -O completely and just run the .pyc if it's the same age as the .py and if it's not don't bother re-creating it - modules are mostly compiled on installation anyway (except for developers).

So, am I going insane? Or is there really no need for the behaviour with -O and .pyo? If the optimisations can be prefomed by some AST transformations anyway, then I think the (mostly annoying) -O behaviour is obsolete.

(And not writing .pyc files by default is an other story, in the mean time the .pyc files can be re-created using the same optimisations as the old one was created with. Or with the "current settings" of the python interpreter (which would mean none as I want to drop -O), I don't think that's too important right now.)

Update

Obviously the magic number in .pyc files differs from the one in .pyo files, so just renaming the files won't work. I should have know...