IPC and the GIL
As was recently described excellently, threads are ugly beasts, waiting to get you when you let your guard down (and eventually you will). Yes, that means that I should really get my head round the asynchat module and stop using ThreadingMixIns with the SocketServer, but that's not the point of this post.
Inter-process communication, aka IPC, is much safer and more scalable when you want to distribute the work between many processors. But what bothers me is the complete lack of nice support for this in the stdlib, while there is an excellent threading module.
So what options are there?
You could stay inside the stdlib and construct something out of pickle and sockets, but that isn't amazing as you'll have to write a lot of boilerplate code that will have its own bugs. Leaving the stdlib you could probably replace pickle with json, but apart from being able to talk with non-Python code you're not really better off.
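As a rough illustration of the boilerplate in question, here is a minimal sketch of sending pickled objects over a socket. It assumes Python 3 and a trusted peer (unpickling data from an untrusted source can execute arbitrary code). The helper names `send_msg`, `recv_msg` and `demo` are made up for this example, and the 4-byte length prefix is just one possible framing choice.

```python
import pickle
import socket
import struct

# Hypothetical framing: every message is a 4-byte big-endian length
# followed by that many bytes of pickle data.
HEADER = struct.Struct("!I")


def send_msg(sock, obj):
    """Pickle obj and send it with a length prefix."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Receive one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return pickle.loads(_recv_exact(sock, length))


def demo():
    """Round-trip one message over a connected socket pair."""
    a, b = socket.socketpair()
    try:
        send_msg(a, {"job": 1, "args": [2, 3]})
        return recv_msg(b)
    finally:
        a.close()
        b.close()
```

Even this toy version needs its own framing and partial-read handling, and it still has no error reporting, timeouts or reconnect logic, which is where the home-grown bugs tend to creep in.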
There is of course the excellent omniORBpy if you grok CORBA. But the Python mapping of CORBA was written by C++ programmers, so despite being the best CORBA mapping available it is still not the most lovely thing to work with as a Python developer. And CORBA is a rather heavyweight approach that is far from applicable in all situations. XML-RPC is another internet-aware IPC option that even has support in the stdlib, but once more the overhead is silly when you're just on a local machine and want to use two CPUs. Twisted must surely have some solution too, but I don't really know it and again I think it will be rather heavyweight. Lastly I feel obliged to mention SOAP in this paragraph too, but can't help but shudder when thinking about it.
I guess what I'm looking for is some simple, lightweight and scalable module that does something similar to how Erlang passes messages around. Maybe all I'm asking for is a module that ties up subprocess and pickle and lives under the IPC section in the stdlib. But it would be nice if it could also transparently stream the data across a socket so you can also run on multiple hosts if you want (maybe in a separate module though). Having a module like that around would remove the need for boilerplate code and establish some sort of "best practice" for IPC.
Anyways, I think this is more like me wondering what people use or would want to have for their IPC in Python. Anyone?
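To make the "subprocess plus pickle" idea a bit more concrete, here is a minimal sketch assuming Python 3. The `Worker` class, its `call` and `close` methods and the `CHILD_SOURCE` string are all hypothetical names invented for this example, not an existing stdlib API. The child process reads pickled messages from its stdin and writes pickled replies to its stdout.

```python
import pickle
import subprocess
import sys

# Code run by the child process: loop over pickled requests on stdin,
# answer each one with a pickled reply on stdout, stop at end of input.
CHILD_SOURCE = """
import pickle, sys
while True:
    try:
        numbers = pickle.load(sys.stdin.buffer)
    except EOFError:
        break
    pickle.dump(sum(numbers), sys.stdout.buffer)
    sys.stdout.buffer.flush()
"""


class Worker:
    """Hypothetical wrapper around one child process that sums lists."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )

    def call(self, numbers):
        """Send one message to the child and wait for its reply."""
        pickle.dump(numbers, self.proc.stdin)
        self.proc.stdin.flush()
        return pickle.load(self.proc.stdout)

    def close(self):
        """Close the child's stdin so it sees EOF, then reap it."""
        self.proc.stdin.close()
        returncode = self.proc.wait()
        self.proc.stdout.close()
        return returncode
```

Usage would look something like `w = Worker(); w.call([1, 2, 3]); w.close()`. Each call here is strictly request then reply; a real module would also need queuing, error propagation and the socket transport mentioned above.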
12 comments:
Adam said...
This would be really nice and it was exactly what I was thinking as I read Guido's post.
Well "gee what's the right way to do python ipc" was as far as I got.
manuel moe g said...
The more I read about Erlang, the more I like.
I am not impressed with my competence in the programming required for IPC.
That said, Pyro keeps popping up in my searches for related technology.
I am probably going to roll my own versioned backup here at work. Different servers, different workstations, maybe mail server, maybe database.
I will design it as separate processes. Roll my own "write-only" file system, based on the philosophy of GIT. Bought a 6.5TB SATA array from Dell, and the prices keep dropping.
I am pretty sure I will implement Erlang message passing (between threads, between processes) as how I will implement concurrency/connection. Need another "actor", just write another Python module, nothing needed to restart.
Pyro seems like a good place to start.
Security is not implemented in Pyro, but the logical thing to do is to implement all publicly exposed access as http/https, deal with security there, then have the http/https server use unsecured IPC on the "safe" side of the network.
Use that for all parts of the network that could be compromised. Why not reduce it to an already solved problem?
Boy, anything besides Erlang style message passing just seems fishy. Frankly, I won't expose any of Pyro's remote object capability. It seems like the "Law of Leaky Abstractions", trying to "fake" attribute access and function calls over different processes over different machines.
I would not mind hearing what part of my plan is brain damaged. Even if all of it is! ;)
Anonymous said...
dbus?
http://www.freedesktop.org/wiki/Software/dbus
Anonymous said...
Also check out the Python wrapper for the Spread toolkit
http://www.spread.org/
http://www.python.org/other/spread/doc.html
Dethe Elza said...
Couldn't agree more that Python needs Erlang-like processes and message passing. I've suggested this to the PyPy folks, who seem interested.
As for simply replacing threads with processes, there is processing.py, which is modelled after threading.py:
http://mail.python.org/pipermail/python-dev/2006-October/069297.html
It appears to be up to version 0.21, so caveat emptor:
http://www.python.org/pypi/processing/0.21
Brendan Eich is planning Erlang-like concurrency for Javascript 3. I hope that Guido's recent thoughts on threads mean he is also leaning that way.
Anonymous said...
Hello,
what's wrong with the Queue.Queue module?
Seems like a reasonable way to co-ordinate a bunch of threads without shooting yourself in the foot.
Unknown said...
Hi Floris,
If you want to use asynchronous I/O and threads in a network application, maybe you should have a look at the Allegra "asynchronous network peer programming" library.
It's small (51KB of compressed sources), has all you asked for and more.
A simple event loop: async_loop
Marginal improvements over asynchat: async_chat
Support for threads: select_trigger and thread_loop
And subprocess: synchronized
Regards,
Paul said...
Start here:
http://wiki.python.org/moin/ParallelProcessing
Norbert Klamann said...
Did you look at Candygram? It aims to build Erlang-like process handling in Python.
casey said...
I use PyLinda, a Python tuplespace implementation for exactly this kind of IPC scalability.
For web serving, I use FCGI.
Unknown said...
You might be interested in the previous Kill GIL discussion on the Py3k list.
Anonymous said...
XML-RPC (via xmlrpclib) has worked for me in the past. Also Zope's ZODB with ZEO allows you to share a persistent object space between multiple processes (using ZODB does not require you to use Zope).