I've used the processing package for Python. It mimics the API of the threading module and is thus quite easy to use.
If you happen to use map/imap or a generator/list comprehension, converting your code to use processing is straightforward:
def do_something(x):
    return x ** (x * x)
results = [do_something(n) for n in range(10000)]
can be parallelized with
import processing
pool = processing.Pool(processing.cpuCount())
results = pool.map(do_something, range(10000))
which will spread the calculation across however many processors you have. There are also lazy (Pool.imap) and asynchronous (Pool.map_async) variants.
There is also a queue class that implements the interface of Queue.Queue, and worker processes that are used like threads.
Gotchas
processing is based on fork(), which has to be emulated on Windows. Arguments and results are transferred via pickle/unpickle, so you have to make sure that everything you pass is picklable. Forking a process that has already acquired resources might not be what you want (think database connections), but in general it works. It works so well that it was added to Python 2.6 on the fast track, as the multiprocessing module (see PEP 371).
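The pickling requirement is easy to check directly, without starting any processes. A quick illustration: module-level functions pickle fine (they are pickled by reference), while lambdas do not, which is why you can pass do_something to pool.map but not an equivalent lambda:

```python
import pickle

def square(x):
    return x * x

# module-level functions are pickled by reference and transfer fine
function_ok = pickle.dumps(square) is not None

# lambdas (and other objects pickle cannot look up by name) fail
try:
    pickle.dumps(lambda x: x * x)
    lambda_ok = True
except Exception:
    lambda_ok = False

print(function_ok, lambda_ok)  # → True False
```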