I am using Celery standalone (not within Django). I am planning to have one worker task type running on multiple physical machines. The task does the following
Perhaps celery.concurrency.gevent could provide the pool sharing without aggravating the GIL. However, its support is still "experimental".
A psycopg2.pool.SimpleConnectionPool could then be shared amongst the greenlets (coroutines), which would all run in a single process/thread.
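To make the idea of one pool shared amongst greenlets concrete, here is a minimal sketch, not a tested production setup. The names `get_pool` and `pooled_connection` are hypothetical helpers made up for illustration, and the DSN passed in is whatever your database needs. It assumes one lazily created pool per worker process, with each task borrowing a connection and handing it back when done.

```python
from contextlib import contextmanager

_pool = None  # one pool per worker process, created on first use


def get_pool(dsn, minconn=1, maxconn=10):
    """Return the process-wide pool, creating it on first call (hypothetical helper)."""
    global _pool
    if _pool is None:
        # Imported lazily so the module loads even where psycopg2 is absent.
        from psycopg2.pool import SimpleConnectionPool
        _pool = SimpleConnectionPool(minconn, maxconn, dsn)
    return _pool


@contextmanager
def pooled_connection(pool):
    """Borrow a connection, commit on success, roll back on error, always return it."""
    conn = pool.getconn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        pool.putconn(conn)
```

Inside a task body this would be used as `with pooled_connection(get_pool(dsn)) as conn: ...`. Two caveats I am fairly sure of: `getconn()` raises `PoolError` when the pool is exhausted rather than waiting, so `maxconn` should cover the worker's concurrency, and psycopg2 does not yield to other greenlets during queries unless a wait callback is installed (the psycogreen package is the usual way to do that).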
Tiny bit of other stack discussion on the topic.