Celery Worker Database Connection Pooling

萌比男神i 2020-12-07 19:17

I am using Celery standalone (not within Django). I am planning to have one worker task type running on multiple physical machines. The task does the following:

    …
6 Answers
  • 2020-12-07 19:49

    Have one DB connection per worker process. Since celery itself maintains a pool of worker processes, your db connections will always equal the number of celery workers. The flip side is that it ties db connection pooling to celery's worker-process management, but that should be fine, given that the GIL allows only one thread at a time in a process anyway.

  • 2020-12-07 19:53

    I like tigeronk2's idea of one connection per worker. As he says, Celery maintains its own pool of workers, so there really isn't a need for a separate database connection pool. The Celery signals docs explain how to do custom initialization when a worker is created, so I added the following code to my tasks.py, and it seems to work exactly as you would expect. I was even able to close the connections when the workers are shut down:

    from celery.signals import worker_process_init, worker_process_shutdown
    
    db_conn = None
    
    @worker_process_init.connect
    def init_worker(**kwargs):
        # Open one connection when the pool forks this worker process.
        global db_conn
        print('Initializing database connection for worker.')
        # `db` stands for your driver module (e.g. psycopg2 or pymysql).
        db_conn = db.connect(DB_CONNECT_STRING)
    
    
    @worker_process_shutdown.connect
    def shutdown_worker(**kwargs):
        # Close the connection when the worker process shuts down.
        global db_conn
        if db_conn:
            print('Closing database connection for worker.')
            db_conn.close()
    
  • 2020-12-07 19:54

    Contributing back my findings from implementing and monitoring this. Feedback welcome.

    Reference: use pooling http://www.prschmid.com/2013/04/using-sqlalchemy-with-celery-tasks.html

    In prefork mode (concurrency specified by -c k), each worker process establishes one new connection to the DB, with no pooling or reuse across processes. So if you use pooling, the pool is visible only within each worker process, which means a pool size > 1 is not useful; reusing a connection is still worthwhile, though, since it saves repeatedly opening and closing connections.

    If you use one connection per worker process, 1 DB connection is established per worker process (prefork mode, celery -A app worker -c k) at the initialization phase. This saves the connection from being opened and closed repeatedly.

    With eventlet (celery -A app worker -P eventlet), no matter how many worker greenlets there are, only one connection to the DB is established at a time, again without pooling or reuse: all worker greenlets in one celery process share 1 db connection at any given moment.

    According to the celery docs:

    "but you need to ensure your tasks do not perform blocking calls, as this will halt all other operations in the worker until the blocking call returns."

    This is probably because the MySQL DB connection calls are blocking.
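
    As a minimal sketch of the per-worker-process setup the reference above describes, you can build one SQLAlchemy engine per prefork child (the connection URL and pool settings here are assumptions, not from the original post):

    from celery.signals import worker_process_init
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    
    engine = None
    Session = None
    
    @worker_process_init.connect
    def init_engine(**kwargs):
        # Each prefork child builds its own engine, so nothing is shared
        # across the fork. pool_size=1 because the pool is only visible
        # inside this single worker process anyway.
        global engine, Session
        engine = create_engine(
            'postgresql://user:pass@localhost/mydb',  # hypothetical URL
            pool_size=1, max_overflow=0, pool_pre_ping=True)
        Session = sessionmaker(bind=engine)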

  • 2020-12-07 20:03

    Perhaps celery.concurrency.gevent could provide the pool sharing without aggravating the GIL. However, its support is still "experimental".

    You could then share a psycopg2.pool.SimpleConnectionPool amongst the greenlets (coroutines), which all run in a single process/thread.

    There is a tiny bit of other Stack Overflow discussion on the topic.
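
    A rough sketch of that combination (the DSN and pool size are assumptions; for truly non-blocking queries under gevent you would also need to patch psycopg2, e.g. with psycogreen):

    from celery import Celery
    from psycopg2.pool import SimpleConnectionPool
    
    app = Celery('tasks')
    
    # One process-wide pool, shared by all greenlets when the worker runs
    # with `celery -A tasks worker -P gevent`.
    pool = SimpleConnectionPool(minconn=1, maxconn=10,
                                dsn='dbname=mydb user=me')  # hypothetical DSN
    
    @app.task
    def run_query(sql):
        conn = pool.getconn()   # borrow a connection from the shared pool
        try:
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchall()
        finally:
            pool.putconn(conn)  # return it for other greenlets to reuse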

  • 2020-12-07 20:07

    Perhaps you can use pgbouncer. For celery nothing should change, since the connection pooling is done outside of the celery processes. I have the same issue.

    ('perhaps' because I am not sure if there could be any side effects)
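
    If it helps, the only app-side change is where the tasks connect: point them at pgbouncer (default port 6432) instead of Postgres itself. A tiny sketch, with hypothetical host/db/user:

    import psycopg2
    
    # Tasks keep opening "connections" as before, but these now terminate
    # at pgbouncer, which multiplexes them onto a small pool of real
    # server connections.
    conn = psycopg2.connect(host='127.0.0.1', port=6432,
                            dbname='mydb', user='me')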

  • 2020-12-07 20:08

    You can override the default behavior to use threaded workers, instead of one worker per process, in your celery config:

    CELERYD_POOL = "celery.concurrency.threads.TaskPool"

    Then you can store the shared pool instance on your task instance and reference it from each threaded task invocation.
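
    A minimal sketch of that pattern, using a custom Task base class and a thread-safe psycopg2 pool (the pool sizes and DSN here are assumptions):

    import psycopg2.pool
    from celery import Celery, Task
    
    app = Celery('tasks')
    app.conf.CELERYD_POOL = 'celery.concurrency.threads.TaskPool'
    
    class DatabaseTask(Task):
        abstract = True
        _pool = None
    
        @property
        def pool(self):
            # Lazily create one pool per task instance; celery instantiates
            # a task class once, so all threaded invocations share it.
            # (Guard with a lock if the first calls can race.)
            if self._pool is None:
                self._pool = psycopg2.pool.ThreadedConnectionPool(
                    minconn=1, maxconn=10,
                    dsn='dbname=mydb user=me')  # hypothetical DSN
            return self._pool
    
    @app.task(base=DatabaseTask, bind=True)
    def fetch_rows(self, sql):
        conn = self.pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchall()
        finally:
            self.pool.putconn(conn)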
