Celery - minimize memory consumption


Question


We have ~300 celeryd processes running under Ubuntu 10.04 64-bit. When idle, each process takes ~19 MB RES and ~174 MB VIRT, so all processes together use around 6 GB of RAM at idle. In the active state a process takes up to 100 MB RES and ~300 MB VIRT.

Every process uses minidom (the XML files are < 500 kB, with a simple structure) and urllib.

The question is: how can we decrease RAM consumption, at least for idle workers? Perhaps some Celery or Python options could help? And how can we determine which part takes the most memory?
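(One way to narrow the second part down, a sketch not from the original post: log each worker's peak resident set size around the suspect code paths with the stdlib resource module; ru_maxrss is reported in kB on Linux.)

    import os
    import resource

    def log_memory(tag):
        # peak RSS of this worker process so far, in kB on Linux
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("[pid %d] %s: peak RSS %.1f MB" % (os.getpid(), tag, peak_kb / 1024.0))

    log_memory("after parse")  # call around the fetch/parse/save steps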

UPD: these are flight-search agents, one worker per agency/date pair. We have 10 agencies, and one user search covers 9 dates, so we have 10 × 9 = 90 agents per user search.

Is it possible to start celeryd processes on demand, to avoid idle workers (something like MaxSpareServers in Apache)?

UPD2: The agent lifecycle is: send an HTTP request, wait ~10-20 s for the response, parse the XML (takes less than 0.02 s), save the result to MySQL.
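For reference, a minimal sketch of such a task, assuming Celery 2.x's decorator API and Python 2's urllib (the URL format and the "fare" tag name are made up for illustration):

    import urllib
    from xml.dom import minidom

    from celery.decorators import task

    @task
    def search_agency(agency_url, date):
        # The HTTP round trip dominates: the worker blocks here for ~10-20 s.
        xml_data = urllib.urlopen("%s?date=%s" % (agency_url, date)).read()
        # Parsing is cheap (<0.02 s) even with minidom on a <500 kB document.
        dom = minidom.parseString(xml_data)
        fares = [n.firstChild.data for n in dom.getElementsByTagName("fare")]
        return fares  # the result would then be written to MySQL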


Answer 1:


Read this:

http://docs.celeryproject.org/en/latest/userguide/workers.html#concurrency

It sounds like you have one worker per celeryd. That seems wrong; you should have dozens of workers per celeryd. Keep raising the number of workers (and lowering the number of celeryds) until your system is very busy and very slow.
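As a concrete sketch (the numbers are illustrative, using Celery 2.x's celeryd), run a few daemons with a larger pool each instead of hundreds of single-worker daemons:

    # one daemon with a 25-process pool, instead of 25 separate celeryds
    celeryd --concurrency=25

or set it in celeryconfig.py:

    CELERYD_CONCURRENCY = 25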




Answer 2:


S. Lott is right. The main instance consumes messages and delegates them to worker pool processes. There is probably no point in running 300 pool processes on a single machine! Try 4 or 5 multiplied by the number of CPU cores. You may gain something by running more than one celeryd with a few processes each, as some people have, but you would have to experiment for your application.

See http://celeryq.org/docs/userguide/workers.html#concurrency

For the upcoming 2.2 release we're working on Eventlet pool support; this may be a good alternative for IO-bound tasks, as it will enable you to run 1000+ threads with minimal memory overhead. It's still experimental, though, and bugs are being fixed for the final release.

See http://groups.google.com/group/celery-users/browse_thread/thread/94fbeccd790e6c04
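Once 2.2 is out, switching would look something like this (a sketch; the flag follows the 2.2 worker options, and the thread count is illustrative):

    # IO-bound tasks: one process, 1000 green threads instead of 1000 processes
    celeryd --pool=eventlet --concurrency=1000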

The upcoming 2.2 release also has support for autoscaling, which adds/removes processes on demand. See the changelog: http://ask.github.com/celery/changelog.html#version-2-2-0 (this changelog is not completely written yet).




Answer 3:


The natural number of workers is close to the number of cores you have. The workers are there so that CPU-intensive tasks can use an entire core efficiently. The broker is there so that requests without a worker on hand to process them are kept queued. The number of queues can be high, but that doesn't mean you need a high number of brokers; a single broker should suffice, or you could shard queues to one broker per machine if fast worker-queue interaction later turns out to be beneficial.

Your problem seems unrelated to that. I'm guessing that your agencies don't provide a message-queue API, so you have to keep lots of requests around. If so, you need a few (emphasis on not many) evented processes, for example Twisted- or node.js-based.
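For illustration (not from the original answer), a Twisted-based fetcher could hold all 90 in-flight requests in a single process. getPage is the era-appropriate API (deprecated in later Twisted releases), and the URLs here are made up:

    from twisted.internet import defer, reactor
    from twisted.web.client import getPage

    def fetch_all(urls):
        # getPage returns a Deferred per request; all of them wait concurrently
        return defer.gatherResults([getPage(url) for url in urls])

    def done(bodies):
        print("fetched %d responses" % len(bodies))
        reactor.stop()

    # 10 agencies x 9 dates = 90 concurrent requests in one process
    urls = ["http://agency.example/search?date=%d" % d for d in range(9)]
    fetch_all(urls * 10).addCallback(done)
    reactor.run()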




Answer 4:


Use autoscaling. This allows the number of workers under each celeryd instance to be increased or decreased as needed. http://docs.celeryproject.org/en/latest/userguide/workers.html#autoscaling
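For example (a sketch using the 2.2-era flag; the bounds are illustrative, given as max,min):

    # scale each celeryd between 3 and 10 pool processes as load changes
    celeryd --autoscale=10,3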



Source: https://stackoverflow.com/questions/4346318/celery-minimize-memory-consumption
