Question
I have a long-running Celery task which iterates over an array of items and performs some actions.
The task should somehow report back which item it is currently processing, so the end user is aware of the task's progress.
At the moment my Django app and Celery sit together on one server, so I am able to use Django's models to report the status, but I am planning to add more workers which live away from Django, so they can't reach the DB.
Right now I see a few solutions:
- Store intermediate results manually in some storage, like Redis or MongoDB, making them available over the network. This worries me a little, because if, for example, I use Redis, then I have to keep the code on the Django side (which reads the status) and the Celery task (which writes the status) in sync, so that they use the same keys.
- Report status back to Django from Celery using REST calls (a rough sketch follows this list), like
PUT http://django.com/api/task/123/items_processed
- Maybe use the Celery event system and create events like "Item processed", on which Django updates the counter.
- Create a separate worker which runs on the server with Django and holds a task that only increases the "items processed" count, so when the main task is done with an item it issues increase_messages_proceeded_count.delay(task_id).
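For illustration, the REST option might look like this from the task side; the endpoint and payload shape are just placeholders mirroring the PUT example above:
import requests

def report_item_processed(task_id, items_processed):
    # Placeholder endpoint matching the PUT example above;
    # the URL scheme and payload are illustrative only.
    requests.put(
        'http://django.com/api/task/%s/items_processed' % task_id,
        json={'items_processed': items_processed},
        timeout=5,
    )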
Are there any other solutions, or hidden problems with the ones I mentioned?
Answer 1:
There are probably many ways to achieve your goal, but here is how I would do it.
Inside your long-running Celery task, set the progress using Django's caching framework:
from django.core.cache import cache

@app.task(bind=True)  # bind=True exposes the task instance as `self`
def long_running_task(self, *args, **kwargs):
    key = "my_task:%s" % self.request.id
    ...
    # do whatever you need to do and set the progress
    # using cache (pick a timeout that works for you):
    cache.set(key, progress, timeout=60 * 60)
    ...
Then all you have to do is make a recurring AJAX GET request with that key and retrieve the progress from the cache, something along these lines:
import json

from django.core.cache import cache
from django.http import HttpResponse

def task_progress_view(request, *args, **kwargs):
    key = request.GET.get('task_key')
    progress = cache.get(key)
    return HttpResponse(content=json.dumps({'progress': progress}),
                        content_type="application/json; charset=utf-8")
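To wire that view up, a minimal route could look like this (the path and name are illustrative, not from the answer; django.urls.path assumes Django 2.0+, while older versions use django.conf.urls.url):
# urls.py: route the polling endpoint to the view above
from django.urls import path

from .views import task_progress_view

urlpatterns = [
    path('task-progress/', task_progress_view, name='task-progress'),
]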
Here is a caveat, though: if you are running your server as multiple processes, make sure that you are using something like memcached, because Django's local-memory cache will be inconsistent among the processes. Also, I probably wouldn't use Celery's task_id
as a key, but it is sufficient for demonstration purposes.
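For example, a cross-process cache backend could be configured along these lines (a minimal sketch; the memcached LOCATION is an assumption, and the PyMemcacheCache backend ships with Django 3.2+, while older versions use a python-memcached backend instead):
# settings.py: use memcached so all processes see the same cache
# (LOCATION points at an assumed local memcached instance)
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
    }
}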
Answer 2:
Simplest:
Your tasks and Django app already share access to one or two data stores: the broker and the results backend (if you're using one that is different from the broker).
You can simply put some data into one or the other of these data stores to indicate which item the task is currently processing.
e.g. if using Redis, simply have a key like 'task-currently-processing' and store the data relevant to the item currently being processed in there.
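A minimal sketch of that idea with the redis-py client (the host, key scheme, and expiry are assumptions, not prescriptions):
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Inside the Celery task: record the item currently being processed.
# Namespacing the key by task id keeps concurrent tasks apart.
def set_current_item(task_id, item_id):
    r.set('task-currently-processing:%s' % task_id, item_id, ex=3600)

# On the Django side: read it back (returns bytes, or None if unset/expired).
def get_current_item(task_id):
    return r.get('task-currently-processing:%s' % task_id)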
Answer 3:
Take a look at Flower, a real-time monitor and web admin for the Celery distributed task queue:
- https://github.com/mher/flower#api
- http://flower.readthedocs.org/en/latest/api.html#get--api-tasks
You need it for presentation, right? Flower works with websockets.
For instance, receive task completion events in real time (taken from the official docs):
var ws = new WebSocket('ws://localhost:5555/api/task/events/task-succeeded/');
ws.onmessage = function (event) {
    console.log(event.data);
}
You would likely need to work with tasks ('ws://localhost:5555/api/tasks/').
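If websockets are not an option, the REST endpoint linked above can be polled instead; a rough sketch with the requests library (the Flower address is an assumption):
import requests

# Poll Flower's HTTP API for task states; the /api/tasks endpoint is
# documented at the readthedocs link above (localhost:5555 assumed).
response = requests.get('http://localhost:5555/api/tasks', timeout=5)
for task_id, info in response.json().items():
    print(task_id, info.get('state'))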
I hope this helps.
Answer 4:
You can use something like Swampdragon to reach the user from the Celery instance (you have to be able to reach it from the client, though, and take care not to run afoul of CORS). It can be latched onto the counter, not the model itself.
Source: https://stackoverflow.com/questions/32583897/receiving-events-from-celery-task