Combining chains, groups and chunks with Celery

Posted by 梦想的初衷 on 2021-02-10 14:20:58

Question


I want to use Celery for a URL grabber.

I have a list of URLs, and I must make an HTTP request to every URL and write the results to a file (the same file for the whole list).

My first idea was to put this code in a task that Celery beat calls every n minutes:

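from celery import chain, group

@app.task(bind=True)  # bind=True is needed because the task takes `self`
def get_urls(self):
    # One immutable signature per URL; `urls` is the list of URLs to grab.
    results = [get_url_content.si(
        url=url
    ) for url in urls]

    # A group chained with a task becomes a chord: the callback gets all results.
    ch = chain(
        group(*results),
        write_result_on_disk.s()
    )

    return ch()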

This code works pretty well, but there is one problem: I have a thousand URLs to grab, and if one of the get_url_content tasks fails, write_result_on_disk is not called and we lose all the previously grabbed contents.

What I want to do is chunk the tasks by splitting the URLs, grab their results and write them to disk. For example, the contents of 20 URLs are written to disk at a time.

Do you have any ideas? I tried the chunks() function but did not get really useful results.


Answer 1:


Using CeleryBeat for cron-like tasks is a good idea.

I would try to catch exceptions inside your get_url_content micro-tasks. When you catch one, return a value that marks the failure instead of raising, so the rest of the workflow still runs. This way, you can evaluate the failures (e.g. count, list, inspect them) in a summarize_task.
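
The idea of turning failures into return values can be sketched as follows. This is a minimal sketch, not the asker's actual task: fetch_safely, default_fetch, the fetch parameter and the keys of the result dict are hypothetical names chosen for illustration. It uses only the standard library so it runs on its own; in a real project the function would presumably be registered with @app.task.

```python
from typing import Callable
from urllib.request import urlopen


def default_fetch(url: str) -> str:
    """Download a URL and return its body as text (standard library only)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_safely(url: str, fetch: Callable[[str], str] = default_fetch) -> dict:
    """Return a result dict instead of raising, so one bad URL cannot abort the batch.

    `fetch` is a hypothetical injection point that makes the function easy to test.
    """
    try:
        return {"url": url, "ok": True, "content": fetch(url)}
    except Exception as exc:  # deliberately broad: any failure becomes data
        return {"url": url, "ok": False, "error": repr(exc)}
```

Because every call returns a dict, the callback that writes to disk always receives a complete list and can decide for itself what to do with the entries where "ok" is False.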

How to use chunks and chain chunks with another task:

Step 1: Convert the chunk to a group:

As described in http://docs.celeryproject.org/en/latest/userguide/canvas.html#chunks, .group() transforms an object of type celery.canvas.chunks into a group, which is a much more common type in Celery.
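
To make the effect of chunking concrete, here is a plain-Python illustration of how a list of task arguments is split into parts of a fixed size. It is only a sketch of the behaviour, not Celery's implementation; split_into_chunks and the example.com URLs are hypothetical. Each item is a one-element tuple because chunks() takes an iterable of argument tuples, one tuple per task call.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def split_into_chunks(args: Iterable[Tuple], size: int) -> Iterator[List[Tuple]]:
    """Yield consecutive lists of at most `size` argument tuples."""
    iterator = iter(args)
    while True:
        part = list(islice(iterator, size))
        if not part:
            return
        yield part


urls = [f"http://example.com/{i}" for i in range(45)]
# One argument tuple per task call: ("http://example.com/0",), ...
parts = list(split_into_chunks(((url,) for url in urls), 20))
print([len(part) for part in parts])  # prints [20, 20, 5]
```

With .group(), each of these parts becomes one task in the group, and that task returns the list of results for its part.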

Step 2: Chain a group and a task

The "Blow your mind by combining" section in http://docs.celeryproject.org/en/latest/userguide/canvas.html#the-primitives mentions:

Chaining a group together with another task will automatically upgrade it to be a chord


Here is some code with the two tasks and how I usually call them:

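from typing import List

from celery import chain

@app.task
def solve_micro_task(arg: str) -> dict:
    ...

@app.task
def summarize(items: List[List[dict]]):
    # Each chunk returns a list of results, so flatten the list of lists.
    flat_list = [item for sublist in items for item in sublist]
    for report in flat_list:
        ...

# chunks() expects an iterable of argument tuples, e.g. [("a",), ("b",)];
# here it is split into parts of 10 calls each.
chunk_tasks = solve_micro_task.chunks(<your iterator, e.g., a list>, 10)  # type: celery.canvas.chunks
summarize_task = summarize.s()
chain(chunk_tasks.group(), summarize_task)()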
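
The body of summarize is left open above. One possible shape, assuming each micro-task returns a dict with an "ok" flag (a hypothetical convention, not something the code above prescribes), is to flatten the per-chunk lists and separate successes from failures. The function name summarize_reports is also an assumption; the sketch is plain Python so it can be tried without a broker.

```python
from typing import Dict, List


def summarize_reports(items: List[List[dict]]) -> Dict[str, List[dict]]:
    """Flatten the per-chunk result lists and split them by the "ok" flag."""
    flat_list = [report for chunk in items for report in chunk]
    return {
        "succeeded": [report for report in flat_list if report.get("ok")],
        "failed": [report for report in flat_list if not report.get("ok")],
    }
```

The "succeeded" entries could then be written to disk in one pass, while the "failed" entries are logged or retried on the next scheduled run.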


Source: https://stackoverflow.com/questions/45082707/combining-chains-groups-and-chunks-with-celery
