Asyncio worker that handles N jobs at a time?


Question


I'm trying to make an asyncio worker class that will consume jobs from a job queue and process up to N jobs in parallel. Some jobs may queue additional jobs. When the job queue is empty and the worker finishes all of its current jobs, it should end.

I'm still struggling with asyncio conceptually. Here is one of my attempts, where N=3:

import asyncio, logging, random

async def do_work(id_):
    await asyncio.sleep(random.random())
    return id_

class JobQueue:
    ''' Maintains a list of all pending jobs. '''
    def __init__(self):
        self._queue = asyncio.Queue()
        self._max_id = 10
        for id_ in range(self._max_id):
            self._queue.put_nowait(id_ + 1)

    def add_job(self):
        self._max_id += 1
        self._queue.put_nowait(self._max_id)

    async def get_job(self):
        return await self._queue.get()

    def has_jobs(self):
        return self._queue.qsize() > 0

class JobWorker:
    ''' Processes up to 3 jobs at a time in parallel. '''
    def __init__(self, job_queue):
        self._current_jobs = set()
        self._job_queue = job_queue
        self._semaphore = asyncio.Semaphore(3)

    async def run(self):
        while self._job_queue.has_jobs() or len(self._current_jobs) > 0:
            print('Acquiring semaphore...')
            await self._semaphore.acquire()
            print('Getting a job...')
            job_id = await self._job_queue.get_job()
            print('Scheduling job {}'.format(job_id))
            self._current_jobs.add(job_id)
            task = asyncio.Task(do_work(job_id))
            task.add_done_callback(self.task_finished)

    def task_finished(self, task):
        job_id = task.result()
        print('Finished job {} / released semaphore'.format(job_id))
        self._current_jobs.remove(job_id)
        self._semaphore.release()
        if random.random() < 0.2:
            print('Queuing a new job')
            self._job_queue.add_job()

loop = asyncio.get_event_loop()
jw = JobWorker(JobQueue())
print('Starting event loop')
loop.run_until_complete(jw.run())
print('Event loop ended')
loop.close()

An excerpt of the output:

Starting event loop
Acquiring semaphore...
Getting a job...
Scheduling job 1
Acquiring semaphore...
Getting a job...
Scheduling job 2
Acquiring semaphore...
Getting a job...
Scheduling job 3
Acquiring semaphore...
Finished job 2 / released semaphore
Getting a job...
Scheduling job 4
...snip...
Acquiring semaphore...
Finished job 11 / released semaphore
Getting a job...
Finished job 12 / released semaphore
Finished job 13 / released semaphore

It appears to correctly process all jobs while processing no more than 3 jobs at any one time. However, the program hangs after the last job is finished. As indicated by the output, it appears to be hanging at job_id = await self._job_queue.get_job(). Once the job queue is empty, this coroutine will never resume, and the check to see if the job queue is empty (at the top of the loop) isn't reached again.
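
To see the hang in isolation, here is a minimal standalone sketch (separate from the code above, with a timeout added only so it terminates): get() on an empty Queue suspends until some other coroutine puts an item, and once the queue is drained there is no producer left to do so.

import asyncio

async def main():
    queue = asyncio.Queue()
    try:
        # get() on an empty queue suspends until an item is put;
        # with no producer, it would wait forever, hence the timeout.
        await asyncio.wait_for(queue.get(), timeout=1)
    except asyncio.TimeoutError:
        print('get() never returned: this is where run() hangs')

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()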

I've tried working around this in a number of ways, but conceptually something just doesn't quite fit. My current WIP passes some futures between the queue and the worker and then uses some combination of asyncio.wait(...) on all of them, but it's getting ugly and I'm wondering if there is an elegant solution that I'm overlooking.


Answer 1:


You could take advantage of queue.task_done, which indicates that a formerly enqueued task is complete. Then you can combine queue.join and queue.get using asyncio.wait: if queue.join finishes and queue.get doesn't, it means all the jobs have been completed.

See this example:

import asyncio

class Worker:

    def __init__(self, func, n=3):
        self.func = func
        self.queue = asyncio.Queue()
        self.semaphore = asyncio.Semaphore(n)

    def put(self, *args):
        self.queue.put_nowait(args)

    async def run(self):
        while True:
            args = await self._get()
            if args is None:
                return
            asyncio.ensure_future(self._target(args))

    async def _get(self):
        get_task = asyncio.ensure_future(self.queue.get())
        join_task = asyncio.ensure_future(self.queue.join())
        await asyncio.wait([get_task, join_task], return_when=asyncio.FIRST_COMPLETED)
        if get_task.done():
            return get_task.result()
        # join() finished first: every job is done, so cancel the pending get()
        get_task.cancel()

    async def _target(self, args):
        try:
            async with self.semaphore:
                return await self.func(*args)
        finally:
            self.queue.task_done()
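
This answer doesn't show how to drive the class, so here is a hypothetical usage sketch in the same style as the question's script (do_work and the ten seeded jobs are borrowed from the question). run() returns once the queue is fully drained, and a job that needs to enqueue more work can simply call worker.put(...) from inside func:

import random  # asyncio is already imported above

async def do_work(id_):
    await asyncio.sleep(random.random())
    print('Finished job {}'.format(id_))

worker = Worker(do_work)
for id_ in range(1, 11):   # seed ten jobs, as in the question
    worker.put(id_)

loop = asyncio.get_event_loop()
loop.run_until_complete(worker.run())  # returns once all jobs are done
loop.close()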



Answer 2:


You can time out get_job with a simple asyncio.wait_for, for example with 1 second, and go back to the beginning of the loop on timeout.

    async def run(self):
        while self._job_queue.has_jobs() or len(self._current_jobs) > 0:
            print('Acquiring semaphore...')
            await self._semaphore.acquire()
            print('Getting a job...')
            try:
                job_id = await asyncio.wait_for(self._job_queue.get_job(), 1)
            except asyncio.TimeoutError:
                # Release the permit acquired above, otherwise each timeout
                # permanently lowers the effective concurrency limit.
                self._semaphore.release()
                continue
            print('Scheduling job {}'.format(job_id))
            self._current_jobs.add(job_id)
            task = asyncio.Task(do_work(job_id))
            task.add_done_callback(self.task_finished)


Source: https://stackoverflow.com/questions/38114528/asyncio-worker-that-handles-n-jobs-at-a-time
