How can I asynchronously map/filter an asynchronous iterable?


Let's say I have an asynchronous iterable that I can iterate over using async for. How can I then map and filter it to produce a new asynchronous iterator?

3 Answers
  • 2020-12-20 12:30

    You can't use yield inside a coroutine (this was the rule before Python 3.6; PEP 525 async generators, covered in another answer, later lifted it). The only way I see to implement your idea is to write an asynchronous iterator by hand. If I'm right, something like this:

    class MapFilter:
        def __init__(self, aiterable, p, func):
            self.aiterable = aiterable
            self.p = p
            self.func = func
    
        def __aiter__(self):  # must be a plain method; `async def __aiter__` stopped being supported in Python 3.8
            return self
    
        async def __anext__(self):
            while True:
                payload = await self.aiterable.__anext__()  # StopAsyncIteration is raised here when there are no more values
                if self.p(payload):
                    return self.func(payload)
    

    Let's test it. Here's a complete example with a helper `arange` class:

    import asyncio
    
    
    class arange:
        def __init__(self, n):
            self.n = n
            self.i = 0
    
        def __aiter__(self):
            return self
    
        async def __anext__(self):
            i = self.i
            self.i += 1
            if self.i <= self.n:
                await asyncio.sleep(0)  # insert yield point
                return i
            else:
                raise StopAsyncIteration
    
    
    class MapFilter:
        def __init__(self, aiterable, p, func):
            self.aiterable = aiterable
            self.p = p
            self.func = func
    
        def __aiter__(self):
            return self
    
        async def __anext__(self):
            while True:
                payload = await self.aiterable.__anext__()
                if self.p(payload):
                    return self.func(payload)
    
    
    async def main():
        aiterable = arange(5)
        p = lambda x: x > 2
        func = lambda x: x * 2
    
        async for i in MapFilter(aiterable, p, func):
            print(i)
    
    if __name__ == "__main__":
        asyncio.run(main())
    

    Output (arange(5) yields 0 through 4; of those, 3 and 4 pass the filter and are doubled):

    6
    8
    
  • 2020-12-20 12:34

    PEP 525 (Asynchronous Generators), which shipped in Python 3.6, allows exactly the syntax you came up with.
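
    With that native syntax, the map/filter from the question becomes a plain async generator. A minimal sketch, assuming `p` and `func` are plain callables as in the other answer:

    async def mapfilter(aiterable, p, func):
        # Python 3.6+ only: `yield` inside `async def` creates an async generator
        async for payload in aiterable:
            if p(payload):
                yield func(payload)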

    Meanwhile, you can also use the asyncio_extras library mentioned by CryingCyclops in their comment if you don't want to deal with the asynchronous-iterator boilerplate.

    From the docs:

    # import path assumed; asyncio_extras provides these helpers
    from asyncio_extras import async_generator, yield_async

    @async_generator
    async def mygenerator(websites):
        for website in websites:
            page = await http_fetch(website)
            await yield_async(page)
    
    async def fetch_pages():
        websites = ('http://foo.bar', 'http://example.org')
        async for sanitized_page in mygenerator(websites):
            print(sanitized_page)
    

    There is also the async_generator library, which supports `yield from`-style constructs; a short sketch of its style follows.
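
    A minimal sketch, assuming the library's documented `async_generator`, `yield_`, and `yield_from_` helpers:

    from async_generator import async_generator, yield_, yield_from_

    @async_generator
    async def doubled(aiterable):
        async for x in aiterable:
            await yield_(x * 2)  # plays the role of `yield x * 2`

    @async_generator
    async def chained(aiterable):
        # delegate to another async generator, like `yield from`
        await yield_from_(doubled(aiterable))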

  • 2020-12-20 12:36

    https://gist.github.com/dvdotsenko/d8e0068775ac04b58993f604f122284f

    An asynchronous map and filter implementation for Python 3.6+, specifically designed to yield results out of order: whichever subtask finishes first is yielded first.

    import asyncio
    from collections import deque
    from typing import Any, Callable, Collection, AsyncIterator, Iterator, Union
    
    
    async def _next(gg):
        # repackaging non-asyncio next() as async-like anext()
        try:
            return next(gg)
        except StopIteration:
            raise StopAsyncIteration
    
    
    async def _aionext(gg):
        # no builtin anext() before Python 3.10
        return await gg.__anext__()
    
    
    async def map_unordered(fn: Callable, args: Union[Iterator, Collection, AsyncIterator], maxsize=None):
        """
        Async generator yielding the return values of completed invocations
        of `fn` against each item in `args`.

        Arguments are consumed and fed to the callable in the order they appear in `args`.
        Results are NOT yielded in the order of `args`; whichever finishes first is yielded first.

        If `maxsize` is specified, the worker task pool is constrained to that size.

        This is the asyncio equivalent of Gevent's `imap_unordered(fn, args_iterable, pool_size)`:
        http://www.gevent.org/api/gevent.pool.html#gevent.pool.Group.imap_unordered

        `args` may be an async iterator or a regular iterator.
        Thus, you can chain one `map_unordered` as `args` for another `map_unordered`.

        Because this is an async generator, it cannot be consumed as a regular
        iterable; you must use `async for`.
    
        Usage example:
    
                # note NO await in this assignment
                gen = map_unordered(fn, arguments_iter, maxsize=3)
                async for returned_value in gen:
                    yield returned_value
    
        """
        if maxsize == 0:
            raise ValueError(
                'Argument `maxsize` cannot be set to zero. '
                'Use `None` to indicate no limit.'
            )
    
        # Normalize `args` so each item can be pulled with `await n(args)`,
        # whether it is an async iterator, a regular iterator, or a plain collection.
    
        if hasattr(args, '__anext__'):
            n = _aionext
        elif hasattr(args, '__next__'):
            n = _next
        else:
            args = iter(args)
            n = _next
    
        have_args = True  # assume so; `args` may be a generator, so we can't len() it
        pending_tasks = deque()
    
        while have_args or len(pending_tasks):
            try:
                while len(pending_tasks) != maxsize:
                    arg = await n(args)
                    pending_tasks.append(
                        asyncio.ensure_future(fn(arg))  # schedule the coroutine as a Task
                    )
            except StopAsyncIteration:
                have_args = False
    
            if not len(pending_tasks):
                return
    
            done, pending_tasks = await asyncio.wait(pending_tasks, return_when=asyncio.FIRST_COMPLETED)
            pending_tasks = deque(pending_tasks)
    
            for task in done:
                yield await task  # await converts task object into its return value
    
    
    async def _filter_wrapper(fn, arg):
        return (await fn(arg)), arg
    
    async def _filter_none(arg):
        return arg is not None
    
    async def filter_unordered(fn: Union[Callable, None], args: Union[Iterator, Collection, AsyncIterator], maxsize=None):
        """
        Async filter generator yielding the values of `args` that match the filter condition.
        Like Python's native `filter([Callable|None], iterable)` but:
        - allows the iterable to be an async iterator
        - allows the callable to be an async callable
        - returns results OUT OF ORDER: whichever passes the filter test first is yielded first.

        Arguments are consumed and fed to the callable in the order they appear in `args`.
        Results are NOT yielded in the order of `args`; the earliest to finish and pass
        the filter condition is yielded first.

        If `maxsize` is specified, the worker task pool is constrained to that size.

        This is inspired by Gevent's `imap_unordered(fn, args_iterable, pool_size)`:
        http://www.gevent.org/api/gevent.pool.html#gevent.pool.Group.imap_unordered

        Because this is an async generator, it cannot be consumed as a regular
        iterable; you must use `async for`.
    
        Usage example:
    
                # note NO await in this assignment
                gen = filter_unordered(fn, arguments_iter, maxsize=3)
                async for returned_value in gen:
                    yield returned_value
    
        """
        if maxsize == 0:
            raise ValueError(
                'Argument `maxsize` cannot be set to zero. '
                'Use `None` to indicate no limit.'
            )
    
        if hasattr(args, '__anext__'):
            n = _aionext
        elif hasattr(args, '__next__'):
            n = _next
        else:
            args = iter(args)
            n = _next
    
        if fn is None:
            fn = _filter_none
    
        have_args = True  # assume so; `args` may be a generator, so we can't len() it
        pending_tasks = deque()
    
        while have_args or len(pending_tasks):
            try:
                while len(pending_tasks) != maxsize:
                    arg = await n(args)
                    pending_tasks.append(
                        asyncio.ensure_future(_filter_wrapper(fn, arg))
                    )
            except StopAsyncIteration:
                have_args = False
    
            if not len(pending_tasks):
                return
    
            done, pending_tasks = await asyncio.wait(pending_tasks, return_when=asyncio.FIRST_COMPLETED)
            pending_tasks = deque(pending_tasks)
    
            for task in done:
                filter_match, arg = await task
                if filter_match:
                    yield arg
    

    This works like Gevent's imap_unordered, but unlike Gevent's version it also allows the args iterable to be an async value generator, which means you can chain these calls; see the sketch just below.
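
    For example, a hypothetical pipeline (`fetch`, `is_valid`, `urls`, and `process` are illustrative names, not part of the gist):

    async def pipeline(urls):
        # map URLs to fetched pages, then keep only the valid ones,
        # with at most 10 in-flight tasks at each stage
        pages = map_unordered(fetch, urls, maxsize=10)
        valid_pages = filter_unordered(is_valid, pages, maxsize=10)
        async for page in valid_pages:
            process(page)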

    Given:

    import asyncio
    import time


    async def worker(seconds):
        print('> Start wait', seconds)
        await asyncio.sleep(seconds)
        print('< End wait', seconds)
        return seconds
    
    
    async def to_aio_gen(ll):
        for e in ll:
            yield e
    
    async def test_map(ll, size=None):
        t = time.time()
        async for v in map_unordered(worker, ll, maxsize=size):
            print('-- elapsed second', round(time.time() - t, 1), ' received value', v)
    
    
    ll = [
        0.2,
        0.4,
        0.8,
        1.2,
        1.1,
        0.3,
        0.6,
        0.9,
    ]
    

    Test outputs:

    non-asyncio iterable, pool size = 3

    >>> asyncio.run(test_map(ll, 3))
    > Start wait 0.2
    > Start wait 0.4
    > Start wait 0.8
    < End wait 0.2
    -- elapsed second 0.2  received value 0.2
    > Start wait 1.2
    < End wait 0.4
    -- elapsed second 0.4  received value 0.4
    > Start wait 1.1
    < End wait 0.8
    -- elapsed second 0.8  received value 0.8
    > Start wait 0.3
    < End wait 0.3
    -- elapsed second 1.1  received value 0.3
    > Start wait 0.6
    < End wait 1.2
    -- elapsed second 1.4  received value 1.2
    > Start wait 0.9
    < End wait 1.1
    -- elapsed second 1.5  received value 1.1
    < End wait 0.6
    -- elapsed second 1.7  received value 0.6
    < End wait 0.9
    -- elapsed second 2.3  received value 0.9
    

    Async Iterator as arg list, pool size = 3, filter

    async def more_than_half(v):
        await asyncio.sleep(v)
        return v > 0.5
    
    async def test_filter(aiter, size=None):
        # mirrors test_map above; filter_unordered is an async generator,
        # so it must be driven with `async for`, not passed to asyncio.run() directly
        t = time.time()
        async for v in filter_unordered(more_than_half, aiter, maxsize=size):
            print('-- elapsed second', round(time.time() - t, 1), ' received value', v)

    >>> asyncio.run(test_filter(to_aio_gen(ll), 3))
    -- elapsed second 0.8  received value 0.8
    -- elapsed second 1.4  received value 1.2
    -- elapsed second 1.5  received value 1.1
    -- elapsed second 1.7  received value 0.6
    -- elapsed second 2.3  received value 0.9
    