What is the difference between chain and chain.from_iterable in itertools?

庸人自扰 2020-12-07 20:05

I could not find any valid example on the internet where I can see the difference between them and why to choose one over the other.

6 Answers
  • 2020-12-07 20:22

    Another way to see it:

    chain(iterable1, iterable2, iterable3, ...) is for when you already know what iterables you have, so you can write them as these comma-separated arguments.

    chain.from_iterable(iterable) is for when your iterables (like iterable1, iterable2, iterable3) are obtained from another iterable.
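A minimal sketch of the two situations described above. The variable names (`letters`, `digits`, `groups`) are made up for illustration and are not from the question.

```python
from itertools import chain

# Case 1: the iterables are known when the call is written.
letters = ["a", "b"]
digits = (1, 2)
known = list(chain(letters, digits, "xy"))

# Case 2: the iterables are obtained from another iterable
# (here, the values of a hypothetical dict).
groups = {"fruit": ["apple", "pear"], "veg": ["kale"]}
derived = list(chain.from_iterable(groups.values()))

print(known)    # ['a', 'b', 1, 2, 'x', 'y']
print(derived)  # ['apple', 'pear', 'kale']
```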

  • 2020-12-07 20:23

    Another way to look at it: use chain.from_iterable when you have an iterable of iterables, such as a nested (compound) iterable, and use chain when you have the simple iterables as separate arguments.
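One thing worth seeing with nested input is that only the outermost level is flattened. A small sketch with made-up data (`nested` and `words` are illustrative names):

```python
from itertools import chain

# Only one level of nesting is removed; the inner [2, 3] stays a list.
nested = [[1, [2, 3]], [4], []]
flat_once = list(chain.from_iterable(nested))
print(flat_once)  # [1, [2, 3], 4]

# Strings are iterables too, so they are split into characters.
words = ["ab", "c"]
chars = list(chain.from_iterable(words))
print(chars)  # ['a', 'b', 'c']
```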

  • 2020-12-07 20:25

    The first takes zero or more arguments, each an iterable; the second takes a single argument that is expected to produce the iterables:

    from itertools import chain

    # Placeholder inputs for illustration.
    list1, list2, list3 = [1, 2], [3, 4], [5, 6]

    chain(list1, list2, list3)

    iterables = [list1, list2, list3]
    chain.from_iterable(iterables)
    

    but iterables can be any iterator that yields the iterables:

    def gen_iterables():
        for i in range(10):
            yield range(i)
    
    chain.from_iterable(gen_iterables())
    

    Using the second form is usually a matter of convenience, but because it loops over the input iterables lazily, it is also the only way to chain an infinite number of finite iterators:

    def gen_iterables():
        while True:
            for i in range(5, 10):
                yield range(i)
    
    chain.from_iterable(gen_iterables())
    

    The above example will give you an iterable that yields a cyclic pattern of numbers that never stops, yet never consumes more memory than a single range() call requires.
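To actually look at the endless chain without looping forever, one option is to take a bounded slice with `itertools.islice`. This is a sketch; the slice length of 12 is an arbitrary choice for illustration.

```python
from itertools import chain, islice

def gen_iterables():
    # Endless supply of finite ranges: range(5), range(6), ..., range(9), repeat.
    while True:
        for i in range(5, 10):
            yield range(i)

endless = chain.from_iterable(gen_iterables())

# islice pulls only the first 12 items, so this terminates.
first_twelve = list(islice(endless, 12))
print(first_twelve)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0]
```

By contrast, `chain(*gen_iterables())` would not return, because argument unpacking has to exhaust the generator before `chain` is even called.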

  • 2020-12-07 20:27

    Extending @martijn-pieters' answer

    Access to the inner items of the iterable is the same either way, and in the CPython implementation

    • itertools_chain_from_iterable (i.e. chain.from_iterable in Python) and
    • chain_new (i.e. chain in Python)

    both end up calling chain_new_internal.
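For readers who do not want to read the C source, the itertools documentation describes both constructors as roughly equivalent to simple generators. The sketch below follows that description; the names `chain_py` and `from_iterable_py` are invented here so they do not shadow the real functions.

```python
from itertools import chain

def chain_py(*iterables):
    # Rough pure-Python equivalent of chain(a, b, c, ...).
    for iterable in iterables:
        yield from iterable

def from_iterable_py(iterables):
    # Rough pure-Python equivalent of chain.from_iterable(x).
    # The only difference is how the outer iterable arrives:
    # as one argument instead of as unpacked positional arguments.
    for iterable in iterables:
        yield from iterable

same_star = list(chain_py([1, 2], "ab")) == list(chain([1, 2], "ab"))
same_from = list(from_iterable_py([[1, 2], "ab"])) == list(chain.from_iterable([[1, 2], "ab"]))
print(same_star, same_from)  # True True
```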


    Are there any optimization benefits from using chain.from_iterable(x), where x is an iterable of iterables, when the main purpose is ultimately to consume the flattened list of items?

    We can try benchmarking it with:

    import random
    from itertools import chain
    from functools import wraps
    from time import time
    
    def timing(f):
        @wraps(f)
        def wrap(*args, **kw):
            ts = time()
            result = f(*args, **kw)
            te = time()
            print('func:%r args:[%r, %r] took: %2.4f sec' % (f.__name__, args, kw, te-ts))
            return result
        return wrap
    
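    def generate_nm(m, n):
        # Yields ONE generator that lazily produces m shuffled lists,
        # each a permutation of the integers 0..n-1. Because only a single
        # object is yielded, chain flattens just that outer level, so the
        # consumers below iterate over the m lists rather than their items.
        yield iter(random.sample(range(n), n) for _ in range(m))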
        
    
    def chain_star(x):
        # Unpack x and build a chain over its elements (flattens one level).
        chain_x = chain(*x)
        # Consume every item of the chained iterator.
        for i in chain_x:
            pass

    def chain_from_iterable(x):
        # Build a chain that pulls the sub-iterables from x lazily (flattens one level).
        chain_x = chain.from_iterable(x)
        # Consume every item of the chained iterator.
        for i in chain_x:
            pass
    
    
    @timing
    def versus(f, m, n):
        f(generate_nm(m, n))
    



    Results

    chain_star, m=1000, n=1000

    for _ in range(10):
        versus(chain_star, 1000, 1000)
    

    [out]:

    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6494 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6603 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6367 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6350 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6296 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6399 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6341 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6381 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6343 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6309 sec
    

    chain_from_iterable, m=1000, n=1000

    for _ in range(10):
        versus(chain_from_iterable, 1000, 1000)
    

    [out]:

    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6416 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6315 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6535 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6334 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6327 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6471 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6426 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6287 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6353 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6297 sec
    

    chain_star, m=10000, n=1000

    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2659 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2966 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2953 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3141 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2802 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2799 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2848 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3299 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2730 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3052 sec
    

    chain_from_iterable, m=10000, n=1000

    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3129 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3064 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3071 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2660 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2837 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2877 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2756 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2939 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2715 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2877 sec
    

    chain_star, m=100000, n=1000

    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.7874 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.3744 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.5584 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.3745 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.7982 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.4054 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.6769 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.6476 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.7397 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.8980 sec
    

    chain_from_iterable, m=100000, n=1000

    for _ in range(10):
        versus(chain_from_iterable, 100000, 1000)
    

    [out]:

    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7227 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7717 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7159 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7569 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7906 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.6211 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7294 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.8260 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.8356 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.9738 sec
    

    chain_star, m=500000, n=1000

    for _ in range(3):
        versus(chain_star, 500000, 1000)
    

    [out]:

    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 314.5671 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 313.9270 sec
    func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 313.8992 sec
    

    chain_from_iterable, m=500000, n=1000

    for _ in range(3):
        versus(chain_from_iterable, 500000, 1000)
    

    [out]:

    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.8301 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.8104 sec
    func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.9440 sec
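Note that the harness above passes a lazy generator into the timed function, so the `random.sample` calls run inside the timed region along with the chaining. A sketch of an alternative that builds the data first and times only chaining plus iteration, using the standard `timeit` module, could look like this. The helper names, data sizes and repeat count are arbitrary assumptions, and no timings are claimed here.

```python
from itertools import chain
from timeit import timeit

def consume_star(lists):
    # Flatten via argument unpacking, then iterate over every item.
    for _ in chain(*lists):
        pass

def consume_from_iterable(lists):
    # Flatten via from_iterable, then iterate over every item.
    for _ in chain.from_iterable(lists):
        pass

# Build the data once, outside the timed region.
data = [list(range(100)) for _ in range(1_000)]

t_star = timeit(lambda: consume_star(data), number=20)
t_from = timeit(lambda: consume_from_iterable(data), number=20)

print(f"chain(*x):              {t_star:.4f} s")
print(f"chain.from_iterable(x): {t_from:.4f} s")
```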
    
  • 2020-12-07 20:28

    They do very similar things. For a small number of iterables, itertools.chain(*iterables) and itertools.chain.from_iterable(iterables) perform similarly.

    The key advantage of from_iterable is its ability to handle a large (potentially infinite) number of iterables, since they need not all be available at the time of the call.
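The "need not all be available at the time of the call" point can be made visible by logging when each sub-iterable is produced. A sketch, where `numbered_iterables` and the `produced` log are hypothetical helpers written for this illustration:

```python
from itertools import chain

produced = []  # records which sub-iterables the generator has handed out

def numbered_iterables(count):
    for i in range(count):
        produced.append(i)
        yield [i]

# chain(*...) must exhaust the generator just to build the argument list.
star = chain(*numbered_iterables(3))
eager_log = list(produced)
print(eager_log)  # [0, 1, 2]

# from_iterable asks for the next sub-iterable only when it needs one.
produced.clear()
lazy = chain.from_iterable(numbered_iterables(3))
log_before = list(produced)
first = next(lazy)
log_after = list(produced)
print(log_before, first, log_after)  # [] 0 [0]
```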

  • 2020-12-07 20:37

    I could not find any valid example ... where I can see the difference between them [chain and chain.from_iterable] and why to choose one over the other

    The accepted answer is thorough. For those seeking a quick application, consider flattening several lists:

    import itertools

    list(itertools.chain(["a", "b", "c"], ["d", "e"], ["f"]))
    # ['a', 'b', 'c', 'd', 'e', 'f']
    

    You may wish to reuse these lists later, so you make an iterable of lists:

    iterable = (["a", "b", "c"], ["d", "e"], ["f"])
    

    Attempt

    However, passing in an iterable to chain gives an unflattened result:

    list(itertools.chain(iterable))
    # [['a', 'b', 'c'], ['d', 'e'], ['f']]
    

    Why? You passed in one item (a tuple). chain needs each list separately.


    Solutions

    When possible, you can unpack an iterable:

    list(itertools.chain(*iterable))
    # ['a', 'b', 'c', 'd', 'e', 'f']
    
    list(itertools.chain(*iter(iterable)))
    # ['a', 'b', 'c', 'd', 'e', 'f']
    

    More generally, use .from_iterable (as it also works with infinite iterators):

    list(itertools.chain.from_iterable(iterable))
    # ['a', 'b', 'c', 'd', 'e', 'f']
    
    g = itertools.chain.from_iterable(itertools.cycle(iterable))
    next(g)
    # "a"
    