Cartesian product of large iterators (itertools)

前端 未结 4 2145
爱一瞬间的悲伤
爱一瞬间的悲伤 2020-12-17 15:56

From a previous question I learned something interesting. If Python\'s itertools.product is fed a series of iterators, these iterators will be converted into tu

相关标签:
4条回答
  • 2020-12-17 16:41

    Here's an implementation which calls callables and iterates iterables, which are assumed restartable:

    def product(*iterables, **kwargs):
        if len(iterables) == 0:
            yield ()
        else:
            iterables = iterables * kwargs.get('repeat', 1)
            it = iterables[0]
            for item in it() if callable(it) else iter(it):
                for items in product(*iterables[1:]):
                    yield (item, ) + items
    

    Testing:

    import itertools
    g = product(lambda: itertools.permutations(xrange(100)),
                lambda: itertools.permutations(xrange(100)))
    print next(g)
    print sum(1 for _ in g)
    
    0 讨论(0)
  • 2020-12-17 16:41

    This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product should have more error-checking, but this is at least a first approach. The simple reiterable class...

    class PermutationsReiterable(object):
        def __init__(self, value):
            self.value = value
        def __iter__(self):
            return itertools.permutations(xrange(self.value))
    

    And product iteslf...

    def product(*reiterables, **kwargs):
        if not reiterables:
            yield ()
            return
        reiterables *= kwargs.get('repeat', 1)
    
        iterables = [iter(ri) for ri in reiterables]
        try:
            states = [next(it) for it in iterables]
        except StopIteration:
            # outer product of zero-length iterable is empty
            return
        yield tuple(states)
    
        current_index = max_index = len(iterables) - 1
        while True:
            try:
                next_item = next(iterables[current_index])
            except StopIteration:
                if current_index > 0:
                    new_iter = iter(reiterables[current_index])
                    next_item = next(new_iter)
                    states[current_index] = next_item
                    iterables[current_index] = new_iter
                    current_index -= 1
                else:
                    # last iterable has run out; terminate generator
                    return
            else:
                states[current_index] = next_item
                current_index = max_index
                yield tuple(states)
    

    Tested:

    >>> pi2 = PermutationsReiterable(2)
    >>> list(pi2); list(pi2)
    [(0, 1), (1, 0)]
    [(0, 1), (1, 0)]
    >>> list(product(pi2, repeat=2))
    [((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]
    >>> giant_product = product(PermutationsReiterable(100), repeat=5)
    >>> len(list(itertools.islice(giant_product, 0, 5)))
    5
    >>> big_product = product(PermutationsReiterable(10), repeat=2)
    >>> list(itertools.islice(big_product, 0, 5))
    [((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)), 
     ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)), 
     ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)), 
     ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)), 
     ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]
    
    0 讨论(0)
  • 2020-12-17 16:41

    I'm sorry to up this topic but after spending hours debugging a program trying to iterate over recursively generated cartesian product of generators. I can tell you that none of the solutions above work if not working with constant numbers as in all the examples above.

    Correction :

    from itertools import tee
    
    def product(*iterables, **kwargs):
        if len(iterables) == 0:
            yield ()
        else:
            iterables = iterables * kwargs.get('repeat', 1)
            it = iterables[0]
            for item in it() if callable(it) else iter(it):
                iterables_tee = list(map(tee, iterables[1:]))
                iterables[1:] = [it1 for it1, it2 in iterables_tee]
                iterable_copy = [it2 for it1, it2 in iterables_tee]
                for items in product(*iterable_copy):
                    yield (item, ) + items
    

    If your generators contain generators, you need to pass a copy to the recursive call.

    0 讨论(0)
  • 2020-12-17 16:45

    Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.

    So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like

    def iterProduct(ic):
        if not ic:
            yield []
            return
    
        for i in ic[0]():
            for js in iterProduct(ic[1:]):
                yield [i] + js
    
    
    # Test
    x3 = lambda: xrange(3)
    for i in iterProduct([x3,x3,x3]):
        print i
    
    0 讨论(0)
提交回复
热议问题