How do I iterate over a large number of tuples of integers in the order of their sum?

Posted by 泪湿孤枕 on 2019-12-07 16:52:01

Question


I'm using itertools.combinations() to iterate over tuples of integers.

I am interested in the tuple with the lowest sum that satisfies some conditions:

import itertools

def findLowestNiceTuple():
    for tup in itertools.combinations(range(1, 6), 2):
        if niceTuple(tup):  # niceTuple is the condition being tested
            return tup

The generator's default order is not in the order of the elements' sum. For example:

>>> itertools.combinations(range(1, 6), 2)

gives a generator which will yield the following elements:

[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

As you can see, (1, 5) is yielded before (2, 3) even though its sum is larger. For early termination, I need the tuples in the order ..., (1, 4), (2, 3), (1, 5), ....

For a modest number of combinations, you can get around this by using sorted():

>>> sorted(itertools.combinations(range(1, 6), 2), key=sum)
[(1, 2), (1, 3), (1, 4), (2, 3), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]

However, sorted() converts the generator to a list which is kept in memory entirely. This means that it no longer scales very well. Something like itertools.combinations(range(1, 600), 400) will inevitably produce a MemoryError.

Is there a more memory-friendly way to achieve the desired result?

PS: I do realize that it would take ages to fully iterate over the last sequence I mentioned, but the tuple I am looking for should be very close to the start. And if I can count on the order, I can terminate early as in the first snippet.


Answer 1:


Here's how I'd solve it, with a recursive function that finds all combinations that sum to a given value:

def ordered_combinations(pop, n):
    pop = sorted(pop)

    for s in range(sum(pop[:n]), sum(pop[-n:])+1):
        yield from get_sums(pop, s, n)

def get_sums(pop, s, n):
    if n == 1:
        if s in pop:
            yield [s]
        return

    for i, v in enumerate(pop):
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s-v, n-1):
            rest.append(v)
            yield rest

Here's an example of its output:

>>> for c in ordered_combinations(range(1, 8), 4):
...     print(c, sum(c))


[4, 3, 2, 1] 10
[5, 3, 2, 1] 11
[6, 3, 2, 1] 12
[5, 4, 2, 1] 12
[7, 3, 2, 1] 13
[6, 4, 2, 1] 13
[5, 4, 3, 1] 13
[7, 4, 2, 1] 14
[6, 5, 2, 1] 14
[6, 4, 3, 1] 14
[5, 4, 3, 2] 14
[7, 5, 2, 1] 15
[7, 4, 3, 1] 15
[6, 5, 3, 1] 15
[6, 4, 3, 2] 15
[7, 6, 2, 1] 16
[7, 5, 3, 1] 16
[6, 5, 4, 1] 16
[7, 4, 3, 2] 16
[6, 5, 3, 2] 16
[7, 6, 3, 1] 17
[7, 5, 4, 1] 17
[7, 5, 3, 2] 17
[6, 5, 4, 2] 17
[7, 6, 4, 1] 18
[7, 6, 3, 2] 18
[7, 5, 4, 2] 18
[6, 5, 4, 3] 18
[7, 6, 5, 1] 19
[7, 6, 4, 2] 19
[7, 5, 4, 3] 19
[7, 6, 5, 2] 20
[7, 6, 4, 3] 20
[7, 6, 5, 3] 21
[7, 6, 5, 4] 22

The combinations are always yielded with the biggest values first, as an artifact of how I'm building them as lists (by appending small values on the end, rather than by concatenating to the front). If you want them ordered from smallest to largest, you can change the rest.append(v); yield rest lines to yield [v]+rest.
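As a minimal sketch of that change, here is a variant that builds each combination smallest-first and yields tuples instead of lists. The names get_sums_ascending, ordered_combinations_ascending, find_lowest_nice_tuple and is_nice are illustrative only and do not appear in the code above; the search helper assumes the condition from the question can be passed in as a function.

```python
def get_sums_ascending(pop, s, n):
    """Yield each n-element combination of the sorted sequence pop that sums to s."""
    if n == 1:
        if s in pop:
            yield (s,)
        return
    for i, v in enumerate(pop):
        # If the n smallest remaining values already exceed s, nothing later can match.
        if sum(pop[i:i + n]) > s:
            return
        for rest in get_sums_ascending(pop[i + 1:], s - v, n - 1):
            # Prepend instead of append, so values come out smallest first.
            yield (v,) + rest


def ordered_combinations_ascending(pop, n):
    """Yield n-combinations of pop in order of increasing sum."""
    pop = sorted(pop)
    for s in range(sum(pop[:n]), sum(pop[-n:]) + 1):
        yield from get_sums_ascending(pop, s, n)


def find_lowest_nice_tuple(pop, n, is_nice):
    """Return the lowest-sum combination accepted by is_nice, or None if there is none."""
    candidates = ordered_combinations_ascending(pop, n)
    # next() stops consuming the generator at the first match (early termination).
    return next((c for c in candidates if is_nice(c)), None)
```

For range(1, 6) and n = 2, this yields the tuples in the same order as the sorted(..., key=sum) listing in the question, without ever building the full list.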

The code uses the yield from syntax that was introduced with Python 3.3. If you're using an earlier version that doesn't support that, you can use this equivalent code:

for v in get_sums(pop, s, n):
    yield v

The code can even handle an extreme case like the one you described: 400-combinations taken from range(1, 800), which has 799 members. Here are the first twenty results of that computation (showing only the 10 largest values of each, since the remaining values are always 390 down to 1), along with their sums:

>>> for i, v in enumerate(ordered_combinations(range(1, 800), 400)):
...     if i >= 20:
...         break
...     print(v[:10], sum(v))


[400, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80200
[401, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80201
[402, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[401, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[403, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[402, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[401, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80203
[404, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[403, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80204
[401, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80204
[405, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[404, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 401, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80205
[401, 400, 399, 398, 397, 395, 394, 393, 392, 391] 80205
[406, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80206

Because it's recursive, this code may fail if you request a 1000-combination, since the recursion depth grows with the combination size and Python's default recursion limit is usually 1000. You can raise the limit with sys.setrecursionlimit if necessary.
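One way to raise the limit only for the duration of the search is sketched below. The recursion_limit context manager and the depth function are hypothetical helpers written for this illustration; only sys.getrecursionlimit and sys.setrecursionlimit are standard library calls.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit):
    """Temporarily raise the recursion limit, restoring the old value on exit."""
    previous = sys.getrecursionlimit()
    # Never lower the limit: keep whichever value is larger.
    sys.setrecursionlimit(max(previous, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(previous)


def depth(n):
    """Stand-in for a deeply recursive call; recurses n levels and returns n."""
    return 0 if n == 0 else 1 + depth(n - 1)
```

Generators run lazily, so the loop that consumes ordered_combinations has to sit inside the with recursion_limit(...) block, not just the call that creates the generator.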

It may also have memory issues if you go exceedingly deep with an extremely large population, since get_sums slices (and so copies) the population in the recursive step. If you will only use this code with ranges, you can probably fix the memory issue by removing the pop = sorted(pop) line from ordered_combinations, since Python 3's range objects can be sliced efficiently (that is, range(1,100)[10:] is range(11,100)).
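Rather than deleting the sorting step outright, a small guard could skip it only when the input is already an ascending range. This is a sketch under that assumption; as_sorted_population is a hypothetical helper, not part of the code above.

```python
def as_sorted_population(pop):
    """Return pop unchanged if it is an ascending range, otherwise a sorted list."""
    if isinstance(pop, range) and pop.step > 0:
        # Slicing a range gives another range, so no elements are copied later on.
        return pop
    return sorted(pop)


r = as_sorted_population(range(1, 100))
tail = r[10:]                      # still a range object
assert isinstance(tail, range)
assert tail == range(11, 100)
assert 50 in tail                  # membership works on a range, as get_sums needs
assert sum(tail[:3]) == 11 + 12 + 13
```

The slicing, membership test and sum calls that get_sums relies on all work on range objects, which is why replacing pop = sorted(pop) with such a guard should be enough for this case.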



Source: https://stackoverflow.com/questions/14864867/how-do-i-iterate-over-a-large-number-of-tuples-of-integers-in-the-order-of-their
