Question
I'm using itertools.combinations() to iterate over tuples of integers.
I am interested in the tuple with the lowest sum that satisfies some conditions:
import itertools

def findLowestNiceTuple():
    for tup in itertools.combinations(range(1, 6), 2):
        if niceTuple(tup):
            return tup
The generator's default order is not sorted by the elements' sum. For example:
>>> itertools.combinations(range(1, 6), 2)
gives a generator which will yield the following elements:
[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
As you can see, the sum of (1, 5) is larger than that of (2, 3). For early termination, I need the tuples in the order ..., (1, 4), (2, 3), (1, 5), ...
For a modest number of combinations, you can get around this by using sorted():
>>> sorted(itertools.combinations(range(1, 6), 2), key=sum)
[(1, 2), (1, 3), (1, 4), (2, 3), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
However, sorted() converts the generator to a list which is kept in memory entirely. This means that it no longer scales very well. Something like itertools.combinations(range(1, 600), 400) will inevitably produce a MemoryError.
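For a sense of scale, the number of combinations can be computed without generating any of them. A minimal sketch, assuming Python 3.8+ for math.comb:

```python
import math

# Count the 400-element combinations of range(1, 600), which has 599 values.
# math.comb returns the count directly; no tuples are generated.
count = math.comb(len(range(1, 600)), 400)

# Print how many decimal digits the count has
print(len(str(count)))
```

The count has well over a hundred digits, so anything that stores every tuple (as sorted() must) is out of reach, while a lazy iterator such as itertools.combinations only holds the current tuple.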
Is there a more memory-friendly way to achieve the desired result?
PS: I do realize that it would take ages to fully iterate over the last sequence I mentioned, but the tuple I am looking for should be very close to the start. And if I can count on the order, I can terminate early as in the first snippet.
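The early-termination idea can be sketched with next() over a generator expression. Here nice_tuple is a hypothetical stand-in for the question's niceTuple, chosen only so that the two orders give different answers:

```python
import itertools

def nice_tuple(tup):
    # Hypothetical predicate (stand-in for niceTuple): the sum is at
    # least 5 and the tuple does not contain 4.
    return sum(tup) >= 5 and 4 not in tup

def find_lowest_nice_tuple(candidates):
    # Return the first candidate that satisfies the predicate, or None.
    # This is the lowest-sum match only if candidates arrive in sum order.
    return next((tup for tup in candidates if nice_tuple(tup)), None)

default_order = itertools.combinations(range(1, 6), 2)
sum_order = sorted(itertools.combinations(range(1, 6), 2), key=sum)

print(find_lowest_nice_tuple(default_order))  # (1, 5)
print(find_lowest_nice_tuple(sum_order))      # (2, 3)
```

Only the sum-ordered input returns (2, 3), the lower-sum match, which is why a memory-friendly sum-ordered generator is needed.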
Answer 1:
Here's how I'd solve it, with a recursive function that finds all combinations that sum to a given value:
def ordered_combinations(pop, n):
    pop = sorted(pop)
    for s in range(sum(pop[:n]), sum(pop[-n:])+1):
        yield from get_sums(pop, s, n)

def get_sums(pop, s, n):
    if n == 1:
        if s in pop:
            yield [s]
        return
    for i, v in enumerate(pop):
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s-v, n-1):
            rest.append(v)
            yield rest
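One way to gain confidence in the two functions is to compare them with the brute-force sorted(..., key=sum) approach on small inputs. The check helper below is a hypothetical testing addition, not part of the answer; the two functions are copied from above with comments added.

```python
import itertools

def ordered_combinations(pop, n):
    pop = sorted(pop)
    # Every candidate sum, from the n smallest values to the n largest
    for s in range(sum(pop[:n]), sum(pop[-n:]) + 1):
        yield from get_sums(pop, s, n)

def get_sums(pop, s, n):
    # Yield all n-element combinations of the sorted pop that sum to s,
    # as lists with the largest value first.
    if n == 1:
        if s in pop:
            yield [s]
        return
    for i, v in enumerate(pop):
        # Prune: the n smallest values starting at position i already exceed s
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s - v, n - 1):
            rest.append(v)
            yield rest

def check(pop, n):
    # Hypothetical helper: compare against the fully materialized approach.
    got = [tuple(sorted(c)) for c in ordered_combinations(pop, n)]
    sums = [sum(c) for c in got]
    assert sums == sorted(sums), "sums must be non-decreasing"
    assert sorted(got) == sorted(itertools.combinations(sorted(pop), n))
    return len(got)

print(check(range(1, 8), 4))  # 35
```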
Here's an example of its output:
>>> for c in ordered_combinations(range(1, 8), 4):
...     print(c, sum(c))
[4, 3, 2, 1] 10
[5, 3, 2, 1] 11
[6, 3, 2, 1] 12
[5, 4, 2, 1] 12
[7, 3, 2, 1] 13
[6, 4, 2, 1] 13
[5, 4, 3, 1] 13
[7, 4, 2, 1] 14
[6, 5, 2, 1] 14
[6, 4, 3, 1] 14
[5, 4, 3, 2] 14
[7, 5, 2, 1] 15
[7, 4, 3, 1] 15
[6, 5, 3, 1] 15
[6, 4, 3, 2] 15
[7, 6, 2, 1] 16
[7, 5, 3, 1] 16
[6, 5, 4, 1] 16
[7, 4, 3, 2] 16
[6, 5, 3, 2] 16
[7, 6, 3, 1] 17
[7, 5, 4, 1] 17
[7, 5, 3, 2] 17
[6, 5, 4, 2] 17
[7, 6, 4, 1] 18
[7, 6, 3, 2] 18
[7, 5, 4, 2] 18
[6, 5, 4, 3] 18
[7, 6, 5, 1] 19
[7, 6, 4, 2] 19
[7, 5, 4, 3] 19
[7, 6, 5, 2] 20
[7, 6, 4, 3] 20
[7, 6, 5, 3] 21
[7, 6, 5, 4] 22
The combinations are always yielded with the largest values first, as an artifact of how I build them as lists (by appending smaller values to the end rather than concatenating them to the front). If you want each combination ordered from smallest to largest, you can change the rest.append(v); yield rest lines to yield [v]+rest.
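The suggested change can be sketched as a full variant; the names ordered_combinations_asc and get_sums_asc are hypothetical, and the only functional difference from the answer is the prepend in the inner loop:

```python
from itertools import islice

def ordered_combinations_asc(pop, n):
    pop = sorted(pop)
    for s in range(sum(pop[:n]), sum(pop[-n:]) + 1):
        yield from get_sums_asc(pop, s, n)

def get_sums_asc(pop, s, n):
    if n == 1:
        if s in pop:
            yield [s]
        return
    for i, v in enumerate(pop):
        # Prune: the n smallest values starting at position i already exceed s
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums_asc(pop[i+1:], s - v, n - 1):
            # Prepend the smaller value instead of appending it
            yield [v] + rest

print(list(islice(ordered_combinations_asc(range(1, 8), 4), 4)))
# [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 6], [1, 2, 4, 5]]
```

The traversal order is unchanged; only each list comes out reversed. Because [v] + rest builds a new list at every level, it copies somewhat more than append does.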
The code uses the yield from syntax that was introduced in Python 3.3. If you're using an earlier version that doesn't support it, you can replace the yield from line with this equivalent code:
for v in get_sums(pop, s, n):
    yield v
The code can even handle the extreme case of 400-combinations, here taken from range(1, 800), an even larger range than the question's range(1, 600). Here are the first twenty results of that computation (each shown with only its 10 largest values, since the remaining values are always 390 down to 1), along with their sums:
>>> for i, v in enumerate(ordered_combinations(range(1, 800), 400)):
...     if i >= 20:
...         break
...     print(v[:10], sum(v))
[400, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80200
[401, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80201
[402, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[401, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80202
[403, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[402, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80203
[401, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80203
[404, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[403, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80204
[402, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80204
[401, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80204
[405, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[404, 400, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 401, 398, 397, 396, 395, 394, 393, 392, 391] 80205
[403, 400, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 401, 399, 397, 396, 395, 394, 393, 392, 391] 80205
[402, 400, 399, 398, 396, 395, 394, 393, 392, 391] 80205
[401, 400, 399, 398, 397, 395, 394, 393, 392, 391] 80205
[406, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80206
Because it's recursive, this code may fail if you request a 1000-combination: the recursion depth grows with the combination size and would exceed Python's default recursion limit (1000 in CPython). You can raise the limit with sys.setrecursionlimit if necessary.
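Here is a minimal sketch of raising the limit only for the duration of one call. The helper first_ordered_combination and the limit of 5000 are assumptions for illustration; a 1000-combination nests roughly 1000 generator frames, which is why the default limit is likely not enough.

```python
import sys

def ordered_combinations(pop, n):
    pop = sorted(pop)
    for s in range(sum(pop[:n]), sum(pop[-n:]) + 1):
        yield from get_sums(pop, s, n)

def get_sums(pop, s, n):
    if n == 1:
        if s in pop:
            yield [s]
        return
    for i, v in enumerate(pop):
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s - v, n - 1):
            rest.append(v)
            yield rest

def first_ordered_combination(pop, n, limit=5000):
    # Hypothetical helper: temporarily raise the recursion limit
    # (5000 is an arbitrary value comfortably above n).
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))
    gen = ordered_combinations(pop, n)
    try:
        return next(gen)
    finally:
        gen.close()  # unwind the nested generators while the limit is raised
        sys.setrecursionlimit(old)

first = first_ordered_combination(range(1, 2001), 1000)
print(first[:5], sum(first))  # [1000, 999, 998, 997, 996] 500500
```

Restoring the old limit in finally keeps the change local; note that a higher limit only moves the failure point, and very deep recursion can still exhaust the C stack.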
It may also have memory issues if you go exceedingly deep with an extremely large population, since get_sums slices (and so copies) the population in the recursive step. If you only ever use this code with ranges, you can probably fix the memory issue by removing the pop = sorted(pop) line from ordered_combinations, since Python 3's range objects can be sliced efficiently (that is, range(1,100)[10:] is range(11,100)). This relies on the range having a positive step, so that it is already sorted.
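A sketch of that range-only variant, assuming the input is a range with a positive step; the name ordered_range_combinations is hypothetical, and get_sums is unchanged from the answer:

```python
def get_sums(pop, s, n):
    # Unchanged from the answer: n-element combinations of the sorted
    # pop that sum to s, largest value first.
    if n == 1:
        if s in pop:
            yield [s]
        return
    for i, v in enumerate(pop):
        if sum(pop[i:i+n]) > s:
            return
        for rest in get_sums(pop[i+1:], s - v, n - 1):
            rest.append(v)
            yield rest

def ordered_range_combinations(r, n):
    # Hypothetical variant of ordered_combinations without pop = sorted(pop).
    # Assumes r is a range with a positive step, so it is already sorted
    # and every slice taken in get_sums is another lazy range object.
    for s in range(sum(r[:n]), sum(r[-n:]) + 1):
        yield from get_sums(r, s, n)

r = range(1, 100)
print(r[10:])                    # range(11, 100)
print(r[10:] == range(11, 100))  # True

first = next(ordered_range_combinations(range(1, 800), 400))
print(first[:10], sum(first))
# [400, 399, 398, 397, 396, 395, 394, 393, 392, 391] 80200
```

The yielded combinations are still ordinary lists; only the population slices stay lazy, and membership tests like s in pop remain constant-time on a range.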
Source: https://stackoverflow.com/questions/14864867/how-do-i-iterate-over-a-large-number-of-tuples-of-integers-in-the-order-of-their