问题
I need to generate every combination of 6 numbers from a set of 55. I believe there are 28,989,675 indexes in that set of combinations. I guess I'm running out of memory, because I can generate combinations with 4 numbers but nothing larger than that. How can I fix this problem?
I'm using a modification of some code I borrowed from a tutorial here: https://www.youtube.com/watch?v=VyXDQxuIwPU
import itertools
text_file = open("comb3.txt", "w")
harmonics = [28, 33, 36, 38, 40, 43, 45, 47, 48, 50, 52, 55, 55.86, 57, 59, 60, 60.86, 61.69, 62, 63.86, 64, 65.86, 66, 66.69, 67, 69, 69.69, 70.86, 71, 71.69, 72, 74, 75.86, 76, 76.69, 77.86, 79, 81, 81.69, 82.86, 83.69, 84, 84.86, 86, 88, 88.69, 89.86, 90.69, 91, 93, 95, 95.69, 96.86, 98, 100]
combos = itertools.combinations(harmonics, 4)
usable_combos = []
for e in combos:
usable_combos.append(e)
print usable_combos
s = str(usable_combos)
text_file.write(s)
text_file.close()
Thanks!
回答1:
Iterators like itertools.combinations
only generate a piece of data at a time which is relatively memory efficient. But when you put all of the values into a list you need memory to store all of them at once (btw, usable_combos = list(combos)
would replace your for
loop, not that you should do that).
Since you are writing them to a file, you can write each combo to the file as you get it from the iterator, not create a list. Now, do you need it to be formatted like the repr
of a Python list? Because if not, this would make more sense:
for combo in combos:
text_file.write(str(combo) + "\n")
Note: changed from using "{}\n".format(combo)
due to profiling.
If you want it like the repr
of the list, you'll need to write the [
and ]
yourself, and commas instead of newlines.
-more-
Based on the updates in the comments - if you're looking for specific combinations, the best place to look for them is probably before writing them to the file, since otherwise you just have to load them back from the file and look at them all again. If you will be selecting a small fraction of the available combinations based on some criteria, selecting them up front will cut down your work later.
In general, you could also look into Cython for some more speed without having to learn actual C, and if you really want to brute-force something with memory requirements beyond your own computer's, appropriately sized EC2 instances are in the vicinity of 20 cents an hour.
回答2:
One reason why you're running out of memory is due to the fact that (as you quite rightly stated): 55 choose 6 = 28,989,675
Now, think about how much memory that is, exactly. We can perform a quick back-of-the-envelope calculation to estimate how much memory this will take:
Since your list is using floats and integers, we can deduce the upper bound for the memory consumption as being:
sys.getsizeof(float())
Which on a 64 bit machine is 24 bytes, and on a 32 bit machine is 16 bytes
And, since tuples take up: 56 + 8 * len(t) bytes (64 bit)
Hence, the upper bound of your calculation would take:
28,989,675 * 6 * 24 + 28,989,675 * (56 + 8 * 6) bytes ~ 6,856.39 MiB
of memory (64 bit)28,989,675 * 6 * 16 + 28,989,675 * (56 + 8 * 6) bytes ~ 5,529.34 MiB
of memory (32 bit)
Recalling that Python lists are implemented contiguously (for O(1) lookup time), this is the probable reason why it crashes, as you also have to consider the memory occupied by the OS and other programs in RAM.
Compare this to the other example you cited: 55 choose 4 = 341,055 => ~ 59.85 MiB (64 bit)
or ~49.44 MiB (32 bit)
of contiguous memory. Since this is a very reasonable amount of memory that can be contiguous, it runs without problem.
来源:https://stackoverflow.com/questions/25356497/creating-all-combinations-of-a-set-and-running-out-of-memory