Creating all combinations of a set and running out of memory

元气小坏坏 提交于 2019-12-11 09:43:06

问题


I need to generate every combination of 6 numbers from a set of 55. I believe there are 28,989,675 indexes in that set of combinations. I guess I'm running out of memory, because I can generate combinations with 4 numbers but nothing larger than that. How can I fix this problem?

I'm using a modification of some code I borrowed from a tutorial here: https://www.youtube.com/watch?v=VyXDQxuIwPU

import itertools

text_file = open("comb3.txt", "w")

harmonics = [28, 33, 36, 38, 40, 43, 45, 47, 48, 50, 52, 55, 55.86, 57, 59, 60, 60.86, 61.69, 62, 63.86, 64, 65.86, 66, 66.69, 67, 69, 69.69, 70.86, 71, 71.69, 72, 74, 75.86, 76, 76.69, 77.86, 79, 81, 81.69, 82.86, 83.69, 84, 84.86, 86, 88, 88.69, 89.86, 90.69, 91, 93, 95, 95.69, 96.86, 98, 100]

combos = itertools.combinations(harmonics, 4)

usable_combos = []
for e in combos:
    usable_combos.append(e)

print usable_combos

s = str(usable_combos)

text_file.write(s)
text_file.close()

Thanks!


回答1:


Iterators like itertools.combinations only generate a piece of data at a time which is relatively memory efficient. But when you put all of the values into a list you need memory to store all of them at once (btw, usable_combos = list(combos) would replace your for loop, not that you should do that).

Since you are writing them to a file, you can write each combo to the file as you get it from the iterator, not create a list. Now, do you need it to be formatted like the repr of a Python list? Because if not, this would make more sense:

for combo in combos:
    text_file.write(str(combo) + "\n")

Note: changed from using "{}\n".format(combo) due to profiling.

If you want it like the repr of the list, you'll need to write the [ and ] yourself, and commas instead of newlines.

-more-

Based on the updates in the comments - if you're looking for specific combinations, the best place to look for them is probably before writing them to the file, since otherwise you just have to load them back from the file and look at them all again. If you will be selecting a small fraction of the available combinations based on some criteria, selecting them up front will cut down your work later.

In general, you could also look into Cython for some more speed without having to learn actual C, and if you really want to brute-force something with memory requirements beyond your own computer's, appropriately sized EC2 instances are in the vicinity of 20 cents an hour.




回答2:


One reason why you're running out of memory is due to the fact that (as you quite rightly stated): 55 choose 6 = 28,989,675

Now, think about how much memory that is, exactly. We can perform a quick back-of-the-envelope calculation to estimate how much memory this will take:

Since your list is using floats and integers, we can deduce the upper bound for the memory consumption as being:

sys.getsizeof(float())

Which on a 64 bit machine is 24 bytes, and on a 32 bit machine is 16 bytes

And, since tuples take up: 56 + 8 * len(t) bytes (64 bit)

Hence, the upper bound of your calculation would take:

  1. 28,989,675 * 6 * 24 + 28,989,675 * (56 + 8 * 6) bytes ~ 6,856.39 MiB of memory (64 bit)
  2. 28,989,675 * 6 * 16 + 28,989,675 * (56 + 8 * 6) bytes ~ 5,529.34 MiB of memory (32 bit)

Recalling that Python lists are implemented contiguously (for O(1) lookup time), this is the probable reason why it crashes, as you also have to consider the memory occupied by the OS and other programs in RAM.

Compare this to the other example you cited: 55 choose 4 = 341,055 => ~ 59.85 MiB (64 bit) or ~49.44 MiB (32 bit) of contiguous memory. Since this is a very reasonable amount of memory that can be contiguous, it runs without problem.



来源:https://stackoverflow.com/questions/25356497/creating-all-combinations-of-a-set-and-running-out-of-memory

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!