Question
In a Python program that iterates over a sequence of 30 problems involving memory- and CPU-intensive numerical computations, I observe that the memory consumption of the Python process grows by ~800MB at the beginning of each iteration, and the process finally raises a MemoryError in the 8th iteration (where the system's memory is in fact exhausted). However, if I import gc and let gc.collect() run after each iteration, then the memory consumption remains constant at ~2.5GB and the program terminates nicely after solving all 30 problems. The code only uses the data of 2 consecutive problems and there are no reference cycles (otherwise the manual garbage collection would also not be able to keep the memory consumption down).
The question
This behavior raises the question of whether Python tries to run the garbage collector before it raises a MemoryError. In my opinion, this would be a perfectly sane thing to do, but perhaps there are reasons against it?
A similar observation to the above was made here: https://stackoverflow.com/a/4319539/1219479
Answer 1:
Actually, there are reference cycles, and that is the only reason why the manual gc.collect() calls are able to reclaim memory at all.
In Python (I'm assuming CPython here), the garbage collector's sole purpose is to break reference cycles. When none are present, objects are destroyed and their memory reclaimed at the exact moment the last reference to them is lost.
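The difference between the two mechanisms can be observed directly. The following is a minimal sketch, assuming CPython 3; the Node class is a hypothetical stand-in for a large object, and the automatic collector is disabled only so that it cannot interfere with the demonstration.

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder for a large object."""

    def __init__(self):
        self.other = None


gc.disable()  # keep the automatic collector out of the way for this demo

# No cycle: the object is destroyed as soon as its last reference goes away.
a = Node()
a_alive = weakref.ref(a)
del a
no_cycle_freed = a_alive() is None

# Cycle: the two objects keep each other alive after their names are deleted.
b, c = Node(), Node()
b.other, c.other = c, b
b_alive = weakref.ref(b)
del b, c
cycle_freed_by_refcount = b_alive() is None

# Only a collection pass reclaims the cycle.
gc.collect()
cycle_freed_by_gc = b_alive() is None

gc.enable()
print(no_cycle_freed, cycle_freed_by_refcount, cycle_freed_by_gc)
# On CPython this prints: True False True
```

The weak references are used only to check whether an object still exists without keeping it alive.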
As for when the garbage collector is run, the full documentation is here: http://docs.python.org/2/library/gc.html
The TLDR of it is that Python maintains an internal counter of object allocations and deallocations. Whenever (allocations - deallocations) exceeds threshold 0 (700 by default), a garbage collection is run and both counters are reset.
Every time a collection happens (either automatic, or manually run with gc.collect()), generation 0 (all objects that haven't yet survived a collection) is collected. That is, objects with no accessible references are walked through, looking for reference cycles; if any are found, the cycles are broken, possibly leading to objects being destroyed because there are no references left. All objects that remain after that collection are moved to generation 1. Note that a manual gc.collect() called without an argument collects all generations, not just generation 0.
Every 10 collections (threshold 1), generation 1 is also collected, and all objects in generation 1 that survive that are moved to generation 2. Every 10 collections of generation 1 (that is, every 100 collections -- threshold 2), generation 2 is also collected. Objects that survive that are left in generation 2 -- there is no generation 3.
These 3 thresholds can be user-set by calling gc.set_threshold(threshold0, threshold1, threshold2).
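As a small illustration of the threshold API, the sketch below reads the current thresholds, changes them, and restores them. The value 100 is an arbitrary example, not a recommendation, and the defaults printed depend on the interpreter version.

```python
import gc

# Current thresholds as a 3-tuple: (threshold0, threshold1, threshold2).
original = gc.get_threshold()
print("thresholds:", original)

# Per-generation counters that are compared against the thresholds.
print("counts:", gc.get_count())

# Example value only: trigger generation-0 collections more often.
gc.set_threshold(100, original[1], original[2])
changed = gc.get_threshold()

# Restore the previous settings.
gc.set_threshold(*original)
restored = gc.get_threshold()
```

Lowering threshold 0 makes automatic collections more frequent at the cost of extra CPU time; it still gives no guarantee that a collection happens before memory runs out.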
What this all means for your program:
- The GC is not the mechanism CPython uses to reclaim memory (refcounting is). The GC breaks reference cycles in "dead" objects, which may lead to some of them being destroyed.
- No, there are no guarantees that the GC will run before a MemoryError is raised.
- You have reference cycles. Try to get rid of them.
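One way to check for cycles and to avoid them is sketched below, assuming CPython 3. Parent and Child are hypothetical classes, not taken from the question's code. gc.collect() returns the number of unreachable objects it found, so a non-zero result after dropping a data structure suggests that cycles were present; replacing the back-reference with a weakref.ref removes the cycle.

```python
import gc
import weakref


class Parent:
    def __init__(self):
        self.children = []


class Child:
    def __init__(self, parent, use_weakref):
        # A strong back-reference creates a Parent <-> Child cycle;
        # a weak back-reference does not.
        self.parent = weakref.ref(parent) if use_weakref else parent


def build_and_drop(use_weakref):
    parent = Parent()
    parent.children.append(Child(parent, use_weakref))
    # 'parent' goes out of scope when the function returns.


gc.collect()   # start from a clean state
gc.disable()   # avoid automatic collections during the measurement

build_and_drop(use_weakref=False)
found_with_cycle = gc.collect()      # unreachable objects found: more than 0

build_and_drop(use_weakref=True)
found_without_cycle = gc.collect()   # nothing left for the collector to do

gc.enable()
print(found_with_cycle, found_without_cycle)
```

With the weak back-reference, the objects are reclaimed by reference counting as soon as the function returns, so memory use no longer depends on when the collector happens to run.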
Source: https://stackoverflow.com/questions/22440421/python-is-the-garbage-collector-run-before-a-memoryerror-is-raised