Question
i'm new to python and have big memory issue. my script runs 24/7 and each day it allocates about 1gb more of my memory. i could narrow it down to this function:
Code:
#!/usr/bin/env python
# coding: utf8
import gc
from pympler import muppy
from pympler import summary
from pympler import tracker
v_list = [{
    'url_base' : 'http://www.immoscout24.de',
    'url_before_page' : '/Suche/S-T/P-',
    'url_after_page' : '/Wohnung-Kauf/Hamburg/Hamburg/-/-/50,00-/EURO--500000,00?pagerReporting=true',
}]

# returns the full search URL for the given page number
def get_url(v, page_num):
    return v['url_base'] + v['url_before_page'] + str(page_num) + v['url_after_page']

while True:
    gc.enable()
    for v_idx, v in enumerate(v_list):
        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)
        # magic happens here
        url = get_url(v, 1)
        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)
    # collects unlinked objects
    gc.collect()
Output:
                       types |   # objects |   total size
============================ | =========== | ============
                        list |       26154 |     10.90 MB
                         str |       31202 |      1.90 MB
                        dict |         507 |    785.88 KB
Especially the list entry keeps getting bigger each cycle, by around 600KB, and I have no idea why. In my opinion I don't store anything here, and the url variable should be overwritten each time, so there should be barely any memory consumption at all.
What am I missing here? :-)
Answer 1:
This "memory leak" is 100% caused by your testing for memory leaks. The all_objects list ends up maintaining a list of almost every object you ever created—even the ones you don't need anymore, which would have been cleaned up if they weren't in all_objects, but they are.
As a quick test:
- If I run this code as-is, the list value grows by about 600KB/cycle, just as you say in your question, at least up to 20MB, where I killed it.
- If I add del all_objects right after the sum1 = line, however, the list value bounces back and forth between 100KB and 650KB.
If you think about why this is happening, it's pretty obvious in retrospect. At the point when you call muppy.get_objects() (except the first time), the previous value of all_objects is still alive. So, it's one of the objects that gets returned. That means that, even when you assign the return value to all_objects, you're not freeing the old value, you're just dropping its refcount from 2 to 1. Which keeps alive not just the old value itself, but every element within it—which, by definition, is everything that was alive last time through the loop.
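You can reproduce this keep-alive chain with nothing but the standard library. The sketch below is illustrative, not your actual code: Snapshot and per_cycle_data are made-up names standing in for muppy's returned list and your per-iteration objects, and the list subclass exists only because plain lists can't be weak-referenced. The weakref lets you observe exactly when one cycle's data actually dies:

```python
import weakref

class Snapshot(list):
    """Plain lists don't support weak references, so use a trivial subclass."""

# "cycle 1" creates some per-iteration data and a weakref to watch it with
per_cycle_data = Snapshot(object() for _ in range(1000))
ref = weakref.ref(per_cycle_data)

# Simulate muppy.get_objects(): the snapshot references everything alive.
all_objects = Snapshot([per_cycle_data])

# "cycle 2": rebind the name; the old data now survives only inside the snapshot
per_cycle_data = Snapshot(object() for _ in range(1000))

# The new snapshot is taken while the old snapshot is still bound to a name,
# so (like muppy's) it contains the old snapshot, chaining the cycles together.
all_objects = Snapshot([per_cycle_data, all_objects])

assert ref() is not None   # cycle-1 data is still alive via the chain

del all_objects            # drop the whole snapshot chain...
assert ref() is None       # ...and cycle-1 data is finally freed (CPython)
```

Each new snapshot links to the previous one, so every cycle's garbage accumulates, which is exactly the ~600KB/cycle growth in the list row of the output.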
If you can find a memory-exploring library that gives you weakrefs instead of normal references, that might help. Otherwise, make sure to do a del all_objects at some point before calling muppy.get_objects again. (Right after the only place you use it, the sum1 = line, seems like the most obvious place.)
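A minimal sketch of that fix, using the stdlib's gc.get_objects() as a stand-in for muppy.get_objects() (the print_mem_summary helper is hypothetical, not part of pympler; with pympler you would summarize and print instead of just counting). Scoping the snapshot inside a function and del-ing it before returning guarantees the next snapshot never sees the previous one:

```python
import gc

def print_mem_summary():
    """Snapshot live objects, report, and drop the snapshot before returning."""
    all_objects = gc.get_objects()   # stand-in for muppy.get_objects()
    count = len(all_objects)
    del all_objects                  # snapshot dies here, before the next call
    return count

first = print_mem_summary()
later = [print_mem_summary() for _ in range(5)]
# with the del in place, the object count stays flat instead of climbing
assert all(abs(c - first) < 1000 for c in later)
```

The same idea works with muppy: as long as the only reference to the snapshot is gone before get_objects() runs again, refcounting frees it immediately and it never shows up in the next measurement.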
Source: https://stackoverflow.com/questions/26554102/memory-leak-in-adding-list-values