Memory leak in adding list values


Question


I'm new to Python and have a big memory issue. My script runs 24/7, and each day it allocates about 1 GB more of my memory. I could narrow it down to this function:

Code:

#!/usr/bin/env python
# coding: utf8
import gc
from pympler import muppy
from pympler import summary
from pympler import tracker


v_list = [{
    'url_base': 'http://www.immoscout24.de',
    'url_before_page': '/Suche/S-T/P-',
    'url_after_page': '/Wohnung-Kauf/Hamburg/Hamburg/-/-/50,00-/EURO--500000,00?pagerReporting=true',
}]

# builds the full search URL for the given page number
def get_url(v, page_num):
    return v['url_base'] + v['url_before_page'] + str(page_num) + v['url_after_page']


while True:
    gc.enable()

    for v_idx,v in enumerate(v_list):

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)


        # magic happens here
        url = get_url(v, 1)


        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        summary.print_(sum1)

        # collects unlinked objects
        gc.collect()

Output:

                   types |   # objects |   total size
======================== | =========== | ============
                    list |       26154 |     10.90 MB
                     str |       31202 |      1.90 MB
                    dict |         507 |    785.88 KB

Especially the list row is getting bigger each cycle, by around 600 KB, and I don't have an idea why. In my opinion I do not store anything here, and the url variable should be overwritten each time, so basically there should not be any memory consumption at all.

What am I missing here? :-)


Answer 1:


This "memory leak" is 100% caused by your testing for memory leaks. The all_objects list ends up maintaining a list of almost every object you ever created—even the ones you don't need anymore, which would have been cleaned up if they weren't in all_objects, but they are.

As a quick test:

  • If I run this code as-is, I get the list value growing by about 600KB/cycle, just as you say in your question, at least up to 20MB, where I killed it.

  • If I add del all_objects right after the sum1 = line, however, I get the list value bouncing back and forth between 100KB and 650KB.

If you think about why this is happening, it's pretty obvious in retrospect. At the point when you call muppy.get_objects() (except the first time), the previous value of all_objects is still alive. So, it's one of the objects that gets returned. That means that, even when you assign the return value to all_objects, you're not freeing the old value, you're just dropping its refcount from 2 to 1. Which keeps alive not just the old value itself, but every element within it—which, by definition, is everything that was alive last time through the loop.
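To make that mechanism visible without pympler, here is a minimal sketch; the Blob class and the take_snapshot helper are made up for illustration, and the standard-library gc.get_objects() stands in for muppy.get_objects(). Because the right-hand side of the assignment runs before the name is rebound, every new snapshot captures the previous one, so objects that should be garbage are never freed:

import gc

class Blob:
    """Dummy payload so we can watch whether old objects get collected."""
    instances = 0
    def __init__(self):
        Blob.instances += 1
    def __del__(self):
        Blob.instances -= 1

def take_snapshot():
    # Stand-in for muppy.get_objects(): references to every GC-tracked object.
    return gc.get_objects()

snapshot = None
for i in range(5):
    blob = Blob()                # fresh "garbage" created each iteration
    snapshot = take_snapshot()   # the old snapshot is still alive at this point,
                                 # so it ends up inside the new one and is never freed
    print(i, "live Blobs:", Blob.instances)   # keeps growing: 1, 2, 3, 4, 5

Adding del snapshot right before the take_snapshot() call lets the old Blobs die as soon as their snapshot is released, and the count stays flat instead of growing.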

If you can find a memory-exploring library that gives you weakrefs instead of normal references, that might help. Otherwise, make sure to do a del all_objects at some point before calling muppy.get_objects again. (Right after the only place you use it, the sum1 = line, seems like the most obvious place.)
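For concreteness, here is a sketch of the fixed loop, with the snapshot dropped as soon as it has been summarized so the next muppy.get_objects() call cannot capture it; it assumes the imports, v_list, and get_url from the question's code above:

while True:
    gc.enable()

    for v_idx, v in enumerate(v_list):
        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        del all_objects              # drop the snapshot before anything else
        summary.print_(sum1)

        # magic happens here
        url = get_url(v, 1)

        # mem test output
        all_objects = muppy.get_objects()
        sum1 = summary.summarize(all_objects)
        del all_objects              # same here
        summary.print_(sum1)

        # collects unreachable objects
        gc.collect()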



Source: https://stackoverflow.com/questions/26554102/memory-leak-in-adding-list-values
