Python: Memory leak debugging

误落风尘 2020-12-07 16:45

I have a small multithreaded script running in Django, and over time it starts using more and more memory. Leaving it for a full day eats about 6GB of RAM and I start to swa

7 answers
  • 2020-12-07 17:28

    Try Guppy.

    Basically, you need more information, or a way to extract some. Guppy even provides a graphical representation of the data.

  • 2020-12-07 17:29

    See this excellent blog post from Ned Batchelder on how they tracked down a real memory leak in HP's Tabblo. A classic and worth reading.

  • 2020-12-07 17:35

    Have you tried gc.set_debug() ?

    You need to ask yourself simple questions:

    • Am I using objects with __del__ methods? Do I absolutely, unequivocally, need them?
    • Can I get reference cycles in my code? Can I break these cycles before getting rid of the objects?

    See, the main issue would be a cycle of objects containing __del__ methods (before Python 3.4, the collector could not free such cycles):

    import gc
    
    class A(object):
        def __del__(self):
            print 'a deleted'
            if hasattr(self, 'b'):
                delattr(self, 'b')
    
    class B(object):
        def __init__(self, a):
            self.a = a
        def __del__(self):
            print 'b deleted'
            del self.a
    
    
    def createcycle():
        a = A()
        b = B(a)
        a.b = b
        return a, b
    
    gc.set_debug(gc.DEBUG_LEAK)
    
    a, b = createcycle()
    
    # remove references
    del a, b
    
    # prints:
    ## gc: uncollectable <A 0x...>
    ## gc: uncollectable <B 0x...>
    ## gc: uncollectable <dict 0x...>
    ## gc: uncollectable <dict 0x...>
    gc.collect()
    
    # to solve this we explicitly break the cycle:
    a, b = createcycle()
    del a.b
    
    del a, b
    
    # objects are removed correctly:
    ## a deleted
    ## b deleted
    gc.collect()
    

    I would really encourage you to flag the objects / concepts that form cycles in your application and focus on their lifetime: once you no longer need them, is anything still referencing them?
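
    One way to check an object's lifetime is to keep only a weak reference to it and see whether it disappears after a collection. The following is a minimal sketch for Python 3; `Node`, `make_cycle_probe` and `is_collected` are hypothetical names used for illustration, not part of the original answer.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an application object."""


def make_cycle_probe(history=None):
    """Build a two-object cycle and return a weak reference to one member.

    If `history` is given, a strong reference to the object is stored in it,
    imitating a forgotten external reference.
    """
    a, b = Node(), Node()
    a.other, b.other = b, a
    if history is not None:
        history.append(a)
    # The local names a and b vanish on return; only the cycle
    # (and possibly `history`) keeps the objects alive.
    return weakref.ref(a)


def is_collected(ref):
    """Run a full collection and report whether the referent is gone."""
    gc.collect()
    return ref() is None


free_probe = make_cycle_probe()
print(is_collected(free_probe))   # True: nothing outside the cycle refers to it

history = []
held_probe = make_cycle_probe(history)
print(is_collected(held_probe))   # False: `history` still holds the object
```

    A weak reference does not keep its target alive, so the probe itself never causes the leak it is looking for.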

    Even for cycles without __del__ methods, we can have an issue:

    import gc
    
    # class without destructor
    class A(object): pass
    
    def createcycle():
        # a -> b -> c 
        # ^         |
        # ^<--<--<--|
        a = A()
        b = A()
        a.next = b
        c = A()
        b.next = c
        c.next = a
        return a, b, c
    
    gc.set_debug(gc.DEBUG_LEAK)
    
    a, b, c = createcycle()
    # since we have no __del__ methods, gc is able to collect the cycle:
    
    del a, b, c
    # no panic message, everything is collectable:
    ##gc: collectable <A 0x...>
    ##gc: collectable <A 0x...>
    ##gc: collectable <dict 0x...>
    ##gc: collectable <A 0x...>
    ##gc: collectable <dict 0x...>
    ##gc: collectable <dict 0x...>
    gc.collect()
    
    a, b, c = createcycle()
    
    # but as long as we keep an exterior ref to the cycle...:
    seen = dict()
    seen[a] = True
    
    # delete the cycle
    del a, b, c
    # nothing is collected
    gc.collect()
    

    If you have to use "seen"-like dictionaries, or a history, be careful to keep only the data you actually need, and no references to the objects themselves.
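
    If the "seen" structure really has to hold the objects, one option (an alternative technique, not something the answer above prescribes) is a weak container such as `weakref.WeakSet`, whose entries vanish once the object is otherwise unreachable. A minimal Python 3 sketch, with `Node` and `remaining_after_drop` as hypothetical names:

```python
import gc
import weakref


class Node:
    """Hypothetical application object; instances are hashable and weak-referenceable."""


def remaining_after_drop(seen):
    """Record a self-referencing object in `seen`, drop it, and count what is left."""
    node = Node()
    node.next = node        # a one-object reference cycle
    seen.add(node)
    del node                # the caller no longer needs the object
    gc.collect()
    return len(seen)


print(remaining_after_drop(set()))               # 1: the set keeps the cycle alive
print(remaining_after_drop(weakref.WeakSet()))   # 0: the weak set let it go
```

    Storing a plain key (an ID or a name) instead of the object achieves the same thing when the object itself is not needed later.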

    I'm a bit disappointed by set_debug, though: I wish it could be configured to send its output somewhere other than stderr. Hopefully that will change soon.
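
    Until then, a possible workaround is `gc.DEBUG_SAVEALL`, which stores every unreachable object in `gc.garbage` instead of freeing it, so a report can be written to any file-like object. This is a rough Python 3 sketch under that assumption; `Node`, `make_cycle` and `report_unreachable` are hypothetical names.

```python
import gc
import io
from collections import Counter


class Node:
    """Hypothetical stand-in for an application object."""


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a


def report_unreachable(workload, out):
    """Run `workload`, then write a per-type count of unreachable objects to `out`."""
    gc.collect()                      # flush garbage that predates the measurement
    gc.garbage.clear()
    gc.set_debug(gc.DEBUG_SAVEALL)    # keep unreachable objects in gc.garbage
    try:
        workload()
        gc.collect()
        counts = Counter(type(obj).__name__ for obj in gc.garbage)
        for name, count in counts.most_common():
            out.write("%s: %d\n" % (name, count))
        return counts
    finally:
        gc.set_debug(0)               # restore normal collection
        gc.garbage.clear()            # release the saved objects


buffer = io.StringIO()                # could just as well be a log file
report_unreachable(make_cycle, buffer)
print(buffer.getvalue())
```

    Unlike DEBUG_LEAK, this prints nothing to stderr by itself; what ends up in the report is whatever the loop over `gc.garbage` chooses to write.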

  • 2020-12-07 17:39

    Is DEBUG=False in settings.py?

    If not, Django will happily store every SQL query you make, which adds up.

  • 2020-12-07 17:40

    I think you should use different tools. Apparently, the statistics you got cover only GC objects (i.e. objects which may participate in cycles); most notably, they lack strings.

    I recommend using Pympler; it should provide you with more detailed statistics.
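
    The gap is easy to see with `gc.is_tracked`, and on Python 3.4+ the standard-library `tracemalloc` module (a different tool from Pympler, used here only because it needs no installation) measures all Python allocations, strings included. A minimal sketch; `measure_growth`, `leak_strings` and `cache` are hypothetical names.

```python
import gc
import tracemalloc

# Strings are not tracked by the cycle collector; containers such as lists are.
print(gc.is_tracked("some text"))   # False
print(gc.is_tracked([]))            # True

cache = []  # hypothetical module-level structure that silently grows


def leak_strings():
    """Imitate a leak made only of strings."""
    cache.extend(("payload-%d" % i) * 50 for i in range(10000))


def measure_growth(workload):
    """Return the net change in traced bytes across workload(), plus the top 3 lines."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        workload()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    stats = after.compare_to(before, "lineno")   # largest differences first
    return sum(stat.size_diff for stat in stats), stats[:3]


growth, top = measure_growth(leak_strings)
print("grew by %d bytes" % growth)
for stat in top:
    print(stat)
```

    The per-line statistics point at the source line that allocated the surviving memory, which a type count over `gc.get_objects()` cannot show for strings.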

  • 2020-12-07 17:44

    See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/ . Short answer: if you're running Django outside the usual web-request cycle, you need to call db.reset_queries() manually (and of course have DEBUG=False, as others have mentioned). Django calls reset_queries() automatically after each web request, but in a standalone script that never happens.
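
    For a long-running worker, that could look roughly like the sketch below. `process_jobs`, `handle` and `every` are hypothetical names; the reset function is passed in so the loop can be tried without Django installed. In a real Django process you would pass `django.db.reset_queries`.

```python
def process_jobs(jobs, handle, reset_queries, every=100):
    """Handle each job and clear Django's query log after every `every` jobs.

    `reset_queries` is expected to be django.db.reset_queries in a real
    Django process; any zero-argument callable works for this sketch.
    """
    done = 0
    for job in jobs:
        handle(job)
        done += 1
        if done % every == 0:
            reset_queries()   # drop the query log accumulated so far
    return done


# Stand-in usage without Django: count how often the reset would run.
resets = []
total = process_jobs(range(250), lambda job: None, lambda: resets.append(1))
print(total, len(resets))   # 250 2

# In a Django script the call would be, for example:
#   from django.db import reset_queries
#   process_jobs(my_jobs, my_handler, reset_queries)
```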
