I have a small multithreaded script running in Django, and over time it starts using more and more memory. Leaving it running for a full day eats about 6GB of RAM and I start to swa
Try Guppy.
Basically, you need more information, or a way to extract some. Guppy even provides a graphical representation of the data.
See this excellent blog post from Ned Batchelder on how they tracked down a real memory leak in HP's Tabblo. A classic and worth reading.
Have you tried gc.set_debug()?
You need to ask yourself simple questions: do I use objects with __del__ methods? Do I absolutely, unequivocally, need them?
See, the main issue would be a cycle of objects containing __del__ methods:
import gc

# Note: the "uncollectable" output below is what Python 2 (and Python 3
# before 3.4) reports; since Python 3.4 (PEP 442) the collector can free
# cycles whose objects define __del__.

class A(object):
    def __del__(self):
        print('a deleted')
        if hasattr(self, 'b'):
            delattr(self, 'b')

class B(object):
    def __init__(self, a):
        self.a = a
    def __del__(self):
        print('b deleted')
        del self.a

def createcycle():
    a = A()
    b = B(a)
    a.b = b
    return a, b

gc.set_debug(gc.DEBUG_LEAK)

a, b = createcycle()

# remove references
del a, b

# prints:
## gc: uncollectable <A 0x...>
## gc: uncollectable <B 0x...>
## gc: uncollectable <dict 0x...>
## gc: uncollectable <dict 0x...>
gc.collect()

# to solve this, we explicitly break the cycle:
a, b = createcycle()
del a.b

del a, b

# objects are removed correctly (b goes first, and its __del__ releases a):
## b deleted
## a deleted
gc.collect()
I would really encourage you to flag the objects / concepts that form cycles in your application and focus on their lifetime: when you don't need them anymore, does anything still reference them?
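If one side of a relationship only needs to look at the other, a weak reference keeps the strong cycle from forming in the first place. Here is a minimal sketch, assuming the back-pointer does not need to keep its target alive; Parent and Child are hypothetical names used only for illustration:

```python
import gc
import weakref

class Parent(object):
    def __init__(self):
        self.children = []

    def add_child(self):
        child = Child(self)
        self.children.append(child)
        return child

class Child(object):
    def __init__(self, parent):
        # Weak back-reference: it does not keep the parent alive,
        # so parent -> child -> parent is not a strong cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been freed.
        return self._parent()

parent = Parent()
child = parent.add_child()
had_parent = child.parent is parent   # True while the parent is alive

del parent        # drop the only strong reference to the parent
gc.collect()      # not needed on CPython, but harmless elsewhere
parent_gone = child.parent is None
```

The trade-off is that every use of child.parent has to cope with getting None back.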
Even for cycles without __del__ methods, we can have an issue:
import gc

# class without destructor
class A(object): pass

def createcycle():
    # a -> b -> c
    # ^         |
    # ^<--<--<--|
    a = A()
    b = A()
    a.next = b
    c = A()
    b.next = c
    c.next = a
    return a, b, c

gc.set_debug(gc.DEBUG_LEAK)

a, b, c = createcycle()
# since we have no __del__ methods, gc is able to collect the cycle:

del a, b, c
# no panic message, everything is collectable:
##gc: collectable <A 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <A 0x...>
##gc: collectable <dict 0x...>
##gc: collectable <dict 0x...>
gc.collect()

a, b, c = createcycle()

# but as long as we keep an exterior ref to the cycle...:
seen = dict()
seen[a] = True

# delete the cycle
del a, b, c
# nothing is collected
gc.collect()
If you have to use "seen"-like dictionaries, or history, be careful that you keep only the actual data you need, and no external references to it.
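If the bookkeeping really has to be keyed on the objects themselves, weakref.WeakKeyDictionary from the standard library is one option: its keys do not keep the objects alive. A sketch, assuming the keys are hashable and weak-referenceable (instances of ordinary classes are); Node and make_cycle are made-up names:

```python
import gc
import weakref

class Node(object):
    pass

def make_cycle():
    # a -> b -> c -> a, same shape as the cycle above
    a, b, c = Node(), Node(), Node()
    a.next, b.next, c.next = b, c, a
    return a, b, c

# Keys are held weakly, so an entry disappears when its key is freed.
seen = weakref.WeakKeyDictionary()

a, b, c = make_cycle()
seen[a] = True
tracked_while_alive = len(seen)

del a, b, c
gc.collect()            # the cycle is now unreachable and gets collected
tracked_after_collect = len(seen)
```

With a plain dict in place of the WeakKeyDictionary, the entry (and the whole cycle behind it) would survive the collection.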
I'm a bit disappointed now by set_debug; I wish it could be configured to output data somewhere other than stderr, but hopefully that should change soon.
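Since set_debug only writes to stderr, one workaround is to gather the numbers yourself with gc.get_objects() and compare two snapshots. This is a rough sketch, not a replacement for a real profiler; type_counts, growth and Leaky are names invented for the example, and like gc itself it only sees container objects tracked by the collector:

```python
import gc
from collections import Counter

def type_counts():
    """Count the objects currently tracked by the collector, per type name."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

def growth(before, after, top=10):
    """Return the types whose instance count grew the most."""
    diff = after - before   # Counter subtraction keeps positive counts only
    return diff.most_common(top)

# Simulated leak: instances pile up in a module-level list.
class Leaky(object):
    pass

_cache = []

before = type_counts()
for _ in range(1000):
    _cache.append(Leaky())
after = type_counts()

report = growth(before, after)
for name, extra in report:
    print("%-20s +%d" % (name, extra))
```

Taking a snapshot every few minutes in the long-running script and logging the growth wherever is convenient should point at the type that keeps accumulating.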
Is DEBUG=False in settings.py?
If not, Django will happily store all the SQL queries you make, which adds up.
I think you should use different tools. Apparently, the statistics you got cover only GC objects (i.e. objects which may participate in cycles); most notably, they lack strings.
I recommend using Pympler; it should provide you with more detailed statistics.
See http://opensourcehacker.com/2008/03/07/debugging-django-memory-leak-with-trackrefs-and-guppy/ . Short answer: if you're running Django but not in a web-request-based format, you need to manually run db.reset_queries() (and of course have DEBUG=False, as others have mentioned). Django automatically does reset_queries() after a web request, but in your format, that never happens.
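For a long-running loop, that could look roughly like the sketch below. process_all, handle and every are hypothetical names; the reset function is passed in as a parameter so the snippet runs without Django installed, and in a real script you would hand it Django's own reset_queries:

```python
def process_all(items, handle, reset_queries, every=100):
    """Run handle(item) for each item, clearing the query log periodically.

    Returns the number of times reset_queries was called.
    """
    resets = 0
    for count, item in enumerate(items, 1):
        handle(item)
        if count % every == 0:
            reset_queries()   # drop the accumulated query log
            resets += 1
    reset_queries()           # final cleanup after the last partial batch
    return resets + 1

# In a Django script this would be wired up along these lines:
#   from django.db import reset_queries
#   process_all(jobs, handle_job, reset_queries)

# Stand-in demo: a plain list plays the role of the query log.
fake_log = []
calls = process_all(range(250), fake_log.append, lambda: fake_log.clear())
```

How often to reset is a judgment call; once per unit of work is the closest match to what Django does per web request.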