How to debug Python memory fault?

老子叫甜甜 提交于 2019-12-05 16:39:58

问题


Edit: Really appreciate help in finding bug - but since it might prove hard to find/reproduce, any general debug help would be greatly appreciated too! Help me help myself! =)

Edit 2: Narrowing it down, commenting out code.

Edit 3: Seems lxml might not be the culprit, thanks! The full script is here. I need to go over it looking for references. What do they look like?

Edit 4: Actually, the scripts stops (goes 100%) in this, the parse_og part of it. So edit 3 is false - it must be lxml somehow.

Edit 5 MAJOR EDIT: As suggested by David Robinson and TankorSmash below, I've found a type of data content that will send lxml.etree.HTML( data ) in a wild loop. (I carelessly disregarded it, but find my sins redeemed as I've paid a price to the tune of an extra two days of debug! ;) A working crashing script is here. (Also opened a new question.)

Edit 6: Turns out this is a bug with lxml version 2.7.8 and below (at least). Updated to lxml 2.9.0, and bug is gone. Thanks also to the fine folks over at this follow-up question.

I don't know how to debug this weird problem I'm having. The below code runs fine for about five minutes, when the RAM is suddenly completely filled up (from 200MB to 1700MB during the 100% period - then when memory is full, it goes into blue wait state).

It's due to the code below, specifically the first two lines. That's for sure. But what is going on? What could possibly explain this behaviour?

def parse_og(self, data):
    """ lxml parsing to the bone! """
    try:
        tree = etree.HTML( data ) # << break occurs on this line >>
        m = tree.xpath("//meta[@property]")

        #for i in m:
        #   y = i.attrib['property']
        #   x = i.attrib['content']
        #   # self.rj[y] = x  # commented out in this example because code fails anyway


        tree = ''
        m = ''
        x = ''
        y = ''
        i = ''

        del tree
        del m
        del x
        del y
        del i

    except Exception:
        print 'lxml error: ', sys.exc_info()[1:3]
        print len(data)
        pass


回答1:


You can try Low-level Python debugging with GDB. Probably there is a bug in Python interpreter or in lxml library and it is hard to find it without extra tools.

You can interrupt your script running under gdb when CPU usage goes to 100% and look at stack trace. It will probably help to understand what's going on inside script.




回答2:


it must be due to some references which keep the documents alive. one must always be careful with string results from xpath evaluation. I see you have assigned None to tree and m but not to y,x and i .

Can you also assign None to y,x and i .




回答3:


Tools are also helpful when trying to track down memory problems. I've found guppy to be a very useful Python memory profiling and exploration tool.

It is not the easiest to get started with due to a lack of good tutorials / documentation, but once you get to grips with it you will find it very useful. Features I make use of:

  • Remote memory profiling (via sockets)
  • Basic GUI for charting usage, optionally showing live data
  • Powerful, and consistent, interfaces for exploring data usage in a Python shell


来源:https://stackoverflow.com/questions/15302027/how-to-debug-python-memory-fault

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!