Python Memory Issue with BeautifulSoup

你离开我真会死。 Submitted on 2020-01-16 06:06:18

Question


I've resolved this issue, but I'm wondering what caused it in the first place. I used BeautifulSoup to find this span on a webpage:

span = <span id="ctl00_ContentPlaceHolder1_RestInfoReskin_lblRestName">Ally's Sizzlers</span>

I then assign this variable:

restaurant.name = span.contents

However, on each loop iteration this takes up a full 1 MB, and there are about 20,000 iterations. Through trial and error I came upon this solution:

restaurant.name = str(span.contents)

Can you tell me why the former span.contents takes up so much memory?


Answer 1:


Probably because str(span.contents) calls the __str__ method of the span.contents object and returns a smaller representation. You can use Pympler to measure the memory consumption.
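As a rough illustration of the measuring step, here is a minimal sketch that uses the standard library's tracemalloc module in place of Pympler, purely so it runs without third-party packages. The helper name measure_growth and the bytearray stand-ins are hypothetical; in the real scraper, make_item would be whatever produces the value stored on each loop iteration.

```python
import tracemalloc


def measure_growth(make_item, iterations=200):
    """Return (bytes_retained, items_kept) after storing `iterations` results.

    Hypothetical helper: `make_item` stands in for one pass of the scraping loop.
    """
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    # Keep every result alive, the way restaurant.name keeps its value alive.
    kept = [make_item() for _ in range(iterations)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, len(kept)


# Stand-ins for a "heavy" and a "light" value stored per iteration.
heavy_bytes, _ = measure_growth(lambda: bytearray(10_000))
light_bytes, _ = measure_growth(lambda: bytearray(10))
print(f"heavy: {heavy_bytes} bytes, light: {light_bytes} bytes")
```

Comparing the two numbers for span.contents and str(span.contents) in the actual loop would show which assignment is responsible for the growth.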




Answer 2:


Old stuff, but just in case other people wonder: span.contents returns a list of the tag's child nodes, which here holds a single NavigableString instance. That instance is linked to the DOM tree, so as long as it is referenced, the whole DOM tree cannot be released from memory by the garbage collector. Thus, as long as restaurant.name is alive, the whole DOM tree is kept in memory.

str(span.contents) returns a plain string that is not linked to the DOM tree, so it does not prevent the tree from being released.
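The retention mechanism described in this answer can be sketched with a toy model that needs only the standard library. Node and TextNode are hypothetical stand-ins, not BeautifulSoup classes; the assumption is only that a text node holds a reference to its parent, as a NavigableString does through its .parent attribute. A weak reference to the root shows whether the tree is still alive after the loop variables are dropped.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a parsed tag (not a BeautifulSoup class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class TextNode(str):
    """Hypothetical str subclass that points back at its parent node."""

    parent = None


def build_tree():
    root = Node("html")
    span = Node("span", parent=root)
    text = TextNode("Ally's Sizzlers")
    text.parent = span  # the link that ties the text to the whole tree
    span.children.append(text)
    return root, text


def tree_survives(keep_linked_text):
    """Return (tree_still_alive, kept_value) after dropping the local names."""
    root, text = build_tree()
    tree_ref = weakref.ref(root)
    # Either keep the linked node itself, or a plain str copy of it.
    kept = text if keep_linked_text else str(text)
    del root, text
    gc.collect()  # the tree contains reference cycles, so force a collection
    return tree_ref() is not None, kept


print("keeping the linked node, tree alive:", tree_survives(True)[0])
print("keeping a plain str copy, tree alive:", tree_survives(False)[0])
```

Run as written, the first call should report the tree as still alive and the second as released, which mirrors the difference between storing span.contents and storing str(span.contents).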



Source: https://stackoverflow.com/questions/13481055/python-memory-issue-with-beautifulsoup
