Python high memory usage with BeautifulSoup

后悔当初 2020-12-19 03:24

I was trying to process several web pages with BeautifulSoup4 in Python 2.7.3, but after every parse the memory usage keeps growing.

A simplified version of the code produces the same behaviour.
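
A minimal sketch of the kind of parse-and-check loop described (the file name, parser choice, and loop below are assumptions for illustration, not the asker's original snippet):

    # Hypothetical reproduction: parse a page repeatedly and check the
    # process's memory footprint after each iteration (Python 2.7-style prompt).
    from bs4 import BeautifulSoup

    def parse(path):
        with open(path) as fp:
            BeautifulSoup(fp.read(), "html.parser")  # build the tree, then discard it

    for _ in range(10):
        parse("index.html")  # "index.html" is a placeholder page
        raw_input("check memory usage, then press Enter")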

4 Answers
  •  攒了一身酷
    2020-12-19 04:13

    I know this is an old thread, but there's one more thing to keep in mind when parsing pages with BeautifulSoup. When navigating a tree and storing a specific value, be sure to get the string and not a bs4 object. For instance, this caused a memory leak when used in a loop:

    category_name = table_data.find('a').contents[0]
    

    This can be fixed by changing it to:

    category_name = str(table_data.find('a').contents[0])
    

    In the first example the type of category_name is bs4.element.NavigableString, which keeps references back into its parse tree, so holding on to it prevents the whole tree from being garbage-collected.
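
    As a rough illustration (the HTML, the loop, and the variable names below are invented for the example), storing plain strings lets each parse tree be released between iterations, whereas storing the NavigableString objects would keep every tree alive:

    from bs4 import BeautifulSoup

    pages = ["<table><tr><td><a href='#'>Books</a></td></tr></table>"] * 1000

    names = []
    for html in pages:
        soup = BeautifulSoup(html, "html.parser")
        table_data = soup.find("td")
        # str() copies just the text, so nothing keeps the tree reachable
        names.append(str(table_data.find("a").contents[0]))
        soup.decompose()  # optional: explicitly tear the tree apart to release it sooner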
