Python high memory usage with BeautifulSoup

后悔当初 2020-12-19 03:24

I am trying to process several web pages with BeautifulSoup4 in Python 2.7.3, but memory usage keeps growing after every parse.

This simplified code produces th

4 answers
  •  一整个雨季
    2020-12-19 04:18

    Try calling Beautiful Soup's decompose() method, which destroys the tree, once you are done working with each file.

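    from bs4 import BeautifulSoup

    def parse():
        # Use a with-block so the file is closed even if parsing fails
        with open("index.html", "r") as f:
            page = BeautifulSoup(f.read(), "lxml")
        # page extraction goes here
        # Destroy the tree so its nodes can be freed before the next parse
        page.decompose()

    while True:
        parse()
        raw_input()  # Python 2 built-in; waits for Enter between parses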
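
To check whether decompose() actually changes anything, it can help to measure memory across repeated calls rather than watch the process by eye. The sketch below is one possible way to do that, with some assumptions: it uses the standard-library tracemalloc module, which exists only in Python 3 (not in the 2.7.3 from the question), and the helper name measure_growth is made up for illustration. tracemalloc only sees memory allocated through Python's own allocators, so allocations made inside C libraries may not show up.

```python
import gc
import tracemalloc


def measure_growth(func, rounds=5):
    """Call func() `rounds` times; return traced memory in bytes after each call.

    Hypothetical helper for illustration, not part of Beautiful Soup.
    """
    tracemalloc.start()
    try:
        readings = []
        for _ in range(rounds):
            func()
            # Collect reference cycles first so only live objects are counted
            gc.collect()
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
        return readings
    finally:
        tracemalloc.stop()
```

Passing in a function that does a single parse (the parse() above, without the surrounding while loop) gives one reading per round. Readings that keep climbing suggest something is still holding on to the trees; roughly flat readings suggest they are being released.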
    
