I was trying to process several web pages with BeautifulSoup4 in Python 2.7.3, but after every parse the memory usage keeps going up and up.
This simplified code reproduces the behavior:
I know this is an old thread, but there's one more thing to keep in mind when parsing pages with BeautifulSoup. When navigating the tree and storing a specific value, be sure to store the string itself and not a bs4 object. For instance, this caused a memory leak when used in a loop:
category_name = table_data.find('a').contents[0]
Which could be fixed by changing it into:
category_name = str(table_data.find('a').contents[0])
In the first example, the type of category_name is bs4.element.NavigableString. A NavigableString keeps a reference back into the parse tree (via its parent links), so holding onto one keeps the entire soup alive and prevents it from being garbage-collected. Calling str() on it gives you a plain Python string with no reference to the tree.
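Here's a minimal sketch of the difference, using a hypothetical HTML snippet (the table markup and class name are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for the real table data.
html = '<table><tr><td class="data"><a href="#">Books</a></td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')
table_data = soup.find('td', class_='data')

# Leaky version: a NavigableString, still wired into the parse tree.
leaky = table_data.find('a').contents[0]

# Safe version: a plain str with no reference back to the soup.
safe = str(table_data.find('a').contents[0])

print(type(leaky).__name__)      # NavigableString
print(type(safe).__name__)       # str
print(leaky.parent is not None)  # True: holding 'leaky' keeps the tree alive
```

The Beautiful Soup documentation makes the same recommendation: convert a NavigableString with str() (or unicode() on Python 2) before using it outside the soup, precisely because it otherwise carries a reference to the entire parse tree.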