I was trying to process several web pages with BeautifulSoup4 in Python 2.7.3, but after every parse the memory usage goes up and up. This simplified code reproduces the behavior:
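A minimal sketch of such a loop, assuming a local index.html (the file name and loop shape mirror the answer's code below, just without any cleanup), might be:

from bs4 import BeautifulSoup

def parse():
    f = open("index.html", "r")
    # a new parse tree is built on every call; nothing releases it afterwards
    page = BeautifulSoup(f.read(), "lxml")
    f.close()

while True:
    parse()
    raw_input()  # pause so memory usage can be observed between iterations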
When you're done working with each file, call Beautiful Soup's decompose() method, which destroys the parse tree so its elements can be freed:
from bs4 import BeautifulSoup

def parse():
    f = open("index.html", "r")
    page = BeautifulSoup(f.read(), "lxml")
    # page extraction goes here
    page.decompose()  # destroy the tree so its elements can be garbage-collected
    f.close()

while True:
    parse()
    raw_input()  # pause so memory usage can be checked between iterations
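To check that memory actually stabilizes, one option (a sketch assuming a Unix system; report_memory is a helper name introduced here) is to print the process's peak resident memory between iterations using the standard resource module:

import resource

def report_memory():
    # high-water mark of resident set size for this process;
    # reported in kilobytes on Linux, bytes on Mac OS X
    print "peak RSS: %d" % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

Calling report_memory() after each parse() should show the peak flattening out once decompose() is in place, rather than climbing on every iteration.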