I'm writing a spider in Python, using the lxml library to parse HTML and the gevent library for async. I found that after running for some time, the lxml parser starts eating up memory.
It seems the issue stems from libxml2, the C library that lxml relies on. Here is the first report: http://codespeak.net/pipermail/lxml-dev/2010-December/005784.html This bug isn't mentioned in either the lxml v2.3 bug fix logs or the libxml2 change logs.
Oh, there are follow-up mails here: https://bugs.launchpad.net/lxml/+bug/728924
Well, I tried to reproduce the issue but saw nothing abnormal. Anyone who can reproduce it may be able to help clarify the problem.
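For anyone who wants to try reproducing it, here is a rough sketch of how memory growth could be checked. It is not the script from the linked reports; the helper names (`peak_rss`, `measure_growth`, `run_with_lxml`), the test document, and the round counts are all made up for illustration. It uses the stdlib `resource` module, so it only works on Unix-like systems, and it leaves gevent out entirely.

```python
import resource


def peak_rss():
    # Peak resident set size of this process so far.
    # Units differ by platform: kilobytes on Linux, bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def measure_growth(parse, document, rounds=5, per_round=200):
    # Hypothetical helper: call parse(document) repeatedly and record
    # the peak RSS after each round. A peak that keeps climbing round
    # after round hints at memory that is never released.
    samples = []
    for _ in range(rounds):
        for _ in range(per_round):
            parse(document)
        samples.append(peak_rss())
    return samples


def run_with_lxml():
    # Assumes lxml is installed; imported here so the helpers above
    # can still be used without it.
    import lxml.html

    # Made-up test document, not taken from the bug reports.
    html = "<html><body>" + "<p>some text</p>" * 1000 + "</body></html>"
    return measure_growth(lxml.html.fromstring, html)
```

Calling `run_with_lxml()` returns one peak-RSS sample per round. If the numbers level off after the first round or two, the parser is probably not leaking for that input; if they keep rising, that would be worth adding to the bug report together with the lxml and libxml2 versions.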