There are so many HTML and XML libraries built into Python that it's hard to believe there's no support for real-world HTML parsing.
I've found plenty of great t
As already stated, the standard library alone currently offers no satisfying solution. I faced the same problem when I tried to run one of my programs on an outdated hosting environment that had only Python 2.6 and no way to install extensions of my own. Solution:
Grab this file (ElementSoup.py) and the latest stable BeautifulSoup version of the 3.x series (3.2.1 as of now). From the tar file there, pick only BeautifulSoup.py; it's the only one that you really need to ship with your code. Once you have these two files in your path, all you need to do to get an ordinary etree object from some HTML string, like you would get from lxml, is this:
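# Python 2 only: the StringIO module and BeautifulSoup 3 are not available on Python 3.
from StringIO import StringIO
import ElementSoup  # imports BeautifulSoup.py, which must sit on the same path
# input_str is your HTML string; parse() expects a file-like object, hence StringIO.
tree = ElementSoup.parse(StringIO(input_str))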
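Once you have the tree, the usual ElementTree queries should apply to it. Here is a minimal sketch of that. It assumes the object returned by ElementSoup.parse behaves like a standard ElementTree element; to keep the example runnable without the two extra files, a well-formed snippet parsed by the standard library stands in for it. The sample markup and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for the result of ElementSoup.parse(...): a well-formed snippet
# parsed with the standard library, used only so this runs without the
# ElementSoup.py and BeautifulSoup.py files. Real-world HTML would not
# survive ET.fromstring, which is why those two files are needed.
sample = (
    "<html><head><title>Demo</title></head>"
    "<body><a href='/one'>One</a><p><a href='/two'>Two</a></p></body></html>"
)
tree = ET.fromstring(sample)

# Text of the <title> element, addressed by its path below the root.
title = tree.findtext("head/title")

# The href attribute of every <a> element at any depth, in document order.
links = [a.get("href") for a in tree.findall(".//a")]

print(title)
print(links)
```

Only path expressions that old ElementTree versions understand are used here (no attribute predicates, no iter()), since the point of this setup is to run on an old Python.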
lxml requires compiled C code to run (it binds to libxml2 and libxslt), and html5lib, although pure Python, is a whole package to bundle rather than a single file. Both take considerably more effort to get working, so if your environment is restricted, or your intended audience is not willing to do that, avoid them.