I\'m trying to parse some html in Python. There were some methods that actually worked before... but nowadays there\'s nothing I can actually use without workarounds.
I think the problem is that most HTML is ill-formed. XHTML tried to fix that, but it never really caught on enough - especially as most browsers do "intelligent workarounds" for ill-formed code.
Even a few years ago I tried to parse HTML for a primitive spider-type app, and found the problems too difficult. I suspect writing your own might be on the cards, although we can't be the only people with this problem!