Python html parsing that actually works

轻奢々 2021-01-31 21:09

I'm trying to parse some HTML in Python. There were some methods that actually worked before... but nowadays there's nothing I can actually use without workarounds.

5 Answers
  •  误落风尘
    2021-01-31 21:29

    html5lib cannot parse half of what's "out there"

    That sounds extremely implausible. html5lib uses exactly the same algorithm that's also implemented in recent versions of Firefox, Safari and Chrome. If that algorithm broke half the web, I think we would have heard. If you have particular problems with it, do file bugs.
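
A minimal sketch of how one might check this on a small piece of tag soup, assuming the third-party html5lib package is installed (`pip install html5lib`). The sample markup and the helper name `parse_tag_soup` are made up for illustration; only `html5lib.parse` is the library's own call.

```python
import html5lib

# Hypothetical sample: no <html>/<head>/<body>, and both <p> tags are unclosed.
MALFORMED = "<title>Demo</title><p>one<p>two"


def parse_tag_soup(markup):
    """Parse markup with the HTML5 algorithm and return the root <html> element.

    namespaceHTMLElements=False keeps tag names plain ("p" rather than
    "{http://www.w3.org/1999/xhtml}p"), which makes lookups shorter.
    """
    return html5lib.parse(markup, treebuilder="etree", namespaceHTMLElements=False)


root = parse_tag_soup(MALFORMED)

# The parser supplies the missing structure, as a browser would.
print(root.tag)                           # html
print(root.find("head/title").text)       # Demo
print([p.text for p in root.iter("p")])   # ['one', 'two']
```

If a real page produces a tree that differs from what a browser builds for the same input, that difference is the kind of concrete case worth reporting as a bug.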
