Python HTML parsing that actually works

轻奢々 2021-01-31 21:09

I'm trying to parse some HTML in Python. Some methods used to work, but nowadays there's nothing I can use without workarounds.

5 answers
  •  南旧 (original poster)
    2021-01-31 21:28

    I think the problem is that most HTML is ill-formed. XHTML tried to fix that, but it never really caught on, especially since most browsers apply "intelligent workarounds" to ill-formed code.

    Even a few years ago I tried to parse HTML for a primitive spider-type app, and found the problems too difficult. I suspect writing your own might be on the cards, although we can't be the only people with this problem!
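
    For what it's worth, here is a minimal sketch of the lenient approach for a spider-type app, using the standard library's html.parser, which does not require well-formed input. The names LinkCollector and extract_links and the sample markup are made up for illustration; this is not a full answer to browser-grade error recovery.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags without requiring well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return every href found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical ill-formed input: unclosed <p> and <a>, upper-case tag, no </html>.
messy = "<html><body><p>Unclosed paragraph<a href='/one'>first<br><A HREF=\"/two\">second</body>"
print(extract_links(messy))
```

    The parser only reports start tags, end tags and text as it meets them, so unclosed elements do not stop it. It does not build a tree or repair nesting the way a browser does, so anything that depends on document structure would still need more work.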
