Python HTML parsing that actually works


轻奢々 2021-01-31 21:09

I'm trying to parse some HTML in Python. There were some methods that actually worked before... but nowadays there's nothing I can actually use without workarounds.

5 answers

南旧 (original poster)

2021-01-31 21:28

I think the problem is that most HTML is ill-formed. XHTML tried to fix that, but it never really caught on, especially as most browsers apply "intelligent workarounds" to ill-formed code.

Even a few years ago I tried to parse HTML for a primitive spider-type app, and found the problems too difficult. I suspect writing your own might be on the cards, although we can't be the only people with this problem!
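
Before writing a parser from scratch, it may be worth checking whether the standard library's `html.parser.HTMLParser` is tolerant enough for a simple spider. Here's a minimal sketch, assuming all you need is the link targets; the `LinkCollector` class, the `extract_links` helper, and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags without requiring well-formed input."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    # Hypothetical helper: feed the whole document and return the hrefs found
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Deliberately ill-formed sample: <p>, <b> and <a> are never closed
sample = "<p>See <a href=\"/docs\">docs<p><b>and <a href='http://example.com'>this"
print(extract_links(sample))
```

This only reports start tags as they appear; it doesn't rebuild the tree the way a browser would, so anything that depends on correct nesting would still need more work.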
