发表新帖

发表新帖

How to parse malformed HTML in python, using standard libraries

后端未结

关注

 6  721

不知归路 2020-12-08 04:36

There are so many html and xml libraries built into python, that it\'s hard to believe there\'s no support for real-world HTML parsing.

I\'ve found plenty of great t

6条回答

不思量自难忘° (楼主)

2020-12-08 04:46

I cannot think of any popular languages with a good, robust, heuristic HTML parsing library in its stdlib. Python certainly does not have one, which is something I think you know.

Why the requirement of a stdlib module? Most of the time when I hear people make that requirement, they are being silly. For most major tasks, you will need a third party module or to spend a whole lot of work re-implementing one. Introducing a dependency is a good thing, since that's work you didn't have to do.

So what you want is lxml.html. Ship lxml with your code if that's an issue, at which point it becomes functionally equivalent to writing it yourself except in difficulty, bugginess, and maintainability.

0 讨论(0)

查看其它6个回答
发布评论:

提交评论
- 加载中...

热议问题