Best library to parse HTML with Python 3 and example?


轮回少年 2020-12-24 12:50

I'm completely new to Python and am using Python 3.1 on Windows (pywin). I need to parse some HTML, essentially to extract values between specific HTML tags, and am confused a

6 answers

南笙 (original poster)

2020-12-24 13:39

If your HTML is well formed, you have many options, such as SAX and DOM. If it is not well formed, you need a fault-tolerant parser such as Beautiful Soup, elementtidy, or lxml's HTML parser. No parser is perfect: when presented with a variety of broken HTML, I sometimes have to try more than one. lxml and ElementTree use a mostly compatible API that is more of a standard than Beautiful Soup's.
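To make "fault tolerant" concrete, here is a minimal sketch that pulls the text out of every occurrence of one tag from deliberately broken markup. Note that it uses the standard library's `html.parser`, which is not one of the libraries named above; I picked it only so the example runs without installing anything. The class `TagTextExtractor`, the helper `extract_text`, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag.

    Hypothetical helper for illustration, not part of any library.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.values = []    # one string per outermost target element
        self._buffer = []   # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.values.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def extract_text(html, tag):
    """Return the text of each `tag` element found in `html`."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.values


# Sloppy markup on purpose: unclosed <p>, unquoted attribute, no </body>.
broken_html = "<html><body><p>intro<td class=price>19.99</td><td>n/a</td>"
prices = extract_text(broken_html, "td")
print(prices)
```

An event-based parser like this never builds a tree, so missing close tags elsewhere in the page do not stop it. If you need a tree you can navigate, that is where Beautiful Soup or lxml's HTML parser come in.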

In my opinion, lxml is the best module for working with XML documents, but the ElementTree included with Python is still pretty good. In the past I have used Beautiful Soup to convert HTML to XML and construct an ElementTree for processing the data.
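For the well-formed case, a short sketch with the bundled `xml.etree.ElementTree` might look like the following. The sample document and the `class` values are invented for the example; only the ElementTree calls (`fromstring`, `iter`, `findall`, `get`) are real.

```python
import xml.etree.ElementTree as ET

# Assumed input: markup that is already well-formed XML.
well_formed = """
<html>
  <body>
    <table>
      <tr><td class="name">apple</td><td class="price">0.50</td></tr>
      <tr><td class="name">pear</td><td class="price">0.75</td></tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(well_formed)

# iter() walks the whole tree; the attribute filter is done in Python.
prices = [td.text for td in root.iter("td") if td.get("class") == "price"]

# The same kind of query using ElementTree's limited XPath syntax.
names = [td.text for td in root.findall(".//td[@class='name']")]

print(names, prices)
```

`ET.fromstring` raises `ParseError` on markup that is not well formed, which is exactly the situation where a tolerant parser is needed first. Since lxml's API is mostly compatible, code like this usually needs few changes to run against `lxml.etree`, though I would check that for your own case rather than assume it.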
