Best library to parse HTML with Python 3 and example?

前端 未结 6 1585
轮回少年
轮回少年 2020-12-24 12:50

I\'m new to Python completely and am using Python 3.1 on Windows (pywin). I need to parse some HTML, to essentially extra values between specific HTML tags and am confused a

6条回答
  •  南笙
    南笙 (楼主)
    2020-12-24 13:39

    If your HTML is well formed, you have many options, such as sax and dom. If it is not well formed you need a fault tolerant parser such as Beautiful soup, element tidy, or lxml's HTML parser. No parser is perfect, when presented with a variety of broken HTML, sometimes I have to try more then one. Lxml and Elementree use a mostly compatible api that is more of a standard than Beautiful soup.

    In my opinion, lxml is the best module for working with xml documents, but the ElementTree included with python is still pretty good. In the past I have used Beautiful soup to convert HTML to xml and construct ElementTree for processing the data.

提交回复
热议问题