Extracting text from HTML file using Python

一生所求 2020-11-22 04:05

I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.

30 answers
  •  梦如初夏
    2020-11-22 04:29

    I recommend a Python package called goose-extractor. Goose will try to extract the following information:

      •  Main text of an article
      •  Main image of the article
      •  Any YouTube/Vimeo videos embedded in the article
      •  Meta description
      •  Meta tags

    More: https://pypi.python.org/pypi/goose-extractor/
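
For comparison, when installing a third-party package is not an option, the plain "copy from the browser" behaviour asked about in the question can be roughly approximated with the standard library alone. The sketch below does not use Goose and does not do article extraction; it simply walks the markup with `html.parser`, skips `<script>` and `<style>` contents, and breaks lines at a handful of block-level tags. The names `TextExtractor` and `html_to_text` and the choice of block tags are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    # Tags whose contents should never appear in the output.
    SKIP_TAGS = {"script", "style"}
    # Tags treated as line breaks (an illustrative, incomplete selection).
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "title",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return an approximation of the visible text in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = "<html><body><h1>Hello</h1><p>One &amp; <b>two</b></p></body></html>"
    print(html_to_text(sample))
```

This is only a rough approximation: it ignores CSS-hidden elements, tables, and other layout that a browser takes into account, so a dedicated extraction library is likely the better choice for real pages.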
