Converting html to text with Python

一生所求 2020-12-12 17:49

I am trying to convert an HTML block to text using Python.

Input:

9 Answers
  •  心在旅途
    2020-12-12 18:50

    You can use BeautifulSoup to remove unwanted `<script>` elements and similar tags before extracting the text. You may need to test against a few different sites to make sure you've covered every kind of content you want to exclude. Try this:

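    from requests import get
    from bs4 import BeautifulSoup as BS

    response = get('http://news.bbc.co.uk/2/hi/health/2284783.stm')
    soup = BS(response.content, "html.parser")

    # Remove every <script> and <style> element, including nested ones.
    # find_all() returns a list, so decomposing inside the loop is safe,
    # unlike mutating soup.body.children while iterating over it.
    for tag in soup.body.find_all(["script", "style"]):
        tag.decompose()

    print(soup.body.get_text())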
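    The snippet above needs `requests`, `bs4` and a live URL. If you want something you can run offline with no third-party packages, here is a minimal sketch of the same idea using the standard library's `html.parser.HTMLParser` instead of BeautifulSoup: drop the content of `<script>`/`<style>` elements and keep the rest. `TextExtractor` and `html_to_text` are hypothetical names made up for this example. Joining fragments with single spaces is a simplifying assumption, not how BeautifulSoup behaves.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside any skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of `html`, fragments joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.parts)


if __name__ == "__main__":
    sample = ("<html><body><h1>Title</h1><script>var x = 1;</script>"
              "<p>Hello <b>world</b></p></body></html>")
    print(html_to_text(sample))
```

    Because every fragment is joined with a space, inline markup such as `foo<b>bar</b>` comes out as `foo bar`. BeautifulSoup's `get_text()`, whose default separator is the empty string, would give `foobar` instead.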
    
