I am trying to convert an html block to text using Python.
Input:
It's possible to use BeautifulSoup to remove unwanted scripts and similar, though you may need to experiment with a few different sites to make sure you've covered the different types of things you wish to exclude. Try this:
from requests import get
from bs4 import BeautifulSoup as BS
response = get('http://news.bbc.co.uk/2/hi/health/2284783.stm')
soup = BS(response.content, "html.parser")
for child in soup.body.children:
if child.name == 'script':
child.decompose()
print(soup.body.get_text())