I am trying to grab some text from HTML documents with BeautifulSoup. In a case that is very relevant for me, it produces a strange and interesting result: after a certain point,
You can specify the parser as html.parser:
soup = BeautifulSoup(prova, 'html.parser')
You can also specify the html5lib parser, using its documented name:
soup = BeautifulSoup(prova, 'html5lib')
If html5lib is not installed yet, install it from the terminal (on Python 3 the Debian/Ubuntu package is python3-html5lib; pip install html5lib also works):
sudo apt-get install python-html5lib
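To see whether the parser choice is what truncates the text, it can help to run the same document through every parser you have installed and compare the results. The following is only a rough sketch: the `prova` string here is a made-up, deliberately malformed stand-in for the real document, and `text_by_parser` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical stand-in for the real `prova` document; deliberately malformed.
prova = "<div><p>first<p>second</div><p>third"


def text_by_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: extracted text} for each parser that is installed."""
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # This parser is not installed; skip it instead of failing.
            continue
        results[name] = soup.get_text()
    return results


results = text_by_parser(prova)
for name, text in results.items():
    print(f"{name}: {len(text)} characters -> {text!r}")
```

If one parser returns noticeably less text than the others, the markup is probably broken around the point where its output stops, and a more lenient parser such as html5lib is likely the better choice.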
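The xml parser can also be used (soup = BeautifulSoup(prova, 'xml')); it requires lxml. Be aware that it treats multi-valued attributes differently: with an HTML parser, class="foo bar" is returned as the list ['foo', 'bar'], while the xml parser returns the single string 'foo bar'.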
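The multi-valued attribute difference is easy to check for yourself. This is a minimal sketch, assuming a small made-up snippet (`markup`) and a hypothetical helper (`class_value`); the xml parser needs lxml, so the helper returns None when it is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up snippet with a multi-valued class attribute.
markup = '<p class="foo bar">text</p>'


def class_value(parser):
    """Return the parsed class attribute, or None if the parser is not installed."""
    try:
        soup = BeautifulSoup(markup, parser)
    except FeatureNotFound:
        return None
    return soup.p["class"]


html_classes = class_value("html.parser")  # HTML parsers split class into a list
xml_classes = class_value("xml")           # the xml parser keeps one string

print(html_classes)
print(xml_classes)
```

Code that does something like `"foo" in tag["class"]` behaves differently in the two cases: a list membership test with an HTML parser, a substring test with the xml parser.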