BeautifulSoup returns unexpected extra spaces

野性不改 2020-12-03 10:48

I am trying to grab some text from HTML documents with BeautifulSoup. In a case that is very relevant to me, it produces a strange and interesting result: after a certain point,

2 answers
  •  半阙折子戏
    2020-12-03 11:26

    You can specify the parser as html.parser:

    soup = BeautifulSoup(prova, 'html.parser')
    

    Alternatively, you can specify the HTML5 parser (html5lib):

    soup = BeautifulSoup(prova, 'html5')
    

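    If html5lib is not installed yet, install it from the terminal. The command below is for Debian/Ubuntu; for Python 3 the package is python3-html5lib, and pip install html5lib works as well: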

    sudo apt-get install python-html5lib
    

    The xml parser may also be used (soup = BeautifulSoup(prova, 'xml')). It requires lxml, and it handles multi-valued attributes differently: class="foo bar" comes back as the single string "foo bar" rather than as a list of two values.
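
To see how the parsers above differ on your own input, it can help to run the same markup through each one and compare the results. The following is only a minimal sketch: the sample markup and the helper name parse_with are made up for illustration, because the original prova document is not shown in the question. Parsers that are not installed are skipped rather than treated as errors.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup; the real `prova` document is not shown in the question.
prova = '<p class="foo bar">Hello <b>world</b></p>'


def parse_with(markup, parser):
    """Return (text, class attribute) of the first <p>, or None if the parser is missing."""
    try:
        soup = BeautifulSoup(markup, parser)
    except FeatureNotFound:
        # Raised by Beautiful Soup when the requested parser is not installed.
        return None
    p = soup.find("p")
    return p.get_text(), p.get("class")


# Compare the extracted text and the class attribute across parsers.
for parser in ("html.parser", "html5lib", "xml"):
    print(parser, parse_with(prova, parser))
```

If the extracted text differs between parsers for your document, the extra spaces are likely an effect of how one parser repairs the markup, and switching parsers as suggested above is a reasonable first thing to try.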
