How to extract text from span surrounded by div using beautifulsoup

混江龙づ霸主 posted on 2021-01-29 07:11:55

Question


I have an HTML snippet like this:

<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>

I am trying to extract the text from every span inside the div using:

soup = BeautifulSoup(html,'html.parser')
spans=soup.select('div.single_baby_name_description>span') 

But spans[0].text returns only the text of the first span, and spans[1].text raises IndexError: list index out of range.

Any help would be greatly appreciated.


Answer 1:


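I found that parsing with 'lxml' does the job, while 'html.parser' does not. A likely cause is the stray </br> end tags in the snippet: lxml recovers from them, whereas html.parser passes them through as end tags, which, depending on the Beautiful Soup version, can leave only the first span inside the div.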

This will work:

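from bs4 import BeautifulSoup

# html holds the snippet from the question; the 'lxml' parser needs the lxml package installed
soup = BeautifulSoup(html, 'lxml')
# A space (descendant combinator) matches every span anywhere inside the div
spans = soup.select('div.single_baby_name_description span')
texts = [span.text for span in spans]
print(texts)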

Output:

['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
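
If installing lxml is not an option, one possible workaround is to repair the stray </br> end tags before parsing. This is only a minimal sketch, assuming those tags are what confuses html.parser; the replace() step and the names cleaned and texts are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
"""

# Assumption: the invalid </br> end tags are the culprit, so turn them into valid <br/> tags
cleaned = html.replace("</br>", "<br/>")

soup = BeautifulSoup(cleaned, "html.parser")

# With well-formed markup, the child combinator (>) from the question matches all four spans
spans = soup.select("div.single_baby_name_description > span")
texts = [span.get_text(strip=True) for span in spans]
print(texts)
```

A plain string replacement is crude but sufficient for a snippet like this one; for arbitrary pages a more tolerant parser is probably the safer choice.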



Answer 2:


Looking at the Beautiful Soup docs:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup

Accessing a tag by name as an attribute (for example soup.span) returns only the first matching tag, which resembles what you describe. Have you tried:

soup.find_all('span')
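
As a rough sketch of that suggestion: find_all searches the whole parsed tree, so it should not matter how the parser nested the spans. The details dictionary and the label pairing below are illustrative additions, not something the answer proposed.

```python
from bs4 import BeautifulSoup

html = """
<div class="single_baby_name_description">
    <label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
    <label>Gender :</label> <span class="28816-gender">Girl</span> </br>
    <label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
    <label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the text of every span in the document, wherever the parser placed it
values = [span.get_text(strip=True) for span in soup.find_all("span")]
print(values)

# Illustrative extra: pair each label with the next span in document order
details = {}
for label in soup.find_all("label"):
    span = label.find_next("span")
    if span is not None:
        key = label.get_text(strip=True).rstrip(":").strip()
        details[key] = span.get_text(strip=True)
print(details)
```

Note that soup.find_all('span') is not limited to the div; on a full page you would probably want to narrow the search to the relevant container first.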


Source: https://stackoverflow.com/questions/52656771/how-to-extract-text-from-span-surrounded-by-div-using-beautifulsoup
