Question
I have an HTML snippet like the one below:
<div class="single_baby_name_description">
<label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
<label>Gender :</label> <span class="28816-gender">Girl</span> </br>
<label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
<label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>
I am trying to extract the text from every span inside the div using
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
spans = soup.select('div.single_baby_name_description > span')
But spans[0].text returns only the text of the first span, and spans[1].text raises IndexError: list index out of range.
Any help would be greatly appreciated.
Answer 1:
In my testing, only the 'lxml' parser does the job here. For some reason 'html.parser' does not.
This will work:
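from bs4 import BeautifulSoup

# The 'lxml' parser is a separate package (pip install lxml)
soup = BeautifulSoup(html, 'lxml')
spans = soup.select('div.single_baby_name_description span')
# Collect the text of each matched span
spans = [span.text for span in spans]
print(spans)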
Output:
['the meaning of this name is universal whole.', 'Girl', 'Christianity', 'German,French,Swedish']
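One plausible explanation, offered as an assumption rather than something confirmed in this thread, is that the malformed </br> closing tags in the snippet are handled differently by the two parsers. Under that assumption, a minimal sketch that stays with 'html.parser' is to rewrite </br> as <br/> before parsing; the variable names cleaned and texts are only illustrative.

```python
from bs4 import BeautifulSoup

html = """<div class="single_baby_name_description">
<label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
<label>Gender :</label> <span class="28816-gender">Girl</span> </br>
<label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
<label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>"""

# Assumption: the stray </br> closing tags are what trips up html.parser,
# so turn them into ordinary self-closing <br/> tags before parsing.
cleaned = html.replace('</br>', '<br/>')

soup = BeautifulSoup(cleaned, 'html.parser')

# With well-formed <br/> tags, every span is a direct child of the div,
# so the child combinator from the question can be used as written.
spans = soup.select('div.single_baby_name_description > span')
texts = [span.get_text(strip=True) for span in spans]
print(texts)
```

This avoids the extra lxml dependency, at the cost of a string replacement that only makes sense if the markup really contains </br> where <br/> was intended.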
Answer 2:
Looking at the Beautiful Soup docs
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#beautifulsoup
accessing a tag by name as an attribute (for example soup.span) returns only the first one found, which matches what you've described. Have you tried:
soup.find_all('span')
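As a rough sketch of how find_all could be used on this snippet: because it searches the whole document, it should not depend on the spans being direct children of the div. Pairing each label with its span and building the details dictionary are illustrative additions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """<div class="single_baby_name_description">
<label>Meaning :</label> <span class="28816-meaning">the meaning of this name is universal whole.</span> </br>
<label>Gender :</label> <span class="28816-gender">Girl</span> </br>
<label>Religion :</label> <span class="28816-religion">Christianity</span> </br>
<label>Origin :</label> <span class="28816-origin">German,French,Swedish</span> </br>
</div>"""

soup = BeautifulSoup(html, 'html.parser')

# find_all walks the entire parsed tree, wherever the tags ended up.
labels = soup.find_all('label')
spans = soup.find_all('span')

# Pair each label with the span that follows it, dropping the trailing " :".
details = {
    label.get_text(strip=True).rstrip(' :'): span.get_text(strip=True)
    for label, span in zip(labels, spans)
}
print(details)
```

The zip pairing assumes the labels and spans alternate one-to-one, as they do in the snippet from the question.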
Source: https://stackoverflow.com/questions/52656771/how-to-extract-text-from-span-surrounded-by-div-using-beautifulsoup