Question
I have an HTML file with a structure like the following:
<div>
</div>
<div>
</div>
<div>
<div>
</div>
<div>
</div>
<div>
</div>
<div>
<div>
<div>
</div>
</div>
I would like to select all the sibling divs without selecting the nested divs in the third and fourth blocks. If I use find_all(), I get all the divs, including the nested ones.
Answer 1:
You can find direct children of the parent element:
soup.select('body > div')
to get only the div elements that are direct children of the top-level body tag.
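As a minimal sketch of the child-combinator approach, the snippet below runs the selector against made-up markup. The sample HTML and the id attributes are assumptions added only to make the result readable. Note that the built-in html.parser does not add a body tag on its own, so the sample includes one explicitly.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the ids exist only to label the output.
html = """
<html><body>
<div id="a"></div>
<div id="b"></div>
<div id="c"><div id="c1"></div><div id="c2"></div></div>
<div id="d"><div id="d1"><div id="d2"></div></div></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# "body > div" matches only div elements whose parent is <body>.
top_level_divs = soup.select("body > div")
print([div["id"] for div in top_level_divs])  # ['a', 'b', 'c', 'd']

# For comparison, find_all() also descends into the nested divs.
print(len(soup.find_all("div")))  # 8
```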
You could also find the first div, then grab all matching siblings with Element.find_next_siblings():
first_div = soup.find('div')
all_divs = [first_div] + first_div.find_next_siblings('div')
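A small sketch of the sibling approach on made-up markup (the sample HTML and ids are assumptions for illustration). Keep in mind that soup.find('div') returns the first div in document order, so this only works when that first div is one of the siblings you want.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the ids exist only to label the output.
html = """
<body>
<div id="a"></div>
<div id="b"><div id="b1"></div></div>
<div id="c"><div id="c1"><div id="c2"></div></div></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# The first div in the document is the starting point.
first_div = soup.find("div")

# find_next_siblings() only looks at later elements with the same parent,
# so the nested divs (b1, c1, c2) are never visited.
all_divs = [first_div] + first_div.find_next_siblings("div")
print([div["id"] for div in all_divs])  # ['a', 'b', 'c']
```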
Or you could use the element.children generator and filter those:
all_divs = (elem for elem in top_level.children if getattr(elem, 'name', None) == 'div')
where top_level is the element containing these div elements directly.
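A sketch of the children-based filter, assuming top_level is the body tag of some made-up markup. The getattr() guard is needed because .children also yields the whitespace text nodes between the tags, and those are not div elements. A list is built here instead of a generator so the result can be reused.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the ids exist only to label the output.
html = """
<body>
<div id="a"></div>
<div id="b"><div id="b1"></div></div>
<p id="not-a-div"></p>
<div id="c"></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: the wanted divs sit directly under <body>.
top_level = soup.body

# Keep only direct children whose tag name is "div"; text nodes and
# other tags (such as the <p> above) are filtered out.
all_divs = [
    elem for elem in top_level.children
    if getattr(elem, "name", None) == "div"
]
print([div["id"] for div in all_divs])  # ['a', 'b', 'c']
```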
Source: https://stackoverflow.com/questions/27826883/select-all-div-siblings-by-using-beautifulsoup