Question
I'm struggling to find a simple way to solve this problem and hope you might be able to help.
I've been using BeautifulSoup's find_all and trying some regex to find all the items except the 'emptyItem' line in the HTML below:
<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 ">...</div>
<div class="product_item2 ">...</div>
<div class="product_item0 ">...</div>
<div class="product_item1 last">...</div>
<div class="product_item2 emptyItem">...</div>
Is there a simple way to find all the items except the one that includes 'emptyItem'?
Answer 1:
Just skip elements containing the emptyItem class. Working sample:
from bs4 import BeautifulSoup
data = """
<div>
<div class="product_item0">test0</div>
<div class="product_item1">test1</div>
<div class="product_item2">test2</div>
<div class="product_item2 emptyItem">empty</div>
</div>
"""
soup = BeautifulSoup(data, "html.parser")
for elm in soup.select("div[class^=product_item]"):
    if "emptyItem" in elm["class"]:  # skip elements having emptyItem class
        continue
    print(elm.get_text())
Prints:
test0
test1
test2
Note that div[class^=product_item] is a CSS selector that matches every div element whose class attribute value starts with product_item.
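As an alternative, the same exclusion can probably be done during the search itself by passing a filter function to find_all. This is only a minimal sketch: the helper name is_real_product and the sample markup are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the question's HTML.
data = """
<div>
<div class="product_item0 ">test0</div>
<div class="product_item1 last">test1</div>
<div class="product_item2 emptyItem">empty</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")


def is_real_product(tag):
    """Return True for product_item* divs that do not carry the emptyItem class."""
    # BeautifulSoup exposes class as a list of tokens; default to [] when absent.
    classes = tag.get("class", [])
    return (
        tag.name == "div"
        and any(c.startswith("product_item") for c in classes)
        and "emptyItem" not in classes
    )


# find_all calls the function on every tag and keeps those returning True.
texts = [elm.get_text() for elm in soup.find_all(is_real_product)]
print(texts)
```

Because the function checks each class token separately, it also matches elements where product_item is not the first class listed, which the class^= attribute selector would miss.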
Source: https://stackoverflow.com/questions/35115417/python-beautifulsoup-find-all-except