问题
I'm trying to parse the follow HTML code in python using beautiful soup. I would like to be able to search for text inside a tag, for example "Color" and return the text next tag "Slate, mykonos" and do so for the next tags so that for a give text category I can return it's corresponding information.
However, I'm finding it very difficult to find the right code to do this.
<h2>Details</h2>
<div class="section-inner">
<div class="_UCu">
<h3 class="_mEu">General</h3>
<div class="_JDu">
<span class="_IDu">Color</span>
<span class="_KDu">Slate, mykonos</span>
</div>
</div>
<div class="_UCu">
<h3 class="_mEu">Carrying Case</h3>
<div class="_JDu">
<span class="_IDu">Type</span>
<span class="_KDu">Protective cover</span>
</div>
<div class="_JDu">
<span class="_IDu">Recommended Use</span>
<span class="_KDu">For cell phone</span>
</div>
<div class="_JDu">
<span class="_IDu">Protection</span>
<span class="_KDu">Impact protection</span>
</div>
<div class="_JDu">
<span class="_IDu">Cover Type</span>
<span class="_KDu">Back cover</span>
</div>
<div class="_JDu">
<span class="_IDu">Features</span>
<span class="_KDu">Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges</span>
</div>
</div>
I use the following code to retrieve my div tag
soup.find_all("div", "_JDu")
Once I have retrieved the tag I can navigate inside it but I can't find the right code that will enable me to find the text inside one tag and return the text in the tag after it.
Any help would be really really appreciated as I'm new to python and I have hit a dead end.
回答1:
You can define a function to return the value for the key you enter:
def get_txt(soup, key):
key_tag = soup.find('span', text=key).parent
return key_tag.find_all('span')[1].text
color = get_txt(soup, 'Color')
print('Color: ' + color)
features = get_txt(soup, 'Features')
print('Features: ' + features)
Output:
Color: Slate, mykonos
Features: Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges
I hope this is what you are looking for.
Explanation:
soup.find('span', text=key) returns the <span> tag whose text=key.
.parent returns the parent tag of the current <span> tag.
Example:
When key='Color', soup.find('span', text=key).parent will return
<div class="_JDu">
<span class="_IDu">Color</span>
<span class="_KDu">Slate, mykonos</span>
</div>
Now we've stored this in key_tag. Only thing left is getting the text of second <span>, which is what the line key_tag.find_all('span')[1].text does.
回答2:
Give it a go. It can also give you the corresponding values. Make sure to wrap the html elements within content=""" """ variable between Triple Quotes to see how it works.
from bs4 import BeautifulSoup
soup = BeautifulSoup(content,"lxml")
for elem in soup.select("._JDu"):
item = elem.select_one("span")
if "Features" in item.text: #try to see if it misses the corresponding values
val = item.find_next("span").text
print(val)
来源:https://stackoverflow.com/questions/48200341/search-for-text-inside-a-tag-using-beautifulsoup-and-returning-the-text-in-the-t