Search for text inside a tag using beautifulsoup and returning the text in the tag after it

Submitted by 前提是你 on 2019-12-24 14:28:37

Question


I'm trying to parse the following HTML in Python using Beautiful Soup. I would like to search for the text inside one tag, for example "Color", and return the text of the tag that follows it, "Slate, mykonos". I want to do the same for the other tags, so that for a given label I can return its corresponding value.

However, I'm finding it very difficult to find the right code to do this.

<h2>Details</h2>
<div class="section-inner">
    <div class="_UCu">
        <h3 class="_mEu">General</h3>
        <div class="_JDu">
            <span class="_IDu">Color</span>
            <span class="_KDu">Slate, mykonos</span>
        </div>
    </div>
    <div class="_UCu">
        <h3 class="_mEu">Carrying Case</h3>
        <div class="_JDu">
            <span class="_IDu">Type</span>
            <span class="_KDu">Protective cover</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Recommended Use</span>
            <span class="_KDu">For cell phone</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Protection</span>
            <span class="_KDu">Impact protection</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Cover Type</span>
            <span class="_KDu">Back cover</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Features</span>
            <span class="_KDu">Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges</span>
        </div>
    </div>
</div>

I use the following code to retrieve the div tags:

soup.find_all("div", "_JDu")

Once I have retrieved the tag I can navigate inside it but I can't find the right code that will enable me to find the text inside one tag and return the text in the tag after it.

Any help would be really appreciated, as I'm new to Python and have hit a dead end.


Answer 1:


You can define a function to return the value for the key you enter:

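def get_txt(soup, key):
    # Find the label <span> whose text equals key, then step up to its parent <div>.
    # (Newer Beautiful Soup releases call this argument string=; text= is the older name.)
    key_tag = soup.find('span', text=key).parent
    # The second <span> inside that <div> holds the value.
    return key_tag.find_all('span')[1].text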

color = get_txt(soup, 'Color')
print('Color: ' + color)
features = get_txt(soup, 'Features')
print('Features: ' + features)

Output:

Color: Slate, mykonos
Features: Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges

I hope this is what you are looking for.

Explanation:

soup.find('span', text=key) returns the first <span> tag whose text equals key.

.parent returns the parent tag of the current <span> tag.

Example:

When key='Color', soup.find('span', text=key).parent will return

<div class="_JDu">
    <span class="_IDu">Color</span>
    <span class="_KDu">Slate, mykonos</span>
</div>

Now we've stored this in key_tag. The only thing left is to get the text of the second <span>, which is what the line key_tag.find_all('span')[1].text does.
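The function above raises an AttributeError when the key is not present, because soup.find() returns None in that case. Below is a minimal sketch of a more defensive variant. The name get_value, the default parameter, and the trimmed html string are assumptions made for this illustration and are not part of the original answer. It also uses find_next_sibling() rather than indexing into find_all(), which is a swapped-in technique, not the answerer's method.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (illustration only).
html = """
<div class="_JDu">
    <span class="_IDu">Color</span>
    <span class="_KDu">Slate, mykonos</span>
</div>
<div class="_JDu">
    <span class="_IDu">Type</span>
    <span class="_KDu">Protective cover</span>
</div>
"""


def get_value(soup, key, default=None):
    """Return the text of the <span> that follows the <span> labelled key."""
    # string= is the current name of the older text= argument.
    label = soup.find("span", string=key)
    if label is None:
        # Label not found: return the fallback instead of raising.
        return default
    # The value is the next <span> sibling inside the same <div>.
    value = label.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else default


soup = BeautifulSoup(html, "html.parser")
print("Color:", get_value(soup, "Color"))
print("Weight:", get_value(soup, "Weight", "n/a"))
```

With the sample markup, the first call finds the value for "Color", while the second falls back to "n/a" because no "Weight" label exists.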




Answer 2:


Give this a go. It prints the value that corresponds to a given label. To try it, assign the HTML to a variable named content as a triple-quoted string (content = """ ... """).

from bs4 import BeautifulSoup

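soup = BeautifulSoup(content, "lxml")  # "lxml" needs the lxml package; "html.parser" is built in
for elem in soup.select("._JDu"):
    item = elem.select_one("span")  # first <span> in the row: the label
    if "Features" in item.text:  # substring match; change the label to look up other values
        val = item.find_next("span").text  # next <span> in document order: the value
        print(val)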
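If several labels need to be looked up, it may be simpler to collect every label/value pair once and then index the result. The following is a rough sketch of that idea. The function name details_to_dict and the shortened content string are hypothetical, and it assumes each div._JDu row holds the label in its first <span> and the value in its second, as in the question's markup.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (illustration only).
content = """
<div class="section-inner">
    <div class="_UCu">
        <h3 class="_mEu">General</h3>
        <div class="_JDu">
            <span class="_IDu">Color</span>
            <span class="_KDu">Slate, mykonos</span>
        </div>
    </div>
    <div class="_UCu">
        <h3 class="_mEu">Carrying Case</h3>
        <div class="_JDu">
            <span class="_IDu">Type</span>
            <span class="_KDu">Protective cover</span>
        </div>
        <div class="_JDu">
            <span class="_IDu">Cover Type</span>
            <span class="_KDu">Back cover</span>
        </div>
    </div>
</div>
"""


def details_to_dict(markup):
    """Map each label (first <span>) to its value (second <span>)."""
    # html.parser ships with Python, so no extra parser package is needed.
    soup = BeautifulSoup(markup, "html.parser")
    details = {}
    for row in soup.select("div._JDu"):
        spans = row.find_all("span")
        # Skip rows that do not have both a label and a value.
        if len(spans) >= 2:
            label = spans[0].get_text(strip=True)
            details[label] = spans[1].get_text(strip=True)
    return details


details = details_to_dict(content)
print(details["Color"])
print(details.get("Features", "not listed"))
```

Using dict.get() with a fallback avoids a KeyError for labels that are absent from the page.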


Source: https://stackoverflow.com/questions/48200341/search-for-text-inside-a-tag-using-beautifulsoup-and-returning-the-text-in-the-t
