Navigating to second string text using BeautifulSoup

Posted by 随声附和 on 2019-12-13 02:35:22

Question


Here is the HTML, saved as sample.html.

<html> 
    <body> 
    <div class ="ecopyramid"> 
        <ul id ="producers"> 
            <li class ="producerlist"> 
                <div class ="name">A1</div> 
                <div class ="number">100000</div> 
            </li> 
            <li class ="producerlist"> 
                <div class ="name">B1</div> 
                <div class ="number">100000</div> 
            </li> 
        </ul> 
        <ul id ="primaryconsumers"> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">A2</div> 
                <div class ="number">1000</div> 
            </li> 
            <li class ="primaryconsumerlist"> 
                <div class ="name">B2</div> 
                <div class ="number">2000</div> 
            </li> 
        </ul> 
        <ul id ="secondaryconsumers"> 
            <li class ="secondaryconsumerlist"> 
                <div class ="name">A3</div> 
                <div class ="number">100</div> 
            </li>

            <li class ="secondaryconsumerlist"> 
                <div class ="name">B3</div> 
                <div class ="number">98</div>
            </li> 
        </ul> 
        <ul id ="tertiaryconsumers"> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">A4</div> 
                <div class ="number">80</div> 
            </li> 
            <li class ="tertiaryconsumerlist"> 
                <div class ="name">B4</div> 
                <div class ="number">50</div> 
            </li> 
        </ul> 
    </div> 
    </body> 
</html>

Here is the code to navigate through the sample.html above:

from bs4 import BeautifulSoup

with open("sample.html", "r") as sample_pyramid:
    soup = BeautifulSoup(sample_pyramid, "lxml")

# Locate the <ul> by id, then walk down to its first <li> and that item's first <div>.
soup_object = soup.find("ul", id="secondaryconsumers")
print(soup_object.li.div.string)

In this code I first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers". The ".li.div.string" suffix in the print call then narrows the selection and outputs the desired text "A3". My questions are as follows:

1) How do I print the text "B3" in this example?

2) How do I print the text "98" (below "B3") in this example?

I have tried many things with no success. I can reach the first text object through this navigation, but not the second text object within the shared tags.

Any thoughts?


Answer 1:


You can use CSS selectors to get names and numbers:

names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')

print([name.text for name in names])
print([number.text for number in numbers])

Prints (under Python 3):

['A3', 'B3']
['100', '98']
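
To pull out only "B3" and "98" rather than the full lists, one option is to index into the matched <li> elements. The following is a minimal sketch, not part of the original answer. It assumes a trimmed copy of the markup held in a string and uses the built-in "html.parser" so that it runs without lxml installed; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the "secondaryconsumers" list from sample.html (assumed here
# as an inline string so the sketch is self-contained).
html = """
<ul id="secondaryconsumers">
    <li class="secondaryconsumerlist">
        <div class="name">A3</div>
        <div class="number">100</div>
    </li>
    <li class="secondaryconsumerlist">
        <div class="name">B3</div>
        <div class="number">98</div>
    </li>
</ul>
"""

# Assumption: "html.parser" instead of "lxml", only to avoid the extra dependency.
soup = BeautifulSoup(html, "html.parser")

ul = soup.find("ul", id="secondaryconsumers")

# ul.li only ever returns the first <li>; find_all returns every match
# in document order, so index 1 is the second item.
items = ul.find_all("li", class_="secondaryconsumerlist")
second_item = items[1]

second_name = second_item.find("div", class_="name").get_text(strip=True)
second_number = second_item.find("div", class_="number").get_text(strip=True)

# The same element can also be reached with a positional CSS selector.
name_via_css = soup.select_one(
    "ul#secondaryconsumers > li:nth-of-type(2) > div.name"
).get_text(strip=True)

print(second_name)    # B3
print(second_number)  # 98
```

Indexing the result of find_all is probably the closest match to the attribute-style navigation in the question, since it keeps the "find the parent, then drill down" shape and only changes how the <li> is chosen.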

Example code for the follow-up question in comments:

from bs4 import BeautifulSoup


data = """
<div class="span9">
    <table class="result-data table" border="0">
        <tbody>
        <tr class="result-item highlighting">
            <td class="result-category" scope="row">Name:</td>
            <td class="result-value-bold" colspan="4" itemprop="item">
                Robin Hood
            </td>
        </tr>
        </tbody>
    </table>
</div>
"""

# Name the parser explicitly; omitting it makes bs4 pick one and emit a warning.
soup = BeautifulSoup(data, "html.parser")
print(soup.find('td', class_="result-value-bold").get_text(strip=True))

This prints Robin Hood.

Alternatively, first find the parent table and tr:

table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))
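
One caveat with chained find calls like the ones above: find returns None when nothing matches, so the next call in the chain raises AttributeError. A small sketch of a guard follows. The helper name cell_text is hypothetical, the markup is a shortened copy of the example data, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Shortened copy of the example markup (assumed inline for a self-contained sketch).
data = """
<table class="result-data table">
    <tr class="result-item highlighting">
        <td class="result-category">Name:</td>
        <td class="result-value-bold">
            Robin Hood
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(data, "html.parser")


def cell_text(soup, css_class):
    """Hypothetical helper: stripped text of the first matching <td>, or None."""
    td = soup.find("td", class_=css_class)
    # find returns None when no element matches, so check before calling get_text.
    return td.get_text(strip=True) if td is not None else None


found = cell_text(soup, "result-value-bold")
missing = cell_text(soup, "no-such-class")

print(found)    # Robin Hood
print(missing)  # None
```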


Source: https://stackoverflow.com/questions/24923826/navigating-to-second-string-text-using-beautifulsoup
