Question
Here is the HTML, saved as sample.html:
<html>
<body>
<div class ="ecopyramid">
<ul id ="producers">
<li class ="producerlist">
<div class ="name">A1</div>
<div class ="number">100000</div>
</li>
<li class ="producerlist">
<div class ="name">B1</div>
<div class ="number">100000</div>
</li>
</ul>
<ul id ="primaryconsumers">
<li class ="primaryconsumerlist">
<div class ="name">A2</div>
<div class ="number">1000</div>
</li>
<li class ="primaryconsumerlist">
<div class ="name">B2</div>
<div class ="number">2000</div>
</li>
</ul>
<ul id ="secondaryconsumers">
<li class ="secondaryconsumerlist">
<div class ="name">A3</div>
<div class ="number">100</div>
</li>
<li class ="secondaryconsumerlist">
<div class ="name">B3</div>
<div class ="number">98</div>
</li>
</ul>
<ul id ="tertiaryconsumers">
<li class ="tertiaryconsumerlist">
<div class ="name">A4</div>
<div class ="number">80</div>
</li>
<li class ="tertiaryconsumerlist">
<div class ="name">B4</div>
<div class ="number">50</div>
</li>
</ul>
</div>
</body>
</html>
Here is the code to navigate through the sample.html above:
from bs4 import BeautifulSoup
with open("sample.html", "r") as sample_pyramid:
    soup = BeautifulSoup(sample_pyramid, "lxml")
# Locate the <ul> by id, then descend to the first <div> of its first <li>
soup_object = soup.find("ul", id="secondaryconsumers")
print(soup_object.li.div.string)
In this code I first locate the parent of the text "A3" by the tag "ul" and the id "secondaryconsumers", then narrow it down with the ".li.div.string" suffix in the print call, which outputs the desired text "A3". My questions are as follows:
1) How do I print the text "B3" in this example?
2) How do I print the text "98" (below "B3") in this example?
I have tried many things with no success. I can reach the first text object through this navigation, but not the second text object within the shared tags.
Any thoughts?
Answer 1:
You can use CSS selectors to get names and numbers:
names = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.name')
numbers = soup.select('ul#secondaryconsumers > li.secondaryconsumerlist > div.number')
print([name.text for name in names])
print([number.text for number in numbers])
Prints (under Python 3):
['A3', 'B3']
['100', '98']
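If only the second entry is needed rather than the whole list, one possible sketch is to index the result of find_all. The snippet below is an illustration, not part of the original answer: it inlines a trimmed copy of the secondaryconsumers list instead of reading sample.html, and it assumes the built-in "html.parser" so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the relevant part of sample.html (assumption: inlined
# here so the example is self-contained)
html = """
<ul id="secondaryconsumers">
<li class="secondaryconsumerlist">
<div class="name">A3</div>
<div class="number">100</div>
</li>
<li class="secondaryconsumerlist">
<div class="name">B3</div>
<div class="number">98</div>
</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", id="secondaryconsumers")

# find_all returns every matching <li>; index 1 is the second one
second_li = ul.find_all("li", class_="secondaryconsumerlist")[1]
second_name = second_li.find("div", class_="name").get_text(strip=True)
second_number = second_li.find("div", class_="number").get_text(strip=True)

# Alternative: step from the first <li> to its next <li> sibling
sibling_li = ul.li.find_next_sibling("li")
sibling_name = sibling_li.find("div", class_="name").get_text(strip=True)

print(second_name)    # B3
print(second_number)  # 98
print(sibling_name)   # B3
```

The plain ".li.div" navigation always stops at the first match, which is why it can only reach "A3". Indexing or sibling navigation is needed to move past it.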
Example code for the follow-up question in comments:
from bs4 import BeautifulSoup
data = """
<div class="span9">
<table class="result-data table" border="0">
<tbody>
<tr class="result-item highlighting">
<td class="result-category" scope="row">Name:</td>
<td class="result-value-bold" colspan="4" itemprop="item">
Robin Hood
</td>
</tr>
</tbody>
</table>
</div>
"""
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(data, "html.parser")
print(soup.find('td', class_="result-value-bold").get_text(strip=True))
This prints Robin Hood.
Alternatively, first find the parent table and tr:
table = soup.find('table', class_='result-data')
tr = table.find('tr', class_='result-item')
print(tr.find('td', class_="result-value-bold").get_text(strip=True))
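Note that find returns None when nothing matches, so a chained call like the ones above raises AttributeError on a page that lacks the expected cell. A minimal sketch of guarding against that follows. The helper name cell_text is hypothetical, the markup is a shortened copy of the example data, and "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Shortened copy of the example markup (assumption: trimmed for brevity)
data = """
<table class="result-data table">
<tr class="result-item highlighting">
<td class="result-category">Name:</td>
<td class="result-value-bold">
Robin Hood
</td>
</tr>
</table>
"""


def cell_text(soup, css_class):
    """Hypothetical helper: text of the first <td> with css_class, or None."""
    td = soup.find("td", class_=css_class)
    # find() returns None when there is no match, so check before get_text()
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(data, "html.parser")
found = cell_text(soup, "result-value-bold")
missing = cell_text(soup, "no-such-class")

print(found)    # Robin Hood
print(missing)  # None
```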
Source: https://stackoverflow.com/questions/24923826/navigating-to-second-string-text-using-beautifulsoup