Question
There are several similar tables on the page I'm trying to scrape.
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
The only difference between them is the text within the h2 tags, here: Points
How can I specify which table I need to search in?
I have this code and need to adjust it to take the h2 tag into account:
my_tab = soup.find('table', {'class':'tabelle_grafik lh'})
Need some help guys.
Answer 1:
This works for me. Walk through each table's previous siblings, nearest first: if you reach an h2 with the text "Points" before reaching an h2 with different text, you've found the right table.
from bs4 import BeautifulSoup

t = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
"""
soup = BeautifulSoup(t, "html.parser")
for ta in soup.find_all('table'):
    # Previous siblings are returned nearest first.
    for s in ta.find_previous_siblings():
        if s.name == 'h2':
            # The nearest preceding h2 decides whether this table is wanted.
            if s.get_text() == 'Points':
                print(ta)
            # Stop at the first h2 either way, so earlier headings are ignored.
            break
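Note that in the markup from the question the table sits inside a div that follows the h2, so the h2 is not a sibling of the table itself. A minimal sketch for that layout, assuming bs4 is installed and that the wanted table is the first matching one after the heading (the cell contents "wanted" and "other" are made up for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the cell text is hypothetical.
html_doc = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>wanted</td></tr></table>
</div>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>other</td></tr></table>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Find the heading by its exact text, then take the first matching
# table that appears after it in document order.
heading = soup.find("h2", string="Points")
my_tab = heading.find_next("table", class_="tabelle_grafik lh") if heading else None

print(my_tab.get_text(strip=True) if my_tab else "no table found")
```

One caveat: find_next does not stop at the next h2, so if the "Points" section had no table of its own, this would return the table from a later section.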
Answer 2:
This looks like a job for XPath. However, BeautifulSoup doesn't support XPath expressions.
Consider switching to lxml or Scrapy.
For example, given test markup like:
<html>
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">a</table>
</div>
<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">b</table>
</div>
</html>
The XPath expression to find the table with class "tabelle_grafik lh" inside the div that follows the h2 with text "Points" is:
//table[@class="tabelle_grafik lh" and ../preceding-sibling::h2[1][text()="Points"]]
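For reference, here is a rough sketch of running that expression with lxml, assuming lxml is installed and the page is parsed with lxml.html (the variable names are only for illustration):

```python
from lxml import html

# Same test markup as above.
doc = """
<html>
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">a</table>
</div>
<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">b</table>
</div>
</html>
"""

tree = html.fromstring(doc)

# preceding-sibling::h2[1] is the nearest h2 before the table's parent div,
# so only the table whose section heading is "Points" matches.
tables = tree.xpath(
    '//table[@class="tabelle_grafik lh" and '
    '../preceding-sibling::h2[1][text()="Points"]]'
)

for table in tables:
    print(table.text_content())
```

The [1] matters here: without it, the second table would also match, because the "Points" h2 precedes its div as well.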
Source: https://stackoverflow.com/questions/15866297/matching-specific-table-within-html-beautifulsoup