Matching specific table within HTML, BeautifulSoup

Posted by 时光总嘲笑我的痴心妄想 on 2019-12-24 03:31:20

Question


There are several similar tables on the page I'm trying to scrape.

<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">     
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">

The only difference between them is the text inside the h2 tag that precedes each one, here: Points

How can I specify which table to search in?

I have this code and need to adapt it so that it takes the h2 heading into account:

my_tab = soup.find('table', {'class':'tabelle_grafik lh'})

Any help would be appreciated.


Answer 1:


This works for me. Walk back through each table's previous siblings: if the first h2 you reach has the text "Points" rather than some other text, you have found the right table.

# Updated for Python 3 and BeautifulSoup 4 (the bs4 package).
from bs4 import BeautifulSoup

t = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>yes me!</td></tr></table>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">
<tr><td>woo woo</td></tr></table>
"""

soup = BeautifulSoup(t, "html.parser")

for ta in soup.find_all('table'):
    # Siblings are returned nearest first.
    for s in ta.find_previous_siblings():
        if s.name == 'h2':
            # The nearest preceding h2 decides whether this table matches.
            if s.get_text(strip=True) == 'Points':
                print(ta)
            break
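
Note that the sibling walk only works when the h2 and the table share a parent. In the question's markup the table sits inside a div, so the h2 is a sibling of that div, not of the table. Here is a minimal sketch for that layout, assuming the bs4 package (BeautifulSoup 4) is installed. The helper name table_after_heading and the closing div tags in the sample markup are my own additions for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question; the closing </div> tags are assumed.
html = """
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1"><tr><td>yes me!</td></tr></table>
</div>
<h2 class="tabellen_ueberschrift al">Bad</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1"><tr><td>woo woo</td></tr></table>
</div>
"""


def table_after_heading(soup, heading_text):
    """Return the table that follows the h2 with the given text, or None."""
    for h2 in soup.find_all("h2"):
        if h2.get_text(strip=True) != heading_text:
            continue
        # find_next walks forward in document order, so it also
        # descends into the div that wraps the table.
        table = h2.find_next("table", class_="tabelle_grafik")
        # Guard: accept the table only if this h2 is the nearest heading
        # before it, so a section without a table does not pick up the
        # next section's table.
        if table is not None and table.find_previous("h2") is h2:
            return table
    return None


soup = BeautifulSoup(html, "html.parser")
my_tab = table_after_heading(soup, "Points")
print(my_tab.get_text(strip=True))
```

The guard with find_previous is a design choice: without it, a heading whose section has no table would match whichever table comes next on the page.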



Answer 2:


This looks like a job for XPath, but BeautifulSoup doesn't support XPath expressions.

Consider switching to lxml or Scrapy, both of which do.

For example, given test markup like this:

<html>
<h2 class="tabellen_ueberschrift al">Points</h2>  
<div class="fl" style="width:49%;">   
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">a</table>
</div>

<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">     
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">b</table>
</div>
</html>

the XPath expression that finds the table with class "tabelle_grafik lh" inside the div that follows the h2 with text "Points" is:

//table[@class="tabelle_grafik lh" and ../preceding-sibling::h2[1][text()="Points"]]
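
To try the expression above, here is a minimal sketch assuming the lxml package is installed. It parses the same test markup with lxml.html and runs the XPath expression unchanged. The variable names are illustrative only.

```python
from lxml import html

page = """
<html>
<h2 class="tabellen_ueberschrift al">Points</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">a</table>
</div>

<h2 class="tabellen_ueberschrift al">Illegal</h2>
<div class="fl" style="width:49%;">
<table class="tabelle_grafik lh" cellpadding="2" cellspacing="1">b</table>
</div>
</html>
"""

tree = html.fromstring(page)

# "../preceding-sibling::h2[1]" selects the nearest h2 before the table's
# parent div; the table matches only if that h2's text is "Points".
expr = ('//table[@class="tabelle_grafik lh" and '
        '../preceding-sibling::h2[1][text()="Points"]]')
tables = tree.xpath(expr)

for table in tables:
    print(table.text)
```

Because @class is compared as a whole string, the expression only matches tables whose class attribute is exactly "tabelle_grafik lh".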


Source: https://stackoverflow.com/questions/15866297/matching-specific-table-within-html-beautifulsoup