How to extract an HTML table using BeautifulSoup

Submitted by 不打扰是莪最后的温柔 on 2019-12-13 01:29:54

Question


Take the following HTML snippet as an example:

>>> soup
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>

<table>
<tr><td class="efg">This is EFG</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>

If the only way I can identify the table I want is by the class of one of its td cells,

>>> soup.findAll("td", {"class": "abc"})
[<td class="abc">This is ABC</td>]

how can I extract the whole table, as shown below?

<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>

Answer 1:


Get the td tag's parent using find_parent():

soup.find("td", {"class":"abc"}).find_parent('table')

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
...     <table>
...         <tr><td class="abc">This is ABC</td>
...         </tr>
...         <tr><td class="firstdata"> data1_xxx </td>
...         </tr>
...     </table>
... 
...     <table>
...         <tr><td class="efg">This is EFG</td>
...         </tr>
...         <tr><td class="firstdata"> data1_xxx </td>
...         </tr>
...     </table>
... </div>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> print(soup.find("td", {"class": "abc"}).find_parent('table'))
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
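
One thing the demo does not cover is what happens when no td with the given class exists: find() then returns None, and calling find_parent() on it raises AttributeError. The following is a minimal sketch of a guarded lookup, not part of the original answer. The helper name table_for_cell_class and the HTML constant are hypothetical names chosen for illustration, and Python's built-in "html.parser" is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, mirroring the two tables from the question.
HTML = """
<div>
<table>
<tr><td class="abc">This is ABC</td></tr>
<tr><td class="firstdata"> data1_xxx </td></tr>
</table>
<table>
<tr><td class="efg">This is EFG</td></tr>
<tr><td class="firstdata"> data1_xxx </td></tr>
</table>
</div>
"""


def table_for_cell_class(soup, cell_class):
    """Return the <table> enclosing the first <td> with cell_class, or None.

    Hypothetical helper: it only wraps find() and find_parent() with a
    None check so a missing cell does not raise AttributeError.
    """
    cell = soup.find("td", class_=cell_class)
    if cell is None:
        return None
    return cell.find_parent("table")


soup = BeautifulSoup(HTML, "html.parser")

table = table_for_cell_class(soup, "abc")
if table is not None:
    # Collect the whitespace-stripped text of every cell in the matched table.
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    print(cells)  # ['This is ABC', 'data1_xxx']

# A class that matches no cell yields None instead of an exception.
print(table_for_cell_class(soup, "missing"))  # None
```

Returning None keeps the "not found" case explicit for the caller. If several tables may contain a matching cell, the same idea should extend to soup.find_all("td", class_=...) with find_parent("table") called on each result.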


Source: https://stackoverflow.com/questions/23809640/how-to-extract-html-table-by-using-beautifulsoup
