Question
Take the HTML snippet below as an example:
>>>soup
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
<table>
<tr><td class="efg">This is EFG</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
If I can only identify the table I want by the class of one of its td cells,
>>> soup.find_all("td", {"class": "abc"})
[<td class="abc">This is ABC</td>]
how can I extract the whole enclosing table, as shown below?
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
Answer 1:
Find the td tag, then get its enclosing table with find_parent():
soup.find("td", {"class":"abc"}).find_parent('table')
Demo:
>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
... <table>
... <tr><td class="abc">This is ABC</td>
... </tr>
... <tr><td class="firstdata"> data1_xxx </td>
... </tr>
... </table>
...
... <table>
... <tr><td class="efg">This is EFG</td>
... </tr>
... <tr><td class="firstdata"> data1_xxx </td>
... </tr>
... </table>
... </div>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> print(soup.find("td", {"class": "abc"}).find_parent("table"))
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
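The one-liner above raises AttributeError when no td matches, because find() returns None in that case. Below is a slightly more defensive sketch of the same find_parent() idea, assuming bs4 is installed and the built-in "html.parser" is acceptable. The helper name tables_containing is hypothetical and not part of BeautifulSoup; it also returns every matching table rather than only the first.

```python
from bs4 import BeautifulSoup

HTML = """
<div>
<table>
<tr><td class="abc">This is ABC</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
<table>
<tr><td class="efg">This is EFG</td>
</tr>
<tr><td class="firstdata"> data1_xxx </td>
</tr>
</table>
</div>
"""


def tables_containing(soup, css_class):
    """Return each <table> that encloses a <td> with the given class.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    tables = []
    seen = set()  # object ids of tables already collected
    for td in soup.find_all("td", class_=css_class):
        # find_parent returns the nearest enclosing <table>, or None
        table = td.find_parent("table")
        if table is not None and id(table) not in seen:
            seen.add(id(table))
            tables.append(table)
    return tables


soup = BeautifulSoup(HTML, "html.parser")
for table in tables_containing(soup, "abc"):
    print(table)
```

Deduplicating by object identity rather than equality is deliberate: two separate tables with identical markup compare equal in BeautifulSoup, and both should still be returned.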
Source: https://stackoverflow.com/questions/23809640/how-to-extract-html-table-by-using-beautifulsoup