Find all tables in HTML using BeautifulSoup

Submitted by 时光毁灭记忆、已成空白 on 2019-12-23 07:46:33

Question


I want to find all tables in an HTML document using BeautifulSoup. Inner tables should be reported as part of their outer tables, not separately.

I have written some code that works and gives the expected output, but I don't like this solution because it destroys the 'soup' object.

Is there a more elegant way to do this?

from bs4 import BeautifulSoup as bs

html_doc = '''<html><head><title>title</title></head>
<body>
<p>paragraph</p>
<div><div>
    <table>table1<table>inner11<table>inner12</table></table></table>
    <div><table>table2<table>inner2</table></table></div>
</div></div>
<table>table3<table>inner3</table></table>
<table>table4<table>inner4</table></table>
</html>'''

soup = bs(html_doc, "html.parser")

# Repeatedly take the first remaining table, print it, then remove it
# (together with its nested tables) so the next find() returns the next
# outer table. This works, but decompose() destroys the parse tree.
while True:
    t = soup.find("table")
    if t is None:
        break
    print(t)
    t.decompose()

Output:
<table>table1<table>inner11<table>inner12</table></table></table>
<table>table2<table>inner2</table></table>
<table>table3<table>inner3</table></table>
<table>table4<table>inner4</table></table>
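To make the "it destroys the soup" problem concrete, here is a small check, a sketch that assumes BeautifulSoup 4 with the built-in "html.parser" (other parsers such as html5lib apply HTML5 table rules and may restructure these nested tables). The names html_doc and found are illustrative only. It counts the tables before and after running the same find()/decompose() loop.

```python
from bs4 import BeautifulSoup

# Same document as in the question; html.parser keeps nested tables nested.
html_doc = """<html><head><title>title</title></head>
<body>
<p>paragraph</p>
<div><div>
    <table>table1<table>inner11<table>inner12</table></table></table>
    <div><table>table2<table>inner2</table></table></div>
</div></div>
<table>table3<table>inner3</table></table>
<table>table4<table>inner4</table></table>
</html>"""

soup = BeautifulSoup(html_doc, "html.parser")
print(len(soup.find_all("table")))  # 9: outer and nested tables together

found = []
while True:
    t = soup.find("table")
    if t is None:
        break
    found.append(str(t))
    t.decompose()  # removes t and every table nested inside it from the tree

print(len(found))                   # 4 outer tables collected
print(len(soup.find_all("table")))  # 0: no tables are left in the tree
```

The loop collects the right four strings, but afterwards the tree no longer contains any tables, so the soup cannot be reused.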

Answer 1:


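Use soup.find_all("table") (findAll() in BeautifulSoup 3) instead of find() and decompose(), and print only the tables that have no <table> ancestor (find_parent() is findParent() in BeautifulSoup 3):

tables = soup.find_all("table")

for table in tables:
    # Skip nested tables; they are already printed as part of their outer table.
    if table.find_parent("table") is None:
        print(table)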

Output:

<table>table1<table>inner11<table>inner12</table></table></table>
<table>table2<table>inner2</table></table>
<table>table3<table>inner3</table></table>
<table>table4<table>inner4</table></table>

and nothing is destroyed: the soup object is left unchanged.
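A variation on the same idea, sketched under the assumption of BeautifulSoup 4 with "html.parser": instead of collecting every table and then walking back up the tree with find_parent() for each one, walk down the tree once and stop descending as soon as a table is found. The generator name iter_outer_tables is hypothetical, not part of the BeautifulSoup API.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def iter_outer_tables(node):
    """Yield every <table> under node that has no <table> ancestor.

    Depth-first, in document order. Once a table is yielded its children are
    not visited, so nested tables only appear inside their outermost table.
    The tree is only read, never modified.
    """
    for child in node.children:
        if not isinstance(child, Tag):
            continue  # skip text nodes, comments, doctype
        if child.name == "table":
            yield child
        else:
            yield from iter_outer_tables(child)


html_doc = """<html><head><title>title</title></head>
<body>
<p>paragraph</p>
<div><div>
    <table>table1<table>inner11<table>inner12</table></table></table>
    <div><table>table2<table>inner2</table></table></div>
</div></div>
<table>table3<table>inner3</table></table>
<table>table4<table>inner4</table></table>
</html>"""

soup = BeautifulSoup(html_doc, "html.parser")
outer = [str(t) for t in iter_outer_tables(soup)]
for s in outer:
    print(s)

# The soup is untouched: all 9 tables are still present.
print(len(soup.find_all("table")))  # 9
```

If the answer's filter is preferred, it can also be passed to find_all() as a function: soup.find_all(lambda tag: tag.name == "table" and tag.find_parent("table") is None).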



Source: https://stackoverflow.com/questions/9783579/find-all-tables-in-html-using-beautifulsoup
