Scrape table with BeautifulSoup

Submitted by 一世执手 on 2021-01-29 04:20:49

Question


I have a table structure that looks like this:

<tr><td>
<td>
<td bgcolor="#E6E6E6" valign="top" align="left">testtestestes</td>
</tr>
<tr nowrap="nowrap" valign="top" align="left">
<td nowrap="nowrap">8-K</td>
<td class="small">Current report, items 1.01, 3.02, and 9.01
<br>Accession Number: 0001283140-16-000129 &nbsp;Act: 34 &nbsp;Size:&nbsp;520 KB
</td>
<td nowrap="nowrap">2016-09-19<br>17:30:01</td>
 <td nowrap="nowrap">2016-09-19</td><td align="left" nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-03473&amp;owner=include&amp;count=100">001-03473</a>
<br/>161891888</td></tr>

That is one row of data. This is my script using BeautifulSoup. I can get the <tr> and <td> elements just fine, but the cells of each <tr> end up in a separate list:

for tr in soup.find_all('tr'):
    tds = tr.find_all('td')
    print(tds)

My problem is how to take the data from two separate <tr> elements and make it look like one row of data. I am trying to get the text between the <td> tags.
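For the "text between <td>" part, here is a minimal sketch, assuming BeautifulSoup 4 with Python's built-in "html.parser". Tag.get_text() returns only the text inside a tag, and passing a separator keeps the pieces on either side of a <br> apart. The cell is copied from the sample row above; the variable names are just for illustration.

```python
from bs4 import BeautifulSoup

# One <td> copied from the sample row in the question; it contains a <br> and &nbsp; entities
cell_html = """<td class="small">Current report, items 1.01, 3.02, and 9.01
<br>Accession Number: 0001283140-16-000129 &nbsp;Act: 34 &nbsp;Size:&nbsp;520 KB
</td>"""

# Parse with the parser that ships with Python, then take the first <td>
td = BeautifulSoup(cell_html, "html.parser").td

# get_text(" ", strip=True) strips each text piece and joins the pieces with a space,
# so the text before and after the <br> does not run together
raw = td.get_text(" ", strip=True)

# &nbsp; is decoded to "\xa0"; str.split() treats it as whitespace, so this
# collapses every run of spaces and non-breaking spaces into a single space
text = " ".join(raw.split())
print(text)
```

For this cell it prints "Current report, items 1.01, 3.02, and 9.01 Accession Number: 0001283140-16-000129 Act: 34 Size: 520 KB".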


Answer 1:


If you want to pair them up, create a single iterator from soup.find_all('tr') and pass it to zip twice. Because both arguments share one iterator, zip yields consecutive rows as pairs:

it = iter(soup.find_all('tr'))
for tr1, tr2 in zip(it, it):
    tds = tr1.find_all('td') + tr2.find_all('td')
    print(tds)
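A self-contained sketch of the pairing approach, assuming BeautifulSoup 4 and a simplified, well-formed table in which each record spans two consecutive <tr> elements. The markup and the paired_rows helper are hypothetical. It also turns each cell into its text with get_text(strip=True), which is what the question ultimately asks for.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed markup: each record is split across two consecutive <tr> rows
html = """
<table>
  <tr><td>testtestestes</td></tr>
  <tr><td>8-K</td><td>Current report</td><td>2016-09-19</td></tr>
  <tr><td>second label</td></tr>
  <tr><td>form B</td><td>description B</td><td>date B</td></tr>
</table>
"""


def paired_rows(soup):
    """Yield the cell texts of each pair of consecutive <tr> elements as one list."""
    it = iter(soup.find_all("tr"))
    # zip draws from the same iterator twice, giving (row 0, row 1), (row 2, row 3), ...
    for tr1, tr2 in zip(it, it):
        tds = tr1.find_all("td") + tr2.find_all("td")
        # get_text(strip=True) keeps only the text between the <td> tags
        yield [td.get_text(strip=True) for td in tds]


soup = BeautifulSoup(html, "html.parser")
rows = list(paired_rows(soup))
for row in rows:
    print(row)
```

Each printed list holds the cells of one logical record, e.g. ['testtestestes', '8-K', 'Current report', '2016-09-19'] for the first pair.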

The equivalent with slicing is to take two slices with different start positions (0 and 1) and a step of 2:

it = soup.find_all('tr')
for tr1, tr2 in zip(it[::2], it[1::2]):
    tds = tr1.find_all('td') + tr2.find_all('td')
    print(tds)

Using iter avoids copying: each slice creates a shallow copy of the list, while the iterator walks the original list once.
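The pairing trick itself does not depend on BeautifulSoup. This plain-Python sketch uses strings as hypothetical stand-ins for the <tr> tags to show that both versions produce the same pairs, and what happens when the count is odd.

```python
# Hypothetical stand-ins for the <tr> Tag objects returned by soup.find_all('tr')
rows = ["tr0", "tr1", "tr2", "tr3", "tr4", "tr5"]

# One iterator passed to zip twice: each output tuple takes two consecutive items
it = iter(rows)
via_iter = list(zip(it, it))

# Two slices (each a shallow copy): even-indexed items paired with odd-indexed items
via_slices = list(zip(rows[::2], rows[1::2]))

print(via_iter)
print(via_iter == via_slices)

# With an odd number of rows, zip stops as soon as one argument is exhausted,
# so the last row ("tr6") is consumed but never appears in the output
odd_rows = rows + ["tr6"]
it = iter(odd_rows)
odd_pairs = list(zip(it, it))
print(odd_pairs)
```

Both approaches give [('tr0', 'tr1'), ('tr2', 'tr3'), ('tr4', 'tr5')], and the odd-length input yields the same three pairs with "tr6" silently dropped.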

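I'm not sure how an odd number of <tr> elements fits into your logic, since the last one would have nothing to pair with. Plain zip would silently drop that last row, so if an odd count can happen you can use zip_longest (izip_longest in Python 2):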

from itertools import zip_longest  # izip_longest in Python 2

it = iter(soup.find_all('tr'))
for tr1, tr2 in zip_longest(it, it):
    # tr2 is None when the last <tr> has no partner; keep tr1's cells anyway
    tds = tr1.find_all('td') + (tr2.find_all('td') if tr2 is not None else [])
    print(tds)
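An end-to-end sketch of the zip_longest variant, assuming Python 3 and BeautifulSoup 4, on a hypothetical three-row table where the last <tr> has no partner. The parentheses around the conditional matter: without them, `a + b if c else []` parses as `(a + b) if c else []`, so the unpaired row would come out as an empty list instead of keeping its own cells.

```python
from itertools import zip_longest

from bs4 import BeautifulSoup

# Hypothetical table with an odd number of rows: the last <tr> has no partner
html = """
<table>
  <tr><td>a1</td></tr>
  <tr><td>a2</td></tr>
  <tr><td>b1</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
it = iter(soup.find_all("tr"))

rows = []
# zip_longest pads the final, incomplete pair with None instead of dropping it
for tr1, tr2 in zip_longest(it, it):
    # Parenthesize the conditional so tr1's cells are kept when tr2 is None
    tds = tr1.find_all("td") + (tr2.find_all("td") if tr2 is not None else [])
    rows.append([td.get_text(strip=True) for td in tds])

print(rows)
```

This prints [['a1', 'a2'], ['b1']]: the first two rows are merged, and the leftover row is kept on its own.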


Source: https://stackoverflow.com/questions/39584154/scrape-table-with-beautifulsoup
