How to get first child table row from a table in BeautifulSoup (Python)

Posted by 橙三吉。 on 2020-12-06 02:57:22

Question


Here is the code and sample results. I just want the first column of the table, ignoring the rest. There are similar questions on Stack Overflow, but they did not help.

<tr>
<td>JOHNSON</td>
<td> 2,014,470 </td>
<td>0.81</td>
<td>2</td>
</tr>

I want JOHNSON only, as it is the first child. My Python code is:

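import requests
from bs4 import BeautifulSoup

def find_raw():
    url = 'http://names.mongabay.com/most_common_surnames.htm'
    r = requests.get(url)
    html = r.content
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(html, 'html.parser')
    for n in soup.find_all('tr'):
        print(n.text)  # prints the text of every cell in the row

find_raw()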

What I get:

SMITH 2,501,922 1.0061
JOHNSON 2,014,470 0.812

Answer 1:


You can find all the tr tags with find_all, then call find('td') on each tr, which returns only the first td (or None if the row has no td). If it exists, print it:

for tr in soup.find_all('tr'):
    td = tr.find('td')  # first td in this row, or None
    if td:
        print(td)
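
For reference, here is a minimal self-contained sketch of the same idea that returns the cell text (JOHNSON rather than the whole td tag). The names SAMPLE_HTML and first_cells are hypothetical, and the sample markup is only an illustration modelled on the rows in the question, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the rows shown in the question.
SAMPLE_HTML = """
<table>
<tr><th>Surname</th><th>Count</th><th>Percent</th><th>Rank</th></tr>
<tr><td>SMITH</td><td> 2,501,922 </td><td>1.006</td><td>1</td></tr>
<tr><td>JOHNSON</td><td> 2,014,470 </td><td>0.81</td><td>2</td></tr>
</table>
"""

def first_cells(html):
    """Return the stripped text of the first <td> in every <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for tr in soup.find_all("tr"):
        td = tr.find("td")  # first <td> in the row, or None (e.g. a header row)
        if td is not None:
            names.append(td.get_text(strip=True))
    return names

print(first_cells(SAMPLE_HTML))
```

The header row contains only th cells, so find('td') returns None for it and the row is skipped.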



Answer 2:


Iterate over the tr elements, then print the text of the first td in each:

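import bs4

# data is the page's HTML as a string
for tr in bs4.BeautifulSoup(data, 'html.parser').select('tr'):
    try:
        print(tr.select('td')[0].text)
    except IndexError:  # row has no td cells, e.g. a header row
        pass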

Or shorter:

>>> [tr.td for tr in bs4.BeautifulSoup(data).select('tr') if tr.td]
[<td>SMITH</td>, <td>JOHNSON</td>, <td>WILLIAMS</td>, <td>JONES</td>, ...]
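
A possible variant that pushes the "first child" condition into the CSS selector itself. This is only a sketch: it assumes a BeautifulSoup version (4.7 or later) whose select() supports the :first-child pseudo-class, and the data string below is a hypothetical stand-in for the page's HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's HTML.
data = """
<table>
<tr><td>SMITH</td><td>1</td></tr>
<tr><td>JOHNSON</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(data, "html.parser")

# "tr > td:first-child" matches a <td> only when it is the first
# element child of its <tr>.
surnames = [td.get_text(strip=True) for td in soup.select("tr > td:first-child")]
print(surnames)
```

Unlike tr.td, this selector skips a row whose first cell is a th followed by td cells, so which one fits depends on how the table is marked up.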

Related posts:

  • Is there a clean way to get the n-th column of an html table using BeautifulSoup?
  • Extracting selected columns from a table using BeautifulSoup
  • CSS select with beautifulsoup4 doesn't work
  • Python BeautifulSoup Getting a column from table - IndexError List index out of range
  • BeautifulSoup Specify table column by number?


Source: https://stackoverflow.com/questions/31554704/how-to-get-first-child-table-row-from-a-table-in-beautifulsoup-python
