Scraping wrong table

你。 posted on 2021-02-08 10:46:31

Question


I'm trying to export players' advanced stats to an Excel sheet, but my code scrapes the first table on the page instead of the advanced stats table, and I get this error:

ValueError: Length of passed values is 23, index implies 21

If I try to select the table by its id instead, I get a different error about tbody.

I also get an error on this line:

lname=name.split(" ")[1]
IndexError: list index out of range. 

I think that is caused by 'Nene' in the list, since that name has no second word. Is there a way to fix that?

import requests
import pandas as pd
from bs4 import BeautifulSoup
playernames=['Carlos Delfino',
'Yao Ming',
'Andris Biedrins',
'Nene']

for name in playernames:
  fname=name.split(" ")[0]
  lname=name.split(" ")[1]
  url="https://basketball.realgm.com/search?q={}+{}".format(fname,lname)
  response = requests.get(url)

  soup = BeautifulSoup(response.content, 'html.parser')
  table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'}).find_next('tbody')
  print(table)  

  columns = ['Season', 'Team', 'League', 'GP', 'GS', 'TS%', 'eFG%', 'ORB%', 'DRB%', 'TRB%', 'AST%', 'TOV%', 'STL%', 'BLK%', 'USG%', 'Total S%', 'PPR', 'PPS', 'ORtg', 'DRtg', 'PER']
  df = pd.DataFrame(columns=columns)

  trs = table.find_all('tr')
  for tr in trs:
    tds = tr.find_all('td')
    row = [td.text.replace('\n', '') for td in tds]
    df = df.append(pd.Series(row, index=columns), ignore_index=True)

df.to_csv('international players.csv', index=False) 

Answer 1:


Some Brazilian athletes go by a single name (think of the footballer Fred). If you want to search by that moniker (Nene/Fred), you need to handle the case where the name has no second word, for example with exception handling:

try:
    lname=name.split(" ")[1]
except IndexError:
    lname=name
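
As an alternative to the try/except above, here is a minimal sketch that avoids indexing into the split result at all. The helper names `split_name` and `build_search_url` are hypothetical, introduced only for illustration; the base URL is the one from the question. It assumes the site's search accepts a single-word query, which the source does not confirm.

```python
from urllib.parse import quote_plus

# Search URL taken from the question; the query string is filled in below.
BASE_URL = "https://basketball.realgm.com/search?q={}"


def split_name(name):
    """Split a name into (first, last); last is "" for single-name players."""
    # partition never raises: with no space it returns (name, "", "").
    first, _, last = name.strip().partition(" ")
    return first, last


def build_search_url(name):
    """Build the search URL for a name made of one or more words."""
    # quote_plus encodes spaces as "+", so "Yao Ming" becomes "Yao+Ming"
    # and a single-word name such as "Nene" is passed through unchanged.
    return BASE_URL.format(quote_plus(name.strip()))


for player in ["Carlos Delfino", "Yao Ming", "Andris Biedrins", "Nene"]:
    print(split_name(player), build_search_url(player))
```

Note that with the try/except fallback, the query for 'Nene' becomes Nene+Nene, whereas this sketch sends just Nene. Which of the two the site's search handles better is something to check by hand.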

For your scraping issue, try using find_all instead of find. find returns only the first match, whereas find_all returns every matching table on the page as a list, and you can then pull the correct table out of that list.

In other words, change table = soup.find('table', attrs={'class': 'tablesaw', 'data-tablesaw-mode-exclude': 'columntoggle'}) to soup.find_all with the same arguments (dropping any {'id': 'table-3554'} filter), then index into the returned list to get the table you want.

Also, FYI: the table IDs change every time you refresh the page, so you can't use the ID to locate the table.
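
Since the IDs are not stable, one way to pick the right table out of the find_all results is to look at its header cells. The following is only a sketch under stated assumptions: `SAMPLE_HTML` is a made-up, simplified stand-in for the real page (the values and IDs are placeholders), and the helpers `find_table_by_headers` and `table_rows` are hypothetical names. It assumes the advanced stats table is the one whose header row contains columns such as TS% and PER.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the player page: two tables with unstable IDs.
SAMPLE_HTML = """
<table class="tablesaw" id="table-1111">
  <thead><tr><th>Season</th><th>Team</th><th>GP</th><th>PTS</th></tr></thead>
  <tbody><tr><td>s1</td><td>t1</td><td>1</td><td>2</td></tr></tbody>
</table>
<table class="tablesaw" id="table-2222">
  <thead><tr><th>Season</th><th>Team</th><th>TS%</th><th>PER</th></tr></thead>
  <tbody><tr><td>s1</td><td>t1</td><td>0.5</td><td>10</td></tr></tbody>
</table>
"""


def find_table_by_headers(soup, required_headers):
    """Return the first 'tablesaw' table whose headers include all required ones."""
    required = set(required_headers)
    for table in soup.find_all("table", class_="tablesaw"):
        headers = {th.get_text(strip=True) for th in table.find_all("th")}
        if required <= headers:
            return table
    return None  # no table on the page has all of the required headers


def table_rows(table):
    """Return the body rows of a table as lists of stripped cell text."""
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find("tbody").find_all("tr")
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
advanced = find_table_by_headers(soup, ["TS%", "PER"])
rows = table_rows(advanced)
print(advanced["id"], rows)
```

Matching on header text rather than position or ID should keep working if the tables are reordered. The ValueError in the question (23 values against a 21-name index) is what pd.Series raises when a row has more cells than the hard-coded column list, so reading the column names from the matched table's own header row is one way to keep the two in step.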



Source: https://stackoverflow.com/questions/59866400/scraping-wrong-table
