Scraping data off of NBA.com

Submitted by 别来无恙 on 2019-12-03 21:51:12

You could do this:

require(rvest)
require(httr)
require(purrr)


# Open a browser-like session on the team page first; this sets the
# cookies that stats.nba.com expects before it will answer API calls.
ses <- html_session("http://stats.nba.com/team/#!/1610612742/",
                    user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"))

# Within that session, request the JSON roster endpoint and parse it.
doc <- ses %>% jump_to("http://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2016-17&TeamID=1610612742")
res <- content(doc$response, "parsed")

# The first result set is the player roster; each row arrives as a list,
# so transpose each one and bind the rows into a data frame.
res$resultSets[[1]]$rowSet %>% 
  map_df(~as.data.frame(t(.)))

#           V1   V2 V3                  V4 V5  V6   V7  V8           V9 V10 V11               V12     V13
#1  1610612742 2016 00     Justin Anderson  1 G-F  6-6 228 NOV 19, 1993  23   1          Virginia 1626147
#2  1610612742 2016 00          J.J. Barea  5   G  6-0 185 JUN 26, 1984  32  10      Northeastern  200826
#3  1610612742 2016 00        Andrew Bogut  6   C  7-0 260 NOV 28, 1984  32  11              Utah  101106

# The second result set holds the coaching staff.
res$resultSets[[2]]$rowSet %>% 
  map_df(~as.data.frame(t(.)))

#          V1   V2        V3      V4        V5                V6                V7 V8                                     V9
#1 1610612742 2016 CAR107961    Rick  Carlisle     Rick Carlisle     rick_carlisle  1                             Head Coach
#2 1610612742 2016 HUN524472  Melvin      Hunt       Melvin Hunt       melvin_hunt  2                        Assistant Coach
#3 1610612742 2016 CAN081621   Kaleb   Canales     Kaleb Canales     kaleb_canales  2                        Assistant Coach
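The generic V1, V2, ... column names can be replaced, because the API also ships the real column names in each result set's "headers" field. A small sketch (reusing the same res object from above; the roster variable name is mine):

```r
# Attach the column names that the endpoint returns alongside the rows.
roster <- res$resultSets[[1]]$rowSet %>% 
  map_df(~as.data.frame(t(.), stringsAsFactors = FALSE))
names(roster) <- unlist(res$resultSets[[1]]$headers)
head(roster)
```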

How did I find this:

I inspected all the XHR calls the website makes and found that the endpoint needs a session (that's why I create one with html_session) and a user agent set (I'm not sure the UA is strictly required, but without it my request stalled for more than 30 seconds).
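The same session-then-endpoint pattern works for any team and season. A hedged sketch wrapping it in a function (the function name and its defaults are mine, not from the original answer, and the endpoint may change over time):

```r
library(rvest)
library(httr)

# Fetch the parsed commonteamroster response for a given team and season,
# using the session + user-agent trick described above.
get_roster <- function(team_id, season = "2016-17") {
  ua  <- user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
  # Visit the public team page first so the session picks up cookies.
  ses <- html_session(paste0("http://stats.nba.com/team/#!/", team_id, "/"), ua)
  url <- paste0("http://stats.nba.com/stats/commonteamroster?LeagueID=00",
                "&Season=", season, "&TeamID=", team_id)
  doc <- ses %>% jump_to(url)
  content(doc$response, "parsed")
}

# e.g. res <- get_roster(1610612742)
```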
