What headers am I missing to scrape the NBA Stats data?

不羁的心 提交于 2020-02-03 02:14:10

问题


A couple of days ago in Power BI, I was able to create a web query that allowed me to extract the JSON data from NBA Player Stats without using any headers. As of today, I have noticed that the query no longer works; I am getting the following error message:

DataSource.Error: The underlying connection was closed. An unexpected error occurred on a receive.
Details: https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight=

On a related note, I used to be able to pull the JSON data from NBA Team Stats using https://stats.nba.com/ as a Referer header, but now it's giving me the same error message as shown above. To try and get around these errors, I have tried entering the following headers:

Host: stats.nba.com
Connection: keep-alive
Accept: application/json
x-nba-stats-token: true
User-Agent: Chrome/79.0.3945.130
x-nba-stats-origin: stats
Referer: https://stats.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

When I do submit the query with the above headers, it comes back with the following error message:

Unable to connect

We encountered an error while trying to connect.

Details: "The 'Host' header must be modified using the appropriate property or method.
Parameter name: name"

I have run out of ideas as to how I'm able to properly run the query. I'm really new to web-scraping and HTML -- I've been trying to teach myself. Any help is greatly appreciated.


回答1:


All headers for GET request:

Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/plain, */*
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US;q=0.9,en;q=0.7

URL:

https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=

Required Headers:

Accept: application/json, text/plain, */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1

Not sure if required:

x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==

Possible problems:

  1. You detected as a bot and blocked

  2. Header X-NewRelic-ID is a token (maybe with timeout). Probably it's assign using different params like IP, User-Agent and among others.
    You can get fresh X-NewRelic-ID in HTML response with GET request to https://stats.nba.com/. Here is a part from HTML with xpid token: <script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",applicationID:"76210961"};



来源:https://stackoverflow.com/questions/59886998/what-headers-am-i-missing-to-scrape-the-nba-stats-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!