Question
A couple of days ago in Power BI, I was able to create a web query that allowed me to extract the JSON data from NBA Player Stats without using any headers. As of today, I have noticed that the query no longer works; I am getting the following error message:
DataSource.Error: The underlying connection was closed. An unexpected error occurred on a receive.
Details: https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight=
On a related note, I used to be able to pull the JSON data from NBA Team Stats using https://stats.nba.com/ as a Referer header, but now it gives me the same error message as shown above. To try to get around these errors, I have tried sending the following headers:
Host: stats.nba.com
Connection: keep-alive
Accept: application/json
x-nba-stats-token: true
User-Agent: Chrome/79.0.3945.130
x-nba-stats-origin: stats
Referer: https://stats.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
When I do submit the query with the above headers, it comes back with the following error message:
Unable to connect
We encountered an error while trying to connect.
Details: "The 'Host' header must be modified using the appropriate property or method.
Parameter name: name"
I have run out of ideas for how to run the query properly. I'm really new to web scraping and HTML, and I've been trying to teach myself. Any help is greatly appreciated.
Answer 1:
All headers for GET request:
Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/plain, */*
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US;q=0.9,en;q=0.7
URL:
https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=
Required Headers:
Accept: application/json, text/plain, */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Not sure if required:
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
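For illustration, here is a minimal Python sketch (standard library only) of a GET request that carries the headers listed above. Python is my choice for the example, not something the question uses, and the names build_request and fetch_json are hypothetical. Host is left out on purpose: HTTP clients normally derive it from the URL, which is consistent with the 'Host' header error quoted in the question. Accept-Encoding is also omitted, because urllib does not decompress responses automatically. Whether the server still accepts this set of headers is an assumption.

```python
import json
import urllib.request

# Headers the answer lists as required.
# "Host" is deliberately absent: the HTTP client derives it from the URL.
REQUIRED_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/79.0.3945.130 Safari/537.36"
    ),
    "x-nba-stats-origin": "stats",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1",
}

# Headers the answer is unsure about; the token value may expire.
OPTIONAL_HEADERS = {
    "x-nba-stats-token": "true",
    "X-NewRelic-ID": "VQECWF5UChAHUlNTBwgBVw==",
}


def build_request(url, include_optional=True):
    """Return a GET request object for url with the headers attached."""
    headers = dict(REQUIRED_HEADERS)
    if include_optional:
        headers.update(OPTIONAL_HEADERS)
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(url, timeout=30):
    """Send the request and parse the response body as JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_json with the full leaguedashteamstats URL above should return the parsed JSON if the server accepts the request; nothing here guarantees that it will.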
Possible problems:
You were detected as a bot and blocked.
The X-NewRelic-ID header is a token (possibly with a timeout). It is probably assigned based on parameters such as IP address and User-Agent, among others.
You can get a fresh X-NewRelic-ID from the HTML response to a GET request to https://stats.nba.com/. Here is the part of the HTML that contains the xpid token:
<script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",applicationID:"76210961"};
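As a rough sketch of that last step, the xpid value can be pulled out of the page HTML with a regular expression. This assumes the token still appears as xpid:"..." inside the loader_config script, as in the snippet above; the function name extract_xpid is hypothetical.

```python
import re

# Matches xpid:"<token>" as it appears in the loader_config snippet.
XPID_PATTERN = re.compile(r'xpid\s*:\s*"([^"]+)"')


def extract_xpid(html):
    """Return the xpid token found in html, or None if it is absent."""
    match = XPID_PATTERN.search(html)
    return match.group(1) if match else None
```

The returned string could then be sent as the X-NewRelic-ID header on the stats request, if that header turns out to be required at all.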
Source: https://stackoverflow.com/questions/59886998/what-headers-am-i-missing-to-scrape-the-nba-stats-data