XHR request URL says does not exist when attempting to parse its content

Posted by 混江龙づ霸主 on 2019-12-18 03:44:13

Question


Before I build a full solution to my problem using Scrapy, I am posting a simplified version of what I want to do:

import requests

url = 'http://www.whoscored.com/stageplayerstatfeed/?field=1&isAscending=false&orderBy=Rating&playerId=-1&stageId=9155&teamId=32"'

params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

response = requests.get(url, params=params, headers=headers)

fixtures = response.body
#fixtures = literal_eval(response.content)
print fixtures 

This code is saying that the above URL does not exist. The URL relates to an XHR request that is submitted when you toggle from the 'Overall' to the 'Home' tab of the main table on this page:

http://www.whoscored.com/Teams/32/

If you activate XHR logging within the Console of Google Developer Tools you can see both the XHR request and the response sent from the server in the form of a dictionary (which is the expected format).

Can anyone tell me why the above code is not returning the data I would expect to see?

Thanks


Answer 1:


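You have several problems:

  • the URL should be http://www.whoscored.com/stageplayerstatfeed, with the query string passed through params rather than embedded in the URL
  • the GET parameters are wrong: the fixed version sends only field, isAscending, orderBy, playerId, stageId and teamId, and drops d and isAggregate
  • required headers are missing: the fixed version adds X-Requested-With, Host and Referer
  • you need response.json(), not response.body (a requests response has no body attribute)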
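
The first two points can be checked without touching the network. The sketch below, assuming Python 3 and only the standard library, splits the URL from the question into a base URL and a parameter dictionary. The names question_url, raw_params and clean_params are illustrative and do not come from the original code. It also surfaces the stray double quote at the end of the question's URL, which ends up inside the teamId value; whether that quote affects the server's response is not established here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl

# URL exactly as written in the question, including the trailing double quote.
question_url = ('http://www.whoscored.com/stageplayerstatfeed/'
                '?field=1&isAscending=false&orderBy=Rating'
                '&playerId=-1&stageId=9155&teamId=32"')

parts = urlsplit(question_url)

# Rebuild the URL without the query string and without the trailing slash.
base_url = urlunsplit((parts.scheme, parts.netloc,
                       parts.path.rstrip('/'), '', ''))

# Turn the query string into a dictionary of name/value pairs.
raw_params = dict(parse_qsl(parts.query))

# Strip the stray quote so the values are suitable for requests' params argument.
clean_params = {name: value.strip('"') for name, value in raw_params.items()}

print(base_url)
print(raw_params['teamId'])
print(clean_params)
```

The resulting base_url and clean_params could then be passed as the url and params arguments of requests.get.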

The fixed version:

import requests

url = 'http://www.whoscored.com/stageplayerstatfeed'
params = {
    'field': '1',
    'isAscending': 'false',
    'orderBy': 'Rating',
    'playerId': '-1',
    'stageId': '9155',
    'teamId': '32'
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
           'X-Requested-With': 'XMLHttpRequest',
           'Host': 'www.whoscored.com',
           'Referer': 'http://www.whoscored.com/Teams/32/'}

response = requests.get(url, params=params, headers=headers)

fixtures = response.json()
print(fixtures)  # the parenthesised form works on both Python 2 and Python 3

Prints:

[
    {
        u'AccurateCrosses': 0,
        u'AccurateLongBalls': 10,
        u'AccuratePasses': 89,
        u'AccurateThroughBalls': 0,
        u'AerialLost': 2,
        u'AerialWon': 4,
        ...
    },
    ...
]
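
Because the original symptom was an error page rather than data, it can help to fail loudly before decoding the body. The following is a minimal sketch, assuming the response object comes from requests.get as in the fixed version. parse_feed is a hypothetical helper name, not part of requests; it relies only on the raise_for_status(), json() and text members of the response.

```python
def parse_feed(response):
    """Return the decoded JSON payload, or raise if the request failed.

    `response` is assumed to be a requests.Response (or any object that
    offers raise_for_status(), json() and a text attribute).
    """
    # Raises requests.HTTPError for 4xx/5xx status codes such as 404.
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # json() raises a ValueError subclass when the body is not valid JSON,
        # for example when the server answers with an HTML error page.
        raise ValueError('Expected JSON but received: %r' % response.text[:200])
```

With the fixed version's variables, it would be called as fixtures = parse_feed(requests.get(url, params=params, headers=headers)).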


Source: https://stackoverflow.com/questions/25654659/xhr-request-url-says-does-not-exist-when-attempting-to-parse-its-content
