Scrapy simulate XHR request - returning 400

Posted by ☆樱花仙子☆ on 2021-02-08 06:59:22

Question


I'm trying to get data from a site that loads its content via Ajax: the page loads, and then JavaScript requests the content. See this page for details: https://www.tele2.no/mobiltelefon.aspx

The problem is that when I try to simulate this process by calling this URL: https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters

I get a 400 response telling me that the request is not allowed. This is my code:

# -*- coding: utf-8 -*-
import scrapy
import json

class Tele2Spider(scrapy.Spider):
    name = "tele2"
    #allowed_domains = ["tele2.no/mobiltelefon.aspx"]
    start_urls = (
        'https://www.tele2.no/mobiltelefon.aspx/',
    )

    def parse(self, response):
        url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
        my_data = "{filters: []}"
        req = scrapy.Request( url, method='POST', body=json.dumps(my_data), headers={'X-Requested-With': 'XMLHttpRequest','Content-Type':'application/json'}, callback=self.parser2)
        yield req

    def parser2(self, response):
      print "test"

I'm new to Scrapy and Python, so there might be something obvious I'm missing.


Answer 1:


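The key problem is the missing quotes around filters in the body. JSON requires object keys to be double-quoted strings, so {filters: []} is not valid JSON. Calling json.dumps() on that string does not help either: it encodes the whole string as a JSON string literal instead of an object. Send the body as valid JSON directly: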

url = 'https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters'
req = scrapy.Request(url,
                     method='POST',
                     body='{"filters": []}',
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
yield req

Or, you can define it as a dictionary and then call json.dumps() to dump it to a string:

params = {"filters": []}
req = scrapy.Request(url,
                     method='POST',
                     body=json.dumps(params),
                     headers={'X-Requested-With': 'XMLHttpRequest',
                              'Content-Type': 'application/json; charset=UTF-8'},
                     callback=self.parser2)
yield req
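
To see why the original body was rejected, the two payloads can be compared outside Scrapy using only the standard json module. This is a minimal sketch; the helper name is_valid_json is made up for illustration and is not part of Scrapy or of the original post.

```python
import json


def is_valid_json(text):
    """Hypothetical helper: return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Body string from the question: the key is not quoted, so it is not JSON.
unquoted = "{filters: []}"
# Body string from the answer: the key is a double-quoted string.
quoted = '{"filters": []}'

# json.dumps() on a str produces a JSON string literal, not an object.
double_encoded = json.dumps(unquoted)
# json.dumps() on a dict produces the JSON object the service expects.
from_dict = json.dumps({"filters": []})

print(is_valid_json(unquoted))  # False
print(is_valid_json(quoted))    # True
print(double_encoded)           # "{filters: []}"
print(from_dict)                # {"filters": []}
```

The third line of output is the body the question's code actually sent: a quoted string rather than an object with a filters key.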

As proof, here is what I get on the console:

2014-12-30 12:30:38-0500 [tele2] DEBUG: Crawled (200) <GET https://www.tele2.no/mobiltelefon.aspx/> (referer: None) 
2014-12-30 12:30:42-0500 [tele2] DEBUG: Crawled (200) <POST https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters> (referer: https://www.tele2.no/mobiltelefon.aspx/) 
test


Source: https://stackoverflow.com/questions/27709422/scrapy-simulate-xhr-request-returning-400
