scrapy-spider

CrawlSpider seems not to follow Rule

柔情痞子 submitted on 2021-02-11 14:32:22
Question: Here's my code. I followed the example in "Recursively Scraping Web Pages With Scrapy", but it seems I have made a mistake somewhere. Can someone help me find it, please? It's driving me crazy: I want all the results from all the result pages, but instead it only gives me the results from page 1. Here's my code:

import scrapy
from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http.request import Request
from scrapy.contrib.linkextractors.sgml
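
For reference, here is a minimal sketch of a CrawlSpider that follows pagination links with a Rule. The domain, URLs, and selectors below are placeholders and not taken from the question; the LinkExtractor import uses the modern scrapy.linkextractors module rather than the deprecated scrapy.contrib.linkextractors.sgml path, and the callback is deliberately not named parse, since CrawlSpider reserves that method for its own link-following logic (overriding it is a common reason why only the first page gets scraped).

# Minimal sketch, assuming placeholder domain/selectors -- not the asker's actual spider.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor  # modern replacement for the sgml link extractor


class ResultsSpider(CrawlSpider):
    name = "results"
    allowed_domains = ["example.com"]                    # placeholder domain
    start_urls = ["https://example.com/search?page=1"]   # placeholder start URL

    # Follow every pagination link and send each followed page to parse_item.
    # The callback must NOT be called "parse": CrawlSpider uses that name internally.
    rules = (
        Rule(
            LinkExtractor(restrict_css="a.next-page"),   # placeholder selector for the "next" link
            callback="parse_item",
            follow=True,
        ),
    )

    def parse_item(self, response):
        # Placeholder extraction: yield one dict per result row on the page.
        for row in response.css("div.result"):
            yield {"title": row.css("h2::text").get()}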

Scrapy simulate XHR request - returning 400

扶醉桌前 submitted on 2021-02-08 06:59:51
Question: I'm trying to get data from a site that uses Ajax: the page loads first, then JavaScript requests the content. See this page for details: https://www.tele2.no/mobiltelefon.aspx The problem is that when I try to simulate this process by calling this URL: https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters I get a 400 response telling me that the request is not allowed. This is my code:

# -*- coding: utf-8 -*-
import scrapy
import json

class Tele2Spider(scrapy.Spider):
    name =
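
A 400 from this kind of .svc endpoint usually means the request does not look like the browser's XHR call. The sketch below shows the general pattern for replaying such a call from Scrapy: POST the JSON payload the browser sends, with a matching Content-Type header. The payload keys here are placeholders and would have to be copied from the browser's network tab; only the endpoint URL comes from the question.

# Hedged sketch of replaying an XHR POST; payload schema is assumed, not the real one.
import json
import scrapy


class Tele2Spider(scrapy.Spider):
    name = "tele2"
    api_url = "https://www.tele2.no/Services/Webshop/FilterService.svc/ApplyPhoneFilters"

    def start_requests(self):
        payload = {"filters": []}  # placeholder body; copy the real JSON from the browser's dev tools
        yield scrapy.Request(
            self.api_url,
            method="POST",
            body=json.dumps(payload),
            headers={
                "Content-Type": "application/json; charset=UTF-8",
                "X-Requested-With": "XMLHttpRequest",
            },
            callback=self.parse_api,
        )

    def parse_api(self, response):
        # Inspect the returned structure before trying to extract specific fields.
        data = json.loads(response.text)
        yield {"raw": data}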

How to extract data from tags which are child of another tag through scrapy and python?

浪尽此生 submitted on 2021-02-05 12:23:01
Question: This is the HTML from which I want to extract data, but whenever I run my spider I get some random values. Can anyone please help me with this? I want to extract the following: Mumbai, Maharashtra, 1958, government, UGC and Indian Institute of Technology, Bombay. HTML:

<div class="instituteInfo">
  <ul class="clg-info">
    <li>
      <a href="link here" target="_blank">Mumbai</a>,
      <a href="link here" target="_blank">Maharashtra</a>
    </li>
    <li>Estd : <span>1958</span></li>
    <li>Ownership : <span
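
One way to pull those values out of the snippet above with Scrapy CSS selectors is sketched below. The spider name, start URL, and output field names are assumptions; only the selectors correspond to the HTML shown in the question, whose last <li> is truncated.

# Minimal sketch, assuming a placeholder spider around the selectors.
import scrapy


class InstituteSpider(scrapy.Spider):
    name = "institutes"
    start_urls = ["https://example.com/colleges"]   # placeholder URL

    def parse(self, response):
        info = response.css("div.instituteInfo ul.clg-info")
        # The first <li> holds two links: city first, then state.
        links = info.css("li:first-child a::text").getall()
        yield {
            "city": links[0] if links else None,                           # "Mumbai"
            "state": links[1] if len(links) > 1 else None,                 # "Maharashtra"
            "established": info.css("li:nth-child(2) span::text").get(),   # "1958"
            "ownership": info.css("li:nth-child(3) span::text").get(),     # truncated in the question
        }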

Scrapy shell works but actual script returns 404 error

淺唱寂寞╮ submitted on 2021-01-29 04:17:54
Question: Running scrapy shell http://www.zara.com/us returns a correct 200 code:

2017-01-05 18:34:20 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: zara)
2017-01-05 18:34:20 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'zara.spiders', 'ROBOTSTXT_OBEY': True, 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'SPIDER_MODULES': ['zara.spiders'], 'HTTPCACHE_ENABLED': True, 'BOT_NAME': 'zara', 'LOGSTATS_INTERVAL': 0, 'USER_AGENT': 'zara (+http://www.yourdomain.com)'}
2017-01-05 18
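
The settings dump shows a bot-style USER_AGENT ('zara (+http://www.yourdomain.com)') and HTTPCACHE_ENABLED: True, so one plausible explanation, offered only as a guess here, is that the site rejects that User-Agent, or that a previously cached 404 is being replayed. A minimal sketch of overriding both per spider follows; the browser UA string is just an example value.

# Hedged sketch: per-spider settings override, assuming the UA/cache hypothesis above.
import scrapy


class ZaraSpider(scrapy.Spider):
    name = "zara"
    start_urls = ["http://www.zara.com/us"]

    custom_settings = {
        # Example browser-like User-Agent; replace with whatever string you prefer.
        "USER_AGENT": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36"
        ),
        "HTTPCACHE_ENABLED": False,  # avoid replaying a cached 404 response
    }

    def parse(self, response):
        self.logger.info("Got %s for %s", response.status, response.url)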

Scrapy throws an error when run using crawlerprocess

故事扮演 submitted on 2020-12-12 05:37:07
Question: I've written a script in Python using Scrapy to collect the names of different posts and their links from a website. When I execute my script from the command line it works flawlessly. Now my intention is to run the script using CrawlerProcess(). I've looked for similar problems in different places, but nowhere could I find a direct solution or anything close to it. However, when I try to run it as it is, I get the following error:

from stackoverflow.items import StackoverflowItem
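
The traceback is cut off at the project-level import, which commonly points to the standalone script not seeing the Scrapy project's package and settings. A hedged sketch of the usual pattern: load the project settings with get_project_settings() and run the script from the project root (the directory containing scrapy.cfg). The spider name below is assumed, not taken from the question.

# Hedged sketch of running a project spider via CrawlerProcess from a standalone script.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    # get_project_settings() only works when the script is run from inside the project.
    process = CrawlerProcess(get_project_settings())
    process.crawl("stackoverflow")   # assumed spider name registered in the project
    process.start()                  # blocks until crawling finishes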

Creating Scrapy array of items with multiple parse

和自甴很熟 submitted on 2020-11-24 16:41:39
Question: I am scraping listings with Scrapy. My script first parses the listing URLs using parse_node, then parses each listing using parse_listing, and for each listing it parses the listing's agents using parse_agent. I would like to build an array that grows as Scrapy parses through a listing and its agents, and that resets for each new listing. Here is my parsing script:

def parse_node(self, response, node):
    yield Request('LISTING LINK', callback=self.parse_listing)
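
One common way to get a per-listing array like this, sketched below under assumptions since the original spider is truncated, is to create the item with an empty agents list in parse_listing and pass it through the chained agent requests via cb_kwargs (Scrapy 1.7+; older versions would use Request.meta), yielding the item only after the last agent has been parsed. URLs, selectors, and field names are placeholders, and a plain Spider stands in for the original XMLFeedSpider-style parse_node.

# Hedged sketch of building one item per listing across chained callbacks.
import scrapy


class ListingsSpider(scrapy.Spider):
    name = "listings"
    start_urls = ["https://example.com/listings"]        # placeholder URL

    def parse(self, response):
        for url in response.css("a.listing::attr(href)").getall():   # placeholder selector
            yield response.follow(url, callback=self.parse_listing)

    def parse_listing(self, response):
        item = {
            "listing_url": response.url,
            "agents": [],                                 # fresh list for each listing
        }
        agent_urls = response.css("a.agent::attr(href)").getall()    # placeholder selector
        if not agent_urls:
            yield item
            return
        # Hand the item and the remaining agent URLs to the next callback.
        yield response.follow(
            agent_urls[0],
            callback=self.parse_agent,
            cb_kwargs={"item": item, "remaining": agent_urls[1:]},
        )

    def parse_agent(self, response, item, remaining):
        item["agents"].append(response.css("h1::text").get())        # placeholder selector
        if remaining:
            yield response.follow(
                remaining[0],
                callback=self.parse_agent,
                cb_kwargs={"item": item, "remaining": remaining[1:]},
            )
        else:
            yield item                                    # all agents collected for this listing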
