scrapy-spider

How can Scrapy deal with Javascript

Submitted by 我只是一个虾纸丫 on 2020-06-24 13:50:50
Question: Spider for reference:

import scrapy
from scrapy.spiders import Spider
from scrapy.selector import Selector
from script.items import ScriptItem


class RunSpider(scrapy.Spider):
    name = "run"
    allowed_domains = ["stopitrightnow.com"]
    start_urls = (
        'http://www.stopitrightnow.com/',
    )

    def parse(self, response):
        for widget in response.xpath('//div[@class="shopthepost-widget"]'):
            # print widget.extract()
            item = ScriptItem()
            item['url'] = widget.xpath('.//a/@href').extract()
            url = item['url']
            # print ...
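
The "shopthepost" widget in this snippet is populated by JavaScript, so the plain HTTP response Scrapy fetches will not contain the links. One common approach is to render the page with a headless rendering service such as Splash before it reaches the spider. A minimal sketch, assuming the scrapy-splash package is installed and a Splash instance is configured in the project settings (the spider name and wait time below are illustrative):

import scrapy
from scrapy_splash import SplashRequest  # requires scrapy-splash and a running Splash service


class RunSplashSpider(scrapy.Spider):
    name = "run_splash"
    allowed_domains = ["stopitrightnow.com"]

    def start_requests(self):
        # Splash executes the page's JavaScript and returns the rendered HTML;
        # 'wait' gives the widget a moment to populate before the snapshot.
        yield SplashRequest(
            'http://www.stopitrightnow.com/',
            callback=self.parse,
            args={'wait': 2},
        )

    def parse(self, response):
        # The rendered response now contains the links the widget injects.
        for widget in response.xpath('//div[@class="shopthepost-widget"]'):
            yield {'url': widget.xpath('.//a/@href').extract()}

An alternative with the same shape is to drive a real browser with Selenium and build the response from its page source, as in the login-form question further down.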

Scrapy doesn't call callback function even with no filter

Submitted by 老子叫甜甜 on 2020-01-25 09:13:16
Question: I have this code to crawl the details page:

yield Request(flexibleItem[self.linkAttributeName], callback=self.parseDetails, dont_filter=True)

There is no error in the sub-URL, because I tested it with the same "GET" method. I don't get any error; Python simply ignores the callback function. It is a very large project running on a server, so I can't share the code, but here is the main architecture of what I am doing. The output is:

in start request
TRUE
oooo

def start_requests(self):
    print "in ...
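
The post is cut off before the rest of start_requests, but the most common cause of a callback that silently never runs is that the Request is built somewhere and never yielded back out through the generator chain (or its domain is filtered as off-site by allowed_domains). A minimal sketch of the pattern that does get the callback scheduled, with hypothetical names standing in for flexibleItem and parseDetails:

import scrapy
from scrapy import Request


class DetailsSpider(scrapy.Spider):
    name = "details"
    start_urls = ['http://example.com/listing']  # placeholder URL

    def parse(self, response):
        for href in response.xpath('//a[@class="item"]/@href').getall():
            # The Request must be yielded (or returned) from the callback;
            # merely constructing it means Scrapy never schedules it and
            # parse_details is never called.
            yield Request(response.urljoin(href),
                          callback=self.parse_details,
                          dont_filter=True)

    def parse_details(self, response):
        yield {'title': response.xpath('//h1/text()').get()}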

How to add a new service to scrapyd from current project

Submitted by 邮差的信 on 2020-01-23 03:03:32
Question: I am trying to run multiple spiders at once, so I made my own custom command in Scrapy. Now I am trying to run that command through scrapyd. I tried to add it as a new service to my scrapyd.conf, but it throws an error saying there is no such module:

Failed to load application: No module named XXXX

Also, I cannot set a relative path. My question is: how can I add my custom command as a service, or fire it through scrapyd? I have something like this in my scrapyd.conf:

updateoutdated.json = ...
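
The post is cut off at the custom entry in scrapyd.conf. For context, scrapyd's [services] section maps each JSON endpoint to a resource class that must be importable on scrapyd's Python path, which is why a relative path cannot work: the code has to be installed as a regular package. A sketch of what such a section looks like, where myproject.webservice.UpdateOutdated is a hypothetical placeholder for the module implementing the custom command:

[services]
schedule.json       = scrapyd.webservice.Schedule
cancel.json         = scrapyd.webservice.Cancel
listprojects.json   = scrapyd.webservice.ListProjects
listspiders.json    = scrapyd.webservice.ListSpiders
updateoutdated.json = myproject.webservice.UpdateOutdated

The "No module named XXXX" error then usually means the package holding the custom resource is not installed in the environment scrapyd runs in.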

Submit form that renders dynamically with Scrapy?

Submitted by 血红的双手。 on 2020-01-13 06:43:08
Question: I'm trying to submit a dynamically generated user login form using Scrapy and then parse the HTML on the page that corresponds to a successful login. I was wondering how I could do that with Scrapy, or with a combination of Scrapy and Selenium. Selenium makes it possible to find the element in the DOM, but I was wondering whether it would be possible to "give control back" to Scrapy after getting the full HTML, in order to allow it to carry out the form submission and save the necessary cookies, session ...
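
One way to "give control back" to Scrapy is to let Selenium render the dynamic form, then wrap the rendered page source in an HtmlResponse so that FormRequest.from_response can fill and submit the form, with Scrapy handling cookies and the session from there. A minimal sketch under those assumptions (the URL, field names, and credentials are placeholders):

import scrapy
from scrapy.http import HtmlResponse, FormRequest
from selenium import webdriver


class LoginSpider(scrapy.Spider):
    name = "login"
    start_urls = ['https://example.com/login']  # placeholder

    def parse(self, response):
        # Render the JavaScript-built form in a real browser.
        driver = webdriver.Firefox()
        driver.get(response.url)
        rendered = HtmlResponse(url=driver.current_url,
                                body=driver.page_source,
                                encoding='utf-8')
        cookies = {c['name']: c['value'] for c in driver.get_cookies()}
        driver.quit()
        # Hand the rendered HTML back to Scrapy: from_response locates the form
        # and builds the POST, and the session cookies travel with the request.
        yield FormRequest.from_response(
            rendered,
            formdata={'username': 'user', 'password': 'pass'},
            cookies=cookies,
            callback=self.after_login,
        )

    def after_login(self, response):
        # Parse the page that corresponds to a successful login.
        yield {'title': response.xpath('//title/text()').get()}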

SgmlLinkExtractor 'allow' definition not working with Scrapy

Submitted by 懵懂的女人 on 2020-01-07 05:09:18
Question: I am using Python.org version 2.7 64-bit on Windows Vista 64-bit. I have the following Scrapy code, where the way I have defined SgmlLinkExtractor is not crawling the site correctly:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import ...
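
The snippet is truncated before the rules are shown, but for reference, a working CrawlSpider rule with an 'allow' pattern looks like the sketch below. 'allow' takes one or more regular expressions matched against the absolute URLs extracted from each page, so the pattern has to occur in links actually present on the site (the domain and regex here are illustrative only):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class ExampleCrawlSpider(CrawlSpider):
    name = "example_crawl"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    rules = (
        # Only links whose absolute URL matches the regex are followed and
        # handed to parse_item; the trailing comma keeps 'rules' a tuple.
        Rule(SgmlLinkExtractor(allow=(r'/category/\d+', )),
             callback='parse_item',
             follow=True),
    )

    def parse_item(self, response):
        yield {'url': response.url}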

export python data to csv file

Submitted by 烂漫一生 on 2020-01-06 23:49:54
Question: I'm trying to export my file via the command line:

scrapy crawl tunisaianet -o save.csv -t csv

but nothing is happening. Any help? Here is my code:

import scrapy
import csv
from tfaw.items import TfawItem


class TunisianetSpider(scrapy.Spider):
    name = "tunisianet"
    allowed_domains = ["tunisianet.com.tn"]
    start_urls = [
        'http://www.tunisianet.com.tn/466-consoles-jeux/',
    ]

    def parse(self, response):
        item = TfawItem()
        data = []
        out = open('out.csv', 'a')
        x = response.xpath('//*[contains(@class, "ajax ...
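
Two things stand out in the truncated post: the crawl command uses "tunisaianet" while the spider is named "tunisianet", so the command cannot find the spider at all, and the spider writes its own out.csv by hand instead of yielding items, which leaves the -o save.csv feed exporter with nothing to serialise. A minimal sketch of the yield-based approach (the product XPath is a guess, since the original selector is cut off):

import scrapy


class TunisianetSpider(scrapy.Spider):
    name = "tunisianet"
    allowed_domains = ["tunisianet.com.tn"]
    start_urls = ['http://www.tunisianet.com.tn/466-consoles-jeux/']

    def parse(self, response):
        # Yield one item per product; `scrapy crawl tunisianet -o save.csv`
        # then serialises everything yielded here, with no manual csv writing.
        for product in response.xpath('//div[contains(@class, "product")]'):
            yield {
                'name': product.xpath('.//h2//text()').get(),
                'price': product.xpath('.//*[contains(@class, "price")]//text()').get(),
            }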