scrapy-spider

Scrapy from script: output in JSON

梦想的初衷 submitted on 2019-11-27 14:33:12
Question: I am running Scrapy in a Python script:

    from scrapy import signals
    from scrapy.crawler import Crawler
    from scrapy.utils.project import get_project_settings
    from scrapy.xlib.pydispatch import dispatcher  # legacy import path (old Scrapy)
    from twisted.internet import reactor

    def stop_reactor():
        reactor.stop()

    def setup_crawler(domain):
        dispatcher.connect(stop_reactor, signal=signals.spider_closed)
        spider = ArgosSpider(domain=domain)  # ArgosSpider is the asker's spider class
        settings = get_project_settings()
        crawler = Crawler(settings)
        crawler.configure()
        crawler.crawl(spider)
        crawler.start()
        reactor.run()

It runs successfully and stops, but where is the result? I want the result in JSON format; how can I do that? Something like result = responseInJSON, just as we get with the command:

    scrapy crawl argos -o result.json -t json
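On a modern Scrapy (2.1+), the idiomatic way to get the same JSON file from a script is CrawlerProcess plus the FEEDS setting, which replaces both the legacy Crawler/reactor dance and the -o/-t flags. A minimal sketch, assuming ArgosSpider is importable from the asker's project (the import path and domain below are placeholders):

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Hypothetical import path; use wherever ArgosSpider lives in your project.
    from argos.spiders.argos_spider import ArgosSpider

    settings = get_project_settings()
    # FEEDS (Scrapy 2.1+) is the in-code equivalent of `-o result.json -t json`.
    settings.set("FEEDS", {"result.json": {"format": "json"}})

    process = CrawlerProcess(settings)
    process.crawl(ArgosSpider, domain="argos.co.uk")  # pass the class, not an instance
    process.start()  # blocks until the crawl finishes; result.json is written on close

CrawlerProcess starts and stops the Twisted reactor for you, so no dispatcher hookup or reactor.run() call is needed.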

Running multiple spiders in Scrapy for one website in parallel?

折月煮酒 submitted on 2019-11-27 14:11:43
Question: I want to crawl a website that has two parts, and my script is not as fast as I need it to be. Is it possible to launch two spiders, one scraping the first part and the other the second? I tried having two different classes and running them:

    scrapy crawl firstSpider
    scrapy crawl secondSpider

but I don't think that is smart. I read the scrapyd documentation, but I don't know whether it fits my case.

Answer 1: I think what you are looking for is something like this:

    import scrapy
    from scrapy.crawler import CrawlerProcess
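The answer is cut off right after those imports; the documented pattern it points to is a single CrawlerProcess scheduling both spiders, which then run concurrently inside one Twisted reactor. A minimal sketch, with placeholder spider classes and URLs:

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class FirstSpider(scrapy.Spider):
        name = "first"
        start_urls = ["https://example.com/part1"]  # placeholder URL

        def parse(self, response):
            yield {"part": 1, "url": response.url}

    class SecondSpider(scrapy.Spider):
        name = "second"
        start_urls = ["https://example.com/part2"]  # placeholder URL

        def parse(self, response):
            yield {"part": 2, "url": response.url}

    process = CrawlerProcess()
    process.crawl(FirstSpider)
    process.crawl(SecondSpider)
    process.start()  # runs both spiders concurrently; blocks until both finish

scrapyd is only needed if you want to schedule and manage crawls as a service; for plain in-process concurrency, CrawlerProcess is enough.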

Passing arguments to process.crawl in Scrapy (Python)

我的梦境 submitted on 2019-11-27 13:05:09
Question: I would like to get the same result as this command line:

    scrapy crawl linkedin_anonymous -a first=James -a last=Bond -o output.json

My script is as follows:

    import scrapy
    from linkedin_anonymous_spider import LinkedInAnonymousSpider
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    spider = LinkedInAnonymousSpider(None, "James", "Bond")
    process = CrawlerProcess(get_project_settings())
    process.crawl(spider)  # <-------------- (1)
    process.start()
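The snippet ends here, but the problem it sets up at (1) has a standard fix: hand process.crawl the spider class together with the -a style arguments rather than a pre-built instance, and Scrapy forwards the keyword arguments to the spider's __init__. A sketch, assuming LinkedInAnonymousSpider accepts first and last keyword arguments:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from linkedin_anonymous_spider import LinkedInAnonymousSpider

    process = CrawlerProcess(get_project_settings())
    # Equivalent of `-a first=James -a last=Bond`: pass the class plus kwargs.
    process.crawl(LinkedInAnonymousSpider, first="James", last="Bond")
    process.start()

To reproduce -o output.json as well, add a FEEDS entry to the settings before constructing the process, as shown in the first question above.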

Multiprocessing of Scrapy Spiders in Parallel Processes

◇◆丶佛笑我妖孽 submitted on 2019-11-27 02:23:17
Question: There are several similar questions that I have already read on Stack Overflow. Unfortunately, I lost the links to all of them, because my browsing history got deleted unexpectedly. All of the above questions could not help me: some of them used CELERY and some of them used SCRAPYD, whereas I want to use the MULTIPROCESSING library. Also, the official Scrapy documentation shows how to run multiple spiders in a SINGLE PROCESS, not in MULTIPLE PROCESSES. None of them could help me, and hence
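Since the question is cut off, here is a minimal sketch of the multiprocessing approach it asks about. The key constraint is that Scrapy's Twisted reactor can run only once per process, so giving each crawl its own OS process, each with its own CrawlerProcess, sidesteps that limit. MySpider and the URLs are placeholders:

    import multiprocessing

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):  # placeholder spider
        name = "my_spider"

        def __init__(self, start_url=None, **kwargs):
            super().__init__(**kwargs)
            self.start_urls = [start_url]

        def parse(self, response):
            yield {"url": response.url, "title": response.css("title::text").get()}

    def run_spider(start_url):
        # Each child process owns a fresh CrawlerProcess (and Twisted reactor),
        # so the crawls run truly in parallel without reactor conflicts.
        process = CrawlerProcess()
        process.crawl(MySpider, start_url=start_url)
        process.start()

    if __name__ == "__main__":
        urls = ["https://example.com/a", "https://example.com/b"]
        procs = [multiprocessing.Process(target=run_spider, args=(u,)) for u in urls]
        for p in procs:
            p.start()
        for p in procs:
            p.join()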