How to pass custom settings through CrawlerProcess in scrapy?

▼ Submitted by 魔方 西西 on 2019-12-05 19:29:59

You cannot set these from inside the spider file. You are probably confusing crawler settings with spider settings. In Scrapy, the feed parameters (as of this writing) must be passed to the crawler process, not to the spider, so pass them as settings to your CrawlerProcess. I have the same use case as you: read the current project settings, then override them for each crawler process. See the example below:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Start from the project settings, then override per run
s = get_project_settings()
s['FEED_FORMAT'] = 'csv'   # export items as CSV
s['FEED_URI'] = 'Q1.csv'   # output file
s['LOG_LEVEL'] = 'INFO'
s['LOG_FILE'] = 'Q1.log'

proc = CrawlerProcess(s)

Your call to process.crawl() is also incorrect. The spider's name should be passed as the first argument, as a string, like this: process.crawl('MySpider', crawl_links=main_links), where 'MySpider' must be the value of the name attribute in your spider class.

Do not pass settings to the crawl() method; pass the spider class (or its name string) as the first argument to crawl(), not an instance.

from my_crawler.spiders.my_scraper import MySpider
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# Pass the spider class itself, not an instance
process.crawl(MySpider, crawl_links=main_links)

process.start()