Question
ENVIRONMENT: Windows 7, Python 3.6.5, Scrapy 1.5.1
PROBLEM DESCRIPTION:
I have a Scrapy project called project_github, which contains 3 spiders: spider1, spider2, spider3. Each of these spiders scrapes data from a particular website individual to that spider.
I am trying to automatically export a JSON file when a particular spider is executed, named in the format NameOfSpider_TodaysDate.json, so that from the command line I can execute scrapy crawl spider1 and get spider1_181115.json.
Currently I am using item exporters in settings.py with the following code:
import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'
Obviously this code always writes spider1_TodaysDate.json regardless of the spider used... Any suggestions?
Answer 1:
The way to do this is to define custom_settings as a class attribute on the specific spider we are writing the item exporter for. Spider settings override project settings.
So, for spider1:
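import datetime

import scrapy


class spider1(scrapy.Spider):
    name = "spider1"
    allowed_domains = []
    # Per-spider settings; these override the project-wide values in settings.py.
    custom_settings = {
        # Output file name: spider1_<yymmdd>.json
        'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }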
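Since spider2 and spider3 would need the same dictionary with only the file name changed, the settings could be built by a small shared helper instead of being copied into each spider. The following is only a sketch: feed_settings is a hypothetical helper name, not part of Scrapy, and it assumes every spider wants the same JSON exporter options used above.

```python
import datetime


def feed_settings(spider_name, day=None):
    """Build per-spider feed settings (hypothetical helper, not a Scrapy API)."""
    # Default to today's date; an explicit date can be passed for testing.
    day = day or datetime.date.today()
    return {
        # File name pattern from the question: NameOfSpider_yymmdd.json
        'FEED_URI': '{}_{}.json'.format(spider_name, day.strftime('%y%m%d')),
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {'json': 'scrapy.exporters.JsonItemExporter'},
        'FEED_EXPORT_ENCODING': 'utf-8',
    }
```

Each spider would then set custom_settings = feed_settings('spider1') in its class body, passing its own name.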
Source: https://stackoverflow.com/questions/53318905/scrapy-use-feed-exporter-for-a-particular-spider-and-not-others-in-a-project