Scrapy - Use feed exporter for a particular spider (and not others) in a project

Posted by 生来就可爱ヽ(ⅴ<●) on 2020-05-29 12:00:10

Question


ENVIRONMENT: Windows 7, Python 3.6.5, Scrapy 1.5.1

PROBLEM DESCRIPTION:

I have a Scrapy project called project_github, which contains 3 spiders: spider1, spider2, spider3. Each spider scrapes data from its own website.

I am trying to automatically export a JSON file when a particular spider is executed, with the format: NameOfSpider_TodaysDate.json, so that from the command line I can:

Run scrapy crawl spider1, which produces spider1_181115.json

Currently I am using the feed export settings in settings.py with the following code:

import datetime
FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
FEED_FORMAT = 'json'
FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
FEED_EXPORT_ENCODING = 'utf-8'

Obviously this code always writes spider1_TodaysDate.json, regardless of which spider is run. Any suggestions?


Answer 1:


The way to do this is to define custom_settings as a class attribute on the specific spider we are configuring the feed export for. Spider settings override project settings.

So, for spider1:

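import datetime

import scrapy


class spider1(scrapy.Spider):
    name = "spider1"
    allowed_domains = []

    # Per-spider settings; these override the project-wide values in settings.py.
    custom_settings = {
        # Evaluated once, when the class is defined: spider1_YYMMDD.json
        'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }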
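
Since the project has three spiders, the same dictionary would otherwise be repeated in each one. A minimal sketch of one way to avoid that, assuming a small shared helper: feed_settings is a hypothetical name, not part of Scrapy, and its optional today parameter exists only so the date can be fixed when testing.

```python
import datetime


def feed_settings(spider_name, today=None):
    """Build the per-spider feed export settings (hypothetical helper)."""
    # Default to the current date; pass a fixed date to get a predictable name.
    if today is None:
        today = datetime.date.today()
    return {
        # NameOfSpider_YYMMDD.json, the format asked for in the question
        'FEED_URI': '{}_{}.json'.format(spider_name, today.strftime('%y%m%d')),
        'FEED_FORMAT': 'json',
        'FEED_EXPORTERS': {
            'json': 'scrapy.exporters.JsonItemExporter',
        },
        'FEED_EXPORT_ENCODING': 'utf-8',
    }
```

Each spider could then declare custom_settings = feed_settings('spider1') (and likewise for spider2 and spider3). As in the answer, the date is computed when the class is defined, not when the crawl finishes.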


Source: https://stackoverflow.com/questions/53318905/scrapy-use-feed-exporter-for-a-particular-spider-and-not-others-in-a-project
