Scrapy: overwrite JSON files instead of appending to the file

长情又很酷 2020-12-28 10:57

Is there a way to overwrite the output file instead of appending to it?

Example:

scrapy crawl myspider -o "/path/to/json/my.json" -t json
scrapy cra         


        
6 Answers
  •  时光取名叫无心
    2020-12-28 11:19

    To work around this, I subclassed scrapy.extensions.feedexport.FileFeedStorage in my project directory (myproject).

    This is my customexport.py:

    """Custom Feed Exports extension."""
    import os
    
    from scrapy.extensions.feedexport import FileFeedStorage
    
    
    class CustomFileFeedStorage(FileFeedStorage):
        """
        A File Feed Storage extension that overwrites existing files.
    
        See: https://github.com/scrapy/scrapy/blob/master/scrapy/extensions/feedexport.py#L79
        """
    
        def open(self, spider):
            """Return the opened file."""
            dirname = os.path.dirname(self.path)
            if dirname and not os.path.exists(dirname):
                os.makedirs(dirname)
            # changed from 'ab' to 'wb' to truncate file when it exists
            return open(self.path, 'wb')
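
The mode change is the whole fix: 'ab' appends to an existing file, while 'wb' truncates it on open. A minimal standalone sketch of that difference, outside Scrapy; the helper name write_twice and the sample payload are hypothetical, chosen only for illustration:

```python
import os
import tempfile


def write_twice(mode):
    """Write one JSON line to the same path twice, reopening with `mode` each time."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        for run in (1, 2):
            # Each iteration stands in for one separate crawl run
            with open(path, mode) as f:
                f.write(b'{"run": %d}\n' % run)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


appended = write_twice("ab")     # both runs end up in the file
overwritten = write_twice("wb")  # only the last run survives
print(appended)
print(overwritten)
```

With 'ab' the second run lands after the first, which for a JSON export means two documents concatenated in one file; with 'wb' only the latest run remains.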
    

    Then I added the following to my settings.py (see: https://doc.scrapy.org/en/1.2/topics/feed-exports.html#feed-storages-base):

    FEED_STORAGES_BASE = {
        '': 'myproject.customexport.CustomFileFeedStorage',
        'file': 'myproject.customexport.CustomFileFeedStorage',
    }
    

    With this storage class registered, the output file is truncated on every run instead of being appended to.
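
One caveat, offered as a hedged variant rather than a correction: FEED_STORAGES_BASE is Scrapy's table of built-in storages, so assigning a two-entry dict to it may drop the other built-in schemes (such as ftp, s3 and stdout). Scrapy also reads a project-level FEED_STORAGES setting that is merged on top of the base table. A sketch, assuming the same myproject.customexport module path used in the answer:

```python
# settings.py (sketch): register the custom storage without replacing
# Scrapy's built-in FEED_STORAGES_BASE table.
FEED_STORAGES = {
    # '' covers plain paths with no URI scheme, 'file' covers file:// URIs
    '': 'myproject.customexport.CustomFileFeedStorage',
    'file': 'myproject.customexport.CustomFileFeedStorage',
}
```

On Scrapy 2.4 and later a custom storage may not be needed at all: as far as I know, the crawl command accepts -O (capital O) to overwrite the output file, while -o keeps appending.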
