Is there a way to overwrite the output file instead of appending to it?

Example:

scrapy crawl myspider -o "/path/to/json/my.json" -t json
scrapy crawl myspider -o "/path/to/json/my.json" -t json

Running the same crawl a second time appends to my.json rather than replacing it.
To overcome this problem I created a subclass of scrapy.extensions.feedexport.FileFeedStorage in the myproject directory.
This is my customexport.py:
"""Custom Feed Exports extension."""
import os
from scrapy.extensions.feedexport import FileFeedStorage
class CustomFileFeedStorage(FileFeedStorage):
"""
A File Feed Storage extension that overwrites existing files.
See: https://github.com/scrapy/scrapy/blob/master/scrapy/extensions/feedexport.py#L79
"""
def open(self, spider):
"""Return the opened file."""
dirname = os.path.dirname(self.path)
if dirname and not os.path.exists(dirname):
os.makedirs(dirname)
# changed from 'ab' to 'wb' to truncate file when it exists
return open(self.path, 'wb')
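If you want to check the behaviour outside of a full crawl, something like the following should do it. This is only a minimal sketch, assuming the Scrapy 1.x FileFeedStorage API (the constructor takes a path/URI and the inherited store() simply closes the file); the /tmp path is just an example:

# Quick sanity check for CustomFileFeedStorage, run outside of a crawl.
from myproject.customexport import CustomFileFeedStorage

storage = CustomFileFeedStorage('/tmp/my.json')  # example path

f = storage.open(spider=None)   # our open() never uses the spider argument
f.write(b'{"first": "run"}\n')
storage.store(f)                # inherited store() just closes the file

f = storage.open(spider=None)   # simulate a second run
f.write(b'{"second": "run"}\n')
storage.store(f)

# With the stock FileFeedStorage ('ab') the file would now contain both lines;
# with CustomFileFeedStorage ('wb') it only contains the second one.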
Then I added the following to my settings.py
(see: https://doc.scrapy.org/en/1.2/topics/feed-exports.html#feed-storages-base):
FEED_STORAGES_BASE = {
    '': 'myproject.customexport.CustomFileFeedStorage',
    'file': 'myproject.customexport.CustomFileFeedStorage',
}
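The empty-string key covers feed URIs given as a plain path (like /path/to/json/my.json), and the 'file' key covers explicit file:// URIs, so both forms go through the overwriting storage.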
Now, whenever I write to a file, it is overwritten instead of appended to.