JSON file gets damaged while putting it into a zip archive with Python

Posted by 泄露秘密 on 2020-03-04 11:04:31

Question


After crawling a site with Scrapy, I create a zip archive in the closing method and pull the pictures into it. Then I add a valid JSON file to the archive.

After unzipping (on Mac OS X or Ubuntu), the JSON file turns out to be damaged: the last item is missing.

End of decompressed file:

..a46.jpg"]},

Original file:

a46.jpg"]}]

Code:

import datetime
import os
import shutil
import zipfile

# create zip archive with all images inside
filename = '../zip/' + datetime.datetime.now().strftime("%Y%m%d-%H%M") + '_' + name
imagefolder = 'full'
imagepath = '/Users/user/test_crawl/bid/images'
shutil.make_archive(
    filename,
    'zip',
    imagepath,
    imagefolder
)

# add json file to zip archive
filename_zip = filename + '.zip'
# parentheses let the expression continue on the next line
path_to_file = ('/Users/user/test_crawl/bid/data/'
                + datetime.datetime.now().strftime("%Y%m%d") + '_' + name + '.json')
# "archive" instead of "zip" avoids shadowing the built-in zip()
with zipfile.ZipFile(filename_zip, 'a') as archive:
    archive.write(path_to_file, os.path.basename(path_to_file))

I could reproduce this error several times and everything else looks OK.


Answer 1:


The solution is to use Scrapy's JsonItemExporter instead of the feed exporter, because the feed exporter still writes to the file during close_spider(), which is too late: by then the zip archive has already been built from an unfinished file.
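The timing problem can be reproduced without Scrapy. The following is a minimal sketch, not Scrapy's actual exporter: the streaming writer, the file names and the two sample items are all made up for illustration. It archives the same JSON file once while the array is still being written and once after the file has been closed.

```python
import json
import os
import tempfile
import zipfile


def zip_single_file(zip_path, file_path):
    """Store file_path in a new archive at zip_path under its base name."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.write(file_path, os.path.basename(file_path))


def read_from_zip(zip_path, member):
    """Return the decoded text of one archive member."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode("utf-8")


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, "items.json")   # hypothetical file name
early_zip = os.path.join(workdir, "early.zip")
late_zip = os.path.join(workdir, "late.zip")

items = [{"image": "a45.jpg"}, {"image": "a46.jpg"}]  # made-up sample data

# Hypothetical stand-in for an exporter that streams a JSON array.
out = open(json_path, "w", encoding="utf-8")
out.write("[")
out.write(json.dumps(items[0]) + ",")
out.flush()

# Archiving now captures an unfinished array that ends with a comma.
zip_single_file(early_zip, json_path)

out.write(json.dumps(items[1]))
out.write("]")
out.close()

# Archiving after the file is closed captures the complete array.
zip_single_file(late_zip, json_path)

early_ok = is_valid_json(read_from_zip(early_zip, "items.json"))
late_ok = is_valid_json(read_from_zip(late_zip, "items.json"))
print(early_ok, late_ok)  # prints: False True
```

The early archive ends with a dangling comma, much like the damaged file in the question, while the late archive holds the full array.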

This is fairly easy to do.

Import JsonItemExporter in pipelines.py:

from scrapy.exporters import JsonItemExporter

Change your pipeline like this:

class MyPipeline(object):

    file = None

    def open_spider(self, spider):
        # the exporter expects a file opened in binary mode
        self.file = open('data/test.json', 'wb')
        self.exporter = JsonItemExporter(self.file)
        self.exporter.start_exporting()

    def close_spider(self, spider):
        # finish the JSON array and close the file before zipping
        self.exporter.finish_exporting()
        self.file.close()
        cleanup('zip_method')

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

The zip_method contains the zip code mentioned in the question.
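Once the file is closed before zipping, it may also be worth checking the archived copy. The sketch below is one possible way to do that with the standard library only; the helper name add_json_to_zip and the demo file names are hypothetical and not part of the original answer. It appends the JSON file to an existing archive and parses the archived copy, so a truncated file raises an error instead of going unnoticed.

```python
import json
import os
import tempfile
import zipfile


def add_json_to_zip(zip_path, json_path):
    """Append json_path to the archive, then parse the archived copy.

    Hypothetical helper: json.loads raises ValueError if the stored
    file is truncated, so the damage is detected right away.
    """
    member = os.path.basename(json_path)
    with zipfile.ZipFile(zip_path, "a") as archive:
        archive.write(json_path, member)
    with zipfile.ZipFile(zip_path) as archive:
        return json.loads(archive.read(member).decode("utf-8"))


# Demo with made-up data in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_zip = os.path.join(demo_dir, "crawl.zip")
demo_json = os.path.join(demo_dir, "crawl.json")

# Stand-in for the image archive that shutil.make_archive would create.
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("full/a46.jpg", b"not a real image")

demo_items = [{"images": ["full/a46.jpg"]}]
with open(demo_json, "w", encoding="utf-8") as handle:
    json.dump(demo_items, handle)

loaded = add_json_to_zip(demo_zip, demo_json)
with zipfile.ZipFile(demo_zip) as archive:
    names = archive.namelist()
```

Reading the member back costs one extra pass over the JSON file, which is usually small compared to the images in the archive.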



Source: https://stackoverflow.com/questions/53602641/json-file-gets-damaged-while-putting-it-into-a-zip-archive-with-python
