Order a json by field using scrapy

牧云@^-^@ 提交于 2019-12-02 12:41:27

If I needed my output file to be sorted (I will assume you have a valid reason to want this), I'd probably write a custom exporter.

This is how Scrapy's built-in JsonItemExporter is implemented.
With a few simple changes, you can modify it to add the items to a list in export_item(), and then sort the items and write out the file in finish_exporting().

Since you're only scraping a few hundred items, the downsides of storing a list of them and not writing to a file until the crawl is done shouldn't be a problem to you.

By now I've found a working solution using pipeline:

import json

class JsonWriterPipeline(object):

    def open_spider(self, spider):
        self.list_items = []
        self.file = open('euler.json', 'w')

    def close_spider(self, spider):
        ordered_list = [None for i in range(len(self.list_items))]

        self.file.write("[\n")

        for i in self.list_items:
            ordered_list[int(i['id']-1)] = json.dumps(dict(i))

        for i in ordered_list:
            self.file.write(str(i)+",\n")

        self.file.write("]\n")
        self.file.close()

    def process_item(self, item, spider):
        self.list_items.append(item)
        return item

Though it may be non optimal, because the guide suggests in another example:

The purpose of JsonWriterPipeline is just to introduce how to write item pipelines. If you really want to store all scraped items into a JSON file you should use the Feed exports.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!