How to produce custom JSON output from Scrapy?

纵饮孤独 submitted on 2019-12-11 14:03:14

Question


I am working on a Scrapy script that should produce output like this:

{
  "state": "FL",
  "date": "2017-11-03T14:52:26.007Z",
  "games": [
    {
      "name":"Game1"
    },
    {
      "name":"Game2"
    }
  ]
}

But when I run scrapy crawl items -o data.json -t json, I get the output below instead, with the state repeated in every item:

[
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
{"state": "CA", "games": [], "crawlDate": "2014-10-04"},
]

The code is given below:

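items.py

import scrapy


class Item(scrapy.Item):
    state = scrapy.Field()
    games = scrapy.Field()
    # Declared so that item['crawlDate'] in the spider does not raise KeyError
    crawlDate = scrapy.Field()

In the spider file, the item class is used as follows:

item = Item()
item['state'] = state
item['crawlDate'] = '2014-10-04'
item['games'] = games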

I know this is not the complete code, but it should give an idea of what I am trying to do.


Answer 1:


Ref. https://stackoverflow.com/a/43698923/8964297

You could try to write your own pipeline like this:

Put this into your pipelines.py file:

import json


class JsonWriterPipeline(object):
    def open_spider(self, spider):
        # Your scraped items will be saved in the file 'scraped_items.json'.
        # You can change the filename to whatever you want.
        self.file = open('scraped_items.json', 'w')
        self.file.write("[\n")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("\n]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # array has no trailing comma and the file stays valid JSON.
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        )
        self.file.write(line)
        return item

Then modify your settings.py to include the following:

ITEM_PIPELINES = {
    'YourProjectName.pipelines.JsonWriterPipeline': 300,
}

Change YourProjectName to the name of your Scrapy project (the package that contains pipelines.py).

Note that the file gets written directly by the pipeline, so you don't have to specify file and format with the -o and -t command line parameters.

Hope this gets you closer to what you need.
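
If the goal is the single-object layout shown in the question (one "state", one "date", and a "games" list), one possible variation is to collect the items in memory and write a single object when the spider closes. The following is only a minimal sketch under some assumptions: the class name SingleObjectJsonPipeline and the file name games.json are made up for illustration, every item is assumed to carry the same state, and each item's games field is assumed to be a list of dicts. It needs no Scrapy import, because a pipeline is a plain class.

```python
import json
from datetime import datetime, timezone


class SingleObjectJsonPipeline:
    """Collect all items and write one JSON object when the spider closes."""

    # Hypothetical output file name; change it to whatever you want.
    output_path = 'games.json'

    def open_spider(self, spider):
        self.state = None
        self.games = []

    def process_item(self, item, spider):
        data = dict(item)
        # Assumption: all items share one state, so the last one seen wins.
        self.state = data.get('state', self.state)
        # Assumption: 'games' is a list of dicts such as {'name': 'Game1'}.
        self.games.extend(data.get('games', []))
        return item

    def close_spider(self, spider):
        result = {
            'state': self.state,
            # Timestamp of the crawl in ISO 8601 format (UTC).
            'date': datetime.now(timezone.utc).isoformat(),
            'games': self.games,
        }
        with open(self.output_path, 'w') as f:
            json.dump(result, f, indent=2)
```

It would be registered in ITEM_PIPELINES in the same way as the pipeline above. Because everything is held in memory until the spider closes, this approach is best suited to crawls that yield a modest number of items.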



Source: https://stackoverflow.com/questions/47377898/how-to-produce-custom-json-output-from-scrapy
