Parsing nested JSON and writing it to CSV

温柔的废话 2021-01-24 22:01

I'm struggling with this problem. I have a JSON file and need to write it out to CSV. That's fine if the structure is fairly flat, with no deeply nested items.

But in this ca

1 answer
  • 2021-01-24 22:32

    I'd collect keys only from the first object, then assume that the remaining objects follow the same format.

    The following code also supports only one nested object per item; you did not specify what should happen when there is more than one. Two or more nested structures of equal length could work (you'd 'zip' those together), but if the structures differ in length you need to make an explicit choice about how to handle them: either zip them and pad with empty columns, or write out the product of those entries (A x B rows, repeating the information from A each time you find a B entry).

    import csv
    from operator import itemgetter

    # `data` is the parsed JSON object (for example the result of json.load)
    # and `outputfile` is the target path; both are defined elsewhere.
    # Python 3: open in text mode with newline=''. On Python 2, use 'wb'.
    with open(outputfile, 'w', newline='') as outf:
        writer = None  # will be set to a csv.DictWriter later

        for key, item in sorted(data.items(), key=itemgetter(0)):
            # split the item into flat values and its single nested dict
            row = {}
            nested_name, nested_items = '', {}
            for k, v in item.items():
                if not isinstance(v, dict):
                    row[k] = v
                else:
                    assert not nested_items, 'Only one nested structure is supported'
                    nested_name, nested_items = k, v

            if writer is None:
                # first object only: the flat keys come first, in sorted order
                fields = sorted(row)

                # then the sorted keys of the first nested item (by key order),
                # prefixed with the name of the nested structure
                nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
                fields.extend('__'.join((nested_name, k)) for k in nested_keys)

                writer = csv.DictWriter(outf, fields)
                writer.writeheader()

            # one CSV row per nested item, repeating the flat values
            for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
                row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
                writer.writerow(row)


    For your sample input, this produces:

    COUNTRY,ITW,VENUE,RACES__NO,RACES__TIME
    HAE,XAD,JOEBURG,1,12:35
    HAE,XAD,JOEBURG,2,13:10
    HAE,XAD,JOEBURG,3,13:40
    HAE,XAD,JOEBURG,4,14:10
    HAE,XAD,JOEBURG,5,14:55
    HAE,XAD,JOEBURG,6,15:30
    HAE,XAD,JOEBURG,7,16:05
    HAE,XAD,JOEBURG,8,16:40
    ABA,XAD,FOOBURG,1,12:35
    ABA,XAD,FOOBURG,2,13:10
    ABA,XAD,FOOBURG,3,13:40
    ABA,XAD,FOOBURG,4,14:10
    ABA,XAD,FOOBURG,5,14:55
    ABA,XAD,FOOBURG,6,15:30
    ABA,XAD,FOOBURG,7,16:05
    ABA,XAD,FOOBURG,8,16:40
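
The two options mentioned earlier for nested structures of differing length (zip with padding, or the full product) could look roughly like the sketch below. It is only an illustration under assumptions: the `HORSES` structure, the sample values, and the helper names `prefixed`, `zip_rows` and `product_rows` are hypothetical and do not come from the question.

```python
from itertools import product, zip_longest

# Hypothetical item with two nested structures of differing length.
flat = {"VENUE": "JOEBURG"}
races = [{"NO": 1, "TIME": "12:35"}, {"NO": 2, "TIME": "13:10"}]
horses = [{"NAME": "A"}, {"NAME": "B"}, {"NAME": "C"}]


def prefixed(name, item):
    """Prefix each key with the nested structure's name, e.g. RACES__NO."""
    return {"__".join((name, k)): v for k, v in item.items()}


def zip_rows(flat, a_name, a_items, b_name, b_items):
    """Pair entries by position; the shorter side contributes no columns."""
    rows = []
    for a, b in zip_longest(a_items, b_items, fillvalue={}):
        rows.append({**flat, **prefixed(a_name, a), **prefixed(b_name, b)})
    return rows


def product_rows(flat, a_name, a_items, b_name, b_items):
    """Emit A x B rows, repeating each A entry for every B entry."""
    rows = []
    for a, b in product(a_items, b_items):
        rows.append({**flat, **prefixed(a_name, a), **prefixed(b_name, b)})
    return rows


zipped = zip_rows(flat, "RACES", races, "HORSES", horses)
crossed = product_rows(flat, "RACES", races, "HORSES", horses)
print(len(zipped), len(crossed))
```

Rows produced by `zip_rows` may lack some columns; `csv.DictWriter` fills missing keys with its `restval` argument, which defaults to an empty string, so those cells come out as empty padding.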
    