My goal is to convert a JSON file into a format that can be uploaded from Cloud Storage into BigQuery (as described here) with Python.
I have tried using the newlineJSON package. It takes a JSON file and converts it into an ND-JSON file.
import json

with open("results-20190312-113458.json", "r") as read_file:
    data = json.load(read_file)  # the source file holds a JSON array of records

# serialize each record back to a single-line JSON string
result = [json.dumps(record) for record in data]

with open('nd-processed.json', 'w') as obj:
    for i in result:
        obj.write(i + '\n')  # one record per line = newline-delimited JSON
Hope this helps someone.
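Once the newline-delimited file is uploaded to Cloud Storage, the load step itself could look roughly like the sketch below. It uses the google-cloud-bigquery client; the bucket, file, and table names are placeholders, not values taken from the question.

from google.cloud import bigquery

client = bigquery.Client()

# placeholder identifiers -- substitute your own project, dataset, table and bucket
table_id = "my-project.my_dataset.my_table"
uri = "gs://my-bucket/nd-processed.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema, or pass an explicit one
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # block until the load job finishes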
The answer with jq is really useful, but if you still want to do it with Python (as it seems from the question), you can do it with the built-in json module.
import json
from io import StringIO
in_json = StringIO("""[{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
}
]""")
result = [json.dumps(record) for record in json.load(in_json)] # the only significant line to convert the JSON to the desired format
print('\n'.join(result))
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
* I'm using StringIO and print here just to make the sample easier to test locally.
As an alternative, you can use the Python jq binding to combine this approach with the other answer.
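A rough sketch of that combination is below. It assumes the jq package from PyPI (pip install jq) and its compile()/input()/all() methods, so treat the exact calls as an assumption rather than a recipe.

import json
import jq  # Python binding for jq (pip install jq); API assumed here

with open("results-20190312-113458.json") as src:
    data = json.load(src)  # the source file holds a regular JSON array

# '.[]' unpacks the top-level array into individual records, as in the jq answer
records = jq.compile(".[]").input(data).all()

with open("nd-processed.json", "w") as dst:
    for record in records:
        dst.write(json.dumps(record) + "\n")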
If you are willing to step outside Python, use jq:
$ cat a.json
[{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
}
]
$ cat a.json | jq -c '.[]'
{"key01":"value01","key02":"value02","keyN":"valueN"}
{"key01":"value01","key02":"value02","keyN":"valueN"}
{"key01":"value01","key02":"value02","keyN":"valueN"}
The filter I used is '.[]', which iterates over the array, and -c puts each JSON object on a single line.