Importing wrongly concatenated JSONs in python

后端未结

关注

 3  1052

予麋鹿 2021-01-07 07:33

I\'ve a text document that has several thousand jsons strings in the form of: \"{...}{...}{...}\". This is not a valid json it self but each {...}

3条回答

暖寄归人 (楼主)

2021-01-07 07:51
Use the raw_decode method of json.JSONDecoder
```
>>> import json
>>> d = json.JSONDecoder()
>>> x='{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
>>> d.raw_decode(x)
({'tags': '{Artist}{Singer}', 'name': 'Bob Dylan'}, 47)
>>> x=x[47:]
>>> d.raw_decode(x)
({'name': 'Michael Jackson'}, 27)
```
raw_decode returns a 2-tuple, the first element being the decoded JSON and the second being the offset in the string of the next byte after the JSON ended.

To loop until the end or until an invalid JSON element is encountered:
```
>>> while True:
...   try:
...     j,n = d.raw_decode(x)
...   except ValueError:
...     break
...   print(j)
...   x=x[n:]
... 
{'name': 'Bob Dylan', 'tags': '{Artist}{Singer}'}
{'name': 'Michael Jackson'}
```
When the loop breaks, inspection of x will reveal if it has processed the whole string or had encountered a JSON syntax error.

With a very long file of short elements you might read a chunk into a buffer and apply the above loop, concatenating anything that's left over with the next chunk after the loop breaks.
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...