Importing wrongly concatenated JSONs in python

后端 未结 3 1052
予麋鹿
予麋鹿 2021-01-07 07:33

I\'ve a text document that has several thousand jsons strings in the form of: \"{...}{...}{...}\". This is not a valid json it self but each {...}

3条回答
  •  暖寄归人
    2021-01-07 07:51

    Use the raw_decode method of json.JSONDecoder

    >>> import json
    >>> d = json.JSONDecoder()
    >>> x='{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
    >>> d.raw_decode(x)
    ({'tags': '{Artist}{Singer}', 'name': 'Bob Dylan'}, 47)
    >>> x=x[47:]
    >>> d.raw_decode(x)
    ({'name': 'Michael Jackson'}, 27)
    

    raw_decode returns a 2-tuple, the first element being the decoded JSON and the second being the offset in the string of the next byte after the JSON ended.

    To loop until the end or until an invalid JSON element is encountered:

    >>> while True:
    ...   try:
    ...     j,n = d.raw_decode(x)
    ...   except ValueError:
    ...     break
    ...   print(j)
    ...   x=x[n:]
    ... 
    {'name': 'Bob Dylan', 'tags': '{Artist}{Singer}'}
    {'name': 'Michael Jackson'}
    

    When the loop breaks, inspection of x will reveal if it has processed the whole string or had encountered a JSON syntax error.

    With a very long file of short elements you might read a chunk into a buffer and apply the above loop, concatenating anything that's left over with the next chunk after the loop breaks.

提交回复
热议问题