Retrieving JSON objects from a text file (using Python)

失恋的感觉 2020-11-30 04:50

I have thousands of text files containing multiple JSON objects, but unfortunately there is no delimiter between the objects. Objects are stored as dictionaries and some of

9 Answers
  •  悲&欢浪女
    2020-11-30 05:47

    This decodes your "list" of JSON Objects from a string:

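    from json import JSONDecoder
    
    def loads_invalid_obj_list(s):
        """Decode back-to-back JSON objects in s into a list."""
        decoder = JSONDecoder()
        s_len = len(s)
    
        objs = []
        end = 0
        while end != s_len:
            # raw_decode returns the decoded object and the index just past it
            obj, end = decoder.raw_decode(s, idx=end)
            objs.append(obj)
    
        return objs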
    

    The bonus here is that you work with the parser rather than around it, so it still tells you exactly where it found an error.
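
    On Python 3 the error position can also be read programmatically. The following is a minimal sketch, not part of the original answer: it assumes Python 3.5 or later, where a failed decode raises `json.JSONDecodeError`, and `describe_first_error` is a hypothetical helper name chosen for illustration.

```python
import json
from json import JSONDecoder


def describe_first_error(s):
    """Return None if s is a clean run of JSON values,
    otherwise (lineno, colno, pos) of the first decode error."""
    decoder = JSONDecoder()
    end = 0
    while end != len(s):
        try:
            # Same loop as above: decode one value, continue right after it.
            _, end = decoder.raw_decode(s, idx=end)
        except json.JSONDecodeError as exc:
            # exc.pos is a 0-based offset into s; lineno and colno are 1-based.
            return exc.lineno, exc.colno, exc.pos
    return None


print(describe_first_error('{}{}'))     # prints None: both objects decode
print(describe_first_error('{}{\n}{'))  # a tuple locating the unfinished object
```

    The exact message and offset depend on the Python version, so the sketch returns the position fields rather than relying on the message text.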

    Examples

    >>> loads_invalid_obj_list('{}{}')
    [{}, {}]
    
    >>> loads_invalid_obj_list('{}{\n}{')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "decode.py", line 9, in loads_invalid_obj_list
        obj, end = decoder.raw_decode(s, idx=end)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 2 column 2 (char 5)
    

    Clean Solution (added later)

    import json
    import re
    
    # Copied from json/decoder.py
    FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
    WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS)
    
    class ConcatJSONDecoder(json.JSONDecoder):
        def decode(self, s, _w=WHITESPACE.match):
            s_len = len(s)
    
            objs = []
            end = 0
            while end != s_len:
                # Skip whitespace before the object, decode it, then skip trailing whitespace
                obj, end = self.raw_decode(s, idx=_w(s, end).end())
                end = _w(s, end).end()
                objs.append(obj)
            return objs
    

    Examples

    >>> print json.loads('{}', cls=ConcatJSONDecoder)
    [{}]
    
    >>> print json.load(open('file'), cls=ConcatJSONDecoder)
    [{}]
    
    >>> print json.loads('{}{} {', cls=ConcatJSONDecoder)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
        return cls(encoding=encoding, **kw).decode(s)
      File "decode.py", line 15, in decode
        obj, end = self.raw_decode(s, idx=_w(s, end).end())
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 376, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Expecting object: line 1 column 5 (char 5)
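
    The sessions above were run on Python 2.7. As a rough Python 3 sketch of the same `raw_decode` idea (my own illustration, not the original author's code), a generator can yield the objects one at a time and skip whitespace between them. The names `iter_concat_json` and `load_concat_json_file` are hypothetical, and UTF-8 input is an assumption.

```python
import json

# The four whitespace characters that JSON allows between values.
JSON_WHITESPACE = " \t\n\r"


def iter_concat_json(text):
    """Yield each JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    idx = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while idx < n and text[idx] in JSON_WHITESPACE:
            idx += 1
        if idx == n:
            return
        # Decode one value and move idx to just past it.
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


def load_concat_json_file(path, encoding="utf-8"):
    """Read a whole file (assumed to fit in memory) and decode every object."""
    with open(path, encoding=encoding) as fh:
        return list(iter_concat_json(fh.read()))


print(list(iter_concat_json('{"a": 1}{"b": [2, 3]}\n {}  ')))
# prints [{'a': 1}, {'b': [2, 3]}, {}]
```

    Malformed input still raises from `raw_decode` with the position of the problem, so the error reporting described above is kept.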
    
