How to get string objects instead of Unicode from JSON?

伪装坚强ぢ 2020-11-22 14:43

I'm using Python 2 to parse JSON from ASCII encoded text files.

When loading these files with either json or simplejson, all my

21 Answers
  •  半阙折子戏
    2020-11-22 15:25

    Mike Brennan's answer is close, but there is no reason to re-traverse the entire structure if you use the object_pairs_hook parameter (Python 2.7+). From the documentation:

    object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
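    To make the quoted behavior concrete, here is a minimal sketch of both points: the hook builds every JSON object (nested ones included), and object_pairs_hook wins when object_hook is also passed. The sample document and the names pairs_hook, dict_hook and calls are made up for illustration.

```python
import json
from collections import OrderedDict

text = '{"b": 1, "a": 2, "c": {"z": 0, "y": 1}}'

# Each JSON object, including the nested one, is built by the hook,
# so the keys keep their order from the document: b, a, c and z, y.
ordered = json.loads(text, object_pairs_hook=OrderedDict)
print(list(ordered.keys()))
print(list(ordered['c'].keys()))

calls = []  # records which hook the decoder actually invoked

def pairs_hook(pairs):
    calls.append('object_pairs_hook')
    return dict(pairs)

def dict_hook(obj):
    calls.append('object_hook')
    return obj

# With both hooks supplied, only object_pairs_hook is called,
# once per JSON object (inner object first, then the outer one).
json.loads(text, object_hook=dict_hook, object_pairs_hook=pairs_hook)
print(calls)
```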

    With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:

    def deunicodify_hook(pairs):
        new_pairs = []
        for key, value in pairs:
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if isinstance(key, unicode):
                key = key.encode('utf-8')
            new_pairs.append((key, value))
        return dict(new_pairs)
    
    In [52]: open('test.json').read()
    Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        
    
    In [53]: json.load(open('test.json'))
    Out[53]: 
    {u'1': u'hello',
     u'abc': [1, 2, 3],
     u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
     u'def': {u'hi': u'mom'}}
    
    In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
    Out[54]: 
    {'1': 'hello',
     'abc': [1, 2, 3],
     'boo': [1, u'hi', u'moo', {'5': 'some'}],
     'def': {'hi': 'mom'}}
    

    Notice that I never have to call the hook recursively, since every object is handed to the hook when you use object_pairs_hook. You do have to take care of lists yourself, because strings that sit directly inside a list never pass through the hook, but as you can see, an object within a list is converted properly, and you don't have to recurse to make it happen.

    EDIT: A coworker pointed out that Python 2.6 doesn't have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

    for key, value in pairs:
    

    to

    for key, value in pairs.iteritems():
    

    Then use object_hook instead of object_pairs_hook:

    In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
    Out[66]: 
    {'1': 'hello',
     'abc': [1, 2, 3],
     'boo': [1, u'hi', u'moo', {'5': 'some'}],
     'def': {'hi': 'mom'}}
    

    Using object_pairs_hook results in one fewer dictionary being instantiated for each object in the JSON document, which might be worthwhile if you are parsing a huge document.
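    The difference between the two parameters comes down to what the hook receives: object_hook gets a dict the decoder has already built, while object_pairs_hook gets the raw key/value pairs. As a rough sketch, a single hook can accept either form. The names to_bytes and deunicodify_any are hypothetical, and the try/except is again only there so the sketch also runs on Python 3.

```python
import json

try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3 (assumption: encode() then yields bytes)

def to_bytes(value):
    # Leave non-text values (numbers, lists, converted dicts) untouched.
    return value.encode('utf-8') if isinstance(value, text_type) else value

def deunicodify_any(obj):
    # object_hook passes a dict that the decoder has already built;
    # object_pairs_hook passes the (key, value) pairs before any dict exists.
    pairs = obj.items() if isinstance(obj, dict) else obj
    return dict((to_bytes(key), to_bytes(value)) for key, value in pairs)

doc = '{"1": "hello", "def": {"hi": "mom"}}'
via_pairs = json.loads(doc, object_pairs_hook=deunicodify_any)
via_dict = json.loads(doc, object_hook=deunicodify_any)
print(via_pairs == via_dict)
```

    Both calls produce the same result; the object_hook path just builds and discards an intermediate dict for every object.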
