How can I serialize a numpy array while preserving matrix dimensions?

青春惊慌失措 2020-12-04 13:14

numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.
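To make the problem concrete, here is a minimal sketch. It uses `tobytes` and `np.frombuffer` rather than `tostring`, which is a substitution on my part because recent NumPy releases deprecate the older name. The raw bytes carry neither shape nor dtype, so both have to be supplied again when reading the data back.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

raw = a.tobytes()                          # plain bytes: no shape, no dtype
flat = np.frombuffer(raw, dtype=np.int32)  # dtype must be repeated by hand
print(flat.shape)                          # the 2x3 layout is gone

restored = flat.reshape(2, 3)              # shape must be repeated by hand too
print(np.array_equal(a, restored))
```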

7 answers
  •  感情败类
    2020-12-04 13:51

    I found the code in Msgpack-numpy helpful. https://github.com/lebedov/msgpack-numpy/blob/master/msgpack_numpy.py

    I modified the serialised dict slightly and added base64 encoding to reduce the serialised size.
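A rough way to see the size effect, assuming the comparison is against writing the values out as a JSON list of numbers (that baseline is my assumption, the answer does not name one). The random test data and the variable names are illustrative only.

```python
import base64
import json

import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1000)  # 1000 float64 values

# Baseline (assumed): every value written out as decimal text
as_list = json.dumps(a.tolist())

# Same layout as the dict described in this answer: base64 payload plus metadata
as_b64 = json.dumps({
    '__ndarray__': base64.b64encode(a.tobytes()).decode('ascii'),
    'dtype': a.dtype.str,
    'shape': a.shape,
})

print(len(as_list), len(as_b64))
```

For full-precision floats the base64 form comes out smaller, because each 8-byte value costs about 11 characters instead of a long decimal string. For arrays of short integers or zeros the difference can shrink or reverse.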

    By using the same interface as json (providing load(s),dump(s)), you can provide a drop-in replacement for json serialisation.

    This same logic can be extended to add any automatic non-trivial serialisation, such as datetime objects.
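As a sketch of how that extension to datetime objects might look: the `__datetime__` marker key and the ISO 8601 string format are my own choices for illustration, not something taken from msgpack-numpy or from the code in this answer. `datetime.fromisoformat` needs Python 3.7 or later.

```python
import json
from datetime import datetime


def to_json(obj):
    # json calls this only for objects it cannot serialise itself
    if isinstance(obj, datetime):
        # '__datetime__' is a hypothetical marker key
        return {'__datetime__': obj.isoformat()}
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))


def from_json(obj):
    # json calls this for every decoded JSON object (a dict)
    if '__datetime__' in obj:
        return datetime.fromisoformat(obj['__datetime__'])
    return obj


stamp = datetime(2020, 12, 4, 13, 51)
text = json.dumps({'when': stamp}, default=to_json)
restored = json.loads(text, object_hook=from_json)
print(restored['when'] == stamp)
```

The same two branches could be added to a combined pair of hooks next to the numpy cases, so one `default` and one `object_hook` cover every custom type.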


    EDIT: I've written a generic, modular parser that does this and more. https://github.com/someones/jaweson


    My code is as follows:

    np_json.py

    from json import *
    import json
    import numpy as np
    import base64
    
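    def to_json(obj):
        # json calls this only for objects it cannot serialise natively
        if isinstance(obj, (np.ndarray, np.generic)):
            if isinstance(obj, np.ndarray):
                return {
                    # tobytes() replaces the deprecated tostring(); decode the
                    # base64 bytes to str so json can serialise them on Python 3
                    '__ndarray__': base64.b64encode(obj.tobytes()).decode('ascii'),
                    'dtype': obj.dtype.str,
                    'shape': obj.shape,
                }
            elif isinstance(obj, (np.bool_, np.number)):
                return {
                    '__npgeneric__': base64.b64encode(obj.tobytes()).decode('ascii'),
                    'dtype': obj.dtype.str,
                }
        if isinstance(obj, set):
            return {'__set__': list(obj)}
        if isinstance(obj, tuple):
            # Note: json encodes tuples as lists itself, so this branch is
            # not reached through the default hook
            return {'__tuple__': list(obj)}
        if isinstance(obj, complex):
            return {'__complex__': obj.__repr__()}
    
        # Raise the TypeError that json expects for unsupported types
        raise TypeError('Unable to serialise object of type {}'.format(type(obj)))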
    
    
    def from_json(obj):
        # check for numpy
        if isinstance(obj, dict):
            if '__ndarray__' in obj:
                # frombuffer() replaces the deprecated binary mode of
                # fromstring(); copy() makes the result writable
                return np.frombuffer(
                    base64.b64decode(obj['__ndarray__']),
                    dtype=np.dtype(obj['dtype'])
                ).reshape(obj['shape']).copy()
            if '__npgeneric__' in obj:
                return np.frombuffer(
                    base64.b64decode(obj['__npgeneric__']),
                    dtype=np.dtype(obj['dtype'])
                )[0]
            if '__set__' in obj:
                return set(obj['__set__'])
            if '__tuple__' in obj:
                return tuple(obj['__tuple__'])
            if '__complex__' in obj:
                return complex(obj['__complex__'])
    
        return obj
    
    # over-write the load(s)/dump(s) functions
    def load(*args, **kwargs):
        kwargs['object_hook'] = from_json
        return json.load(*args, **kwargs)
    
    
    def loads(*args, **kwargs):
        kwargs['object_hook'] = from_json
        return json.loads(*args, **kwargs)
    
    
    def dump(*args, **kwargs):
        kwargs['default'] = to_json
        return json.dump(*args, **kwargs)
    
    
    def dumps(*args, **kwargs):
        kwargs['default'] = to_json
        return json.dumps(*args, **kwargs)
    

    You should then be able to do the following:

    import numpy as np
    import np_json as json
    np_data = np.zeros((10,10), dtype=np.float32)
    new_data = json.loads(json.dumps(np_data))
    assert (np_data == new_data).all()
    
