Question
The use case: a Python class stores large numpy arrays (large, but small enough that working with them in memory is a breeze) in a useful structure. Here's a cartoon of the situation:
main class: Environment; stores useful information pertinent to all balls
"child" class: Ball; stores information pertinent to this particular ball
Environment member variable: balls_in_environment (list of Balls)
Ball member variable: large_numpy_array (NxN numpy array that is large, but still easy to work with in-memory)
Ideally, I would like to persist Environment as a whole.
Some options:
pickle: too slow, and it produces output that takes up a LOT of space on the hard drive
database: too much work; I could store the important information in the class in a database (requires me to write functions to take info from the class, and put it into the DB) and later rebuild the class by creating a new instance, and refilling it with data from the DB (requires me to write functions to do the rebuilding)
JSON: I am not very familiar with JSON, but Python has a standard library to deal with it, and it is the recommended solution of this article -- I don't see how JSON would be more compact than pickle though; more importantly, doesn't deal nicely with numpy
MessagePack: another recommended package by the same article mentioned above; however, I have never heard of it, and don't want to strike out into the unknown with what seems to be a standard problem
numpy.save + something else: store the numpy arrays associated with each Ball, using numpy.save functionality, and store the non-numpy stuff separately somehow (tedious)?
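For reference, the last option could look roughly like the sketch below. It is only an illustration under assumptions: the `name` and `gravity` fields, the helper names `save_environment` and `load_environment`, and the file names `arrays.npz` and `meta.json` are all hypothetical, and `np.savez` is used in place of one `np.save` call per Ball.

```python
import json
import os
import tempfile

import numpy as np


class Ball(object):
    """Stand-in for the Ball class described above (fields are hypothetical)."""

    def __init__(self, name, large_numpy_array):
        self.name = name
        self.large_numpy_array = large_numpy_array


class Environment(object):
    """Stand-in for the Environment class described above (fields are hypothetical)."""

    def __init__(self, gravity, balls_in_environment):
        self.gravity = gravity
        self.balls_in_environment = balls_in_environment


def save_environment(env, directory):
    # All arrays go into one .npz archive, keyed by the ball's list position.
    arrays = {'ball_%d' % i: b.large_numpy_array
              for i, b in enumerate(env.balls_in_environment)}
    np.savez(os.path.join(directory, 'arrays.npz'), **arrays)
    # The remaining attributes are small, so plain JSON is enough here.
    meta = {'gravity': env.gravity,
            'ball_names': [b.name for b in env.balls_in_environment]}
    with open(os.path.join(directory, 'meta.json'), 'w') as f:
        json.dump(meta, f)


def load_environment(directory):
    with open(os.path.join(directory, 'meta.json')) as f:
        meta = json.load(f)
    # Arrays are read while the archive is still open.
    with np.load(os.path.join(directory, 'arrays.npz')) as arrays:
        balls = [Ball(name, arrays['ball_%d' % i])
                 for i, name in enumerate(meta['ball_names'])]
    return Environment(meta['gravity'], balls)


directory = tempfile.mkdtemp()
env = Environment(9.81, [Ball('a', np.random.randn(100, 100)),
                         Ball('b', np.random.randn(100, 100))])
save_environment(env, directory)
restored = load_environment(directory)
```

The tedium mentioned above shows up as the two hand-written functions: every new attribute on Environment or Ball has to be added to both of them.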
What is the best option for my use case?
Answer 1:
As I mentioned in the comments, joblib.dump might be a good option. It uses np.save to efficiently store numpy arrays, and cPickle for everything else:
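import numpy as np
import cPickle  # Python 2 only; on Python 3 use the built-in pickle module
import joblib
import os

class SerializationTest(object):
    def __init__(self):
        self.array = np.random.randn(1000, 1000)

st = SerializationTest()
fnames = ['cpickle.pkl', 'numpy_save.npy', 'joblib.pkl']

# using cPickle (open in binary mode so the pickle is written unmodified)
with open(fnames[0], 'wb') as f:
    cPickle.dump(st, f)

# using np.save (the instance is wrapped in an object array and pickled)
np.save(fnames[1], st)

# using joblib.dump (without compression)
joblib.dump(st, fnames[2])

# check file sizes
for fname in fnames:
    print('%15s: %8.2f KB' % (fname, os.stat(fname).st_size / 1E3))
#     cpickle.pkl: 23695.56 KB
#  numpy_save.npy:  8000.33 KB
#      joblib.pkl:     0.18 KB
# (the 0.18 KB figure suggests that this joblib version wrote the array
#  itself to a companion file next to joblib.pkl)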
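The comparison above only writes files. A minimal sketch of the full round trip with `joblib.dump` and `joblib.load` might look like this; the temporary path and the choice of `compress=3` are arbitrary assumptions, not recommendations from the answer.

```python
import os
import tempfile

import joblib
import numpy as np


class SerializationTest(object):
    def __init__(self):
        self.array = np.random.randn(1000, 1000)


st = SerializationTest()
path = os.path.join(tempfile.mkdtemp(), 'joblib_compressed.pkl')

# compress=3 is an arbitrary middle setting; compress=0 disables compression.
joblib.dump(st, path, compress=3)

# joblib.load returns the reconstructed object, including its array.
restored = joblib.load(path)
arrays_match = np.array_equal(st.array, restored.array)
```

The class definition has to be importable when loading, just as with plain pickle, because only the instance state is stored in the file.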
One potential downside is that because joblib.dump uses cPickle to serialize Python objects, the resulting files are not portable from Python 2 to 3. For better portability you could look into using HDF5, e.g. here.
Source: https://stackoverflow.com/questions/33742406/how-to-persist-python-class-with-member-variables-that-are-also-python-classes-h