Question
I have a large dictionary whose structure looks like:
dcPaths = {'id_jola_001': CPath instance}
where CPath is a user-defined class:
class CPath(object):
    def __init__(self):
        # some attributes
        self.m_dAvgSpeed = 0.0
        ...
        # a list of CNode instances
        self.m_lsNodes = []
where m_lsNodes is a list of CNode instances:
class CNode(object):
    def __init__(self):
        # some attributes
        self.m_nLoc = 0
        # a list of Apps
        self.m_lsApps = []
Here, m_lsApps is a list of CApp instances, which is another user-defined class:
class CApp(object):
    def __init__(self):
        # some attributes
        self.m_nCount = 0
        self.m_nUpPackets = 0
I serialize this dictionary using cPickle:
import cPickle

def serialize2File(strFileName, strOutDir, obj):
    if len(obj) != 0:
        strOutFilePath = "%s%s" % (strOutDir, strFileName)
        with open(strOutFilePath, 'w') as hOutFile:
            cPickle.dump(obj, hOutFile, protocol=0)
        return strOutFilePath
    else:
        print("Nothing to serialize!")
It works fine, and the serialized file is about 6.8 GB. However, when I try to deserialize this object:
def deserializeFromFile(strFilePath):
    obj = 0
    with open(strFilePath) as hFile:
        obj = cPickle.load(hFile)
    return obj
I find that it consumes more than 90 GB of memory and takes a long time.
- Why does this happen?
- Is there any way to optimize this?
BTW, I'm using Python 2.7.6.
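To see the effect at a small scale, here is a rough sketch in Python 3, where `pickle` stands in for cPickle (Python 3's `pickle` uses its C accelerator automatically). It rebuilds the question's class layout with made-up sizes and key names (`build_paths` and `measure_load` are hypothetical helpers, and CPath's elided attributes are left out), then uses `tracemalloc` to report the peak memory allocated while loading, next to the size of the pickle itself.

```python
import pickle
import tracemalloc


# Python 3 stand-ins for the question's classes (elided attributes omitted).
class CApp(object):
    def __init__(self):
        self.m_nCount = 0
        self.m_nUpPackets = 0


class CNode(object):
    def __init__(self):
        self.m_nLoc = 0
        self.m_lsApps = []


class CPath(object):
    def __init__(self):
        self.m_dAvgSpeed = 0.0
        self.m_lsNodes = []


def build_paths(n_paths, n_nodes, n_apps):
    """Build a synthetic dcPaths-like dict; key format and sizes are made up."""
    dcPaths = {}
    for i in range(n_paths):
        path = CPath()
        for j in range(n_nodes):
            node = CNode()
            node.m_nLoc = j
            node.m_lsApps = [CApp() for _ in range(n_apps)]
            path.m_lsNodes.append(node)
        dcPaths['id_%06d' % i] = path
    return dcPaths


def measure_load(obj, protocol):
    """Return (pickle size in bytes, peak traced bytes during load, loaded object)."""
    data = pickle.dumps(obj, protocol=protocol)
    tracemalloc.start()
    try:
        restored = pickle.loads(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(data), peak, restored


if __name__ == '__main__':
    paths = build_paths(200, 10, 5)
    for proto in (0, pickle.HIGHEST_PROTOCOL):
        size, peak, _ = measure_load(paths, proto)
        print('protocol %d: pickle %d bytes, peak load memory %d bytes'
              % (proto, size, peak))
```

The exact numbers depend on the Python version, so none are quoted here; the point is that the loaded object graph is larger than its pickle, because every instance, list, and attribute dictionary becomes a full Python object.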
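Answer 1:
You can try specifying the pickle protocol; the fastest is -1, which means the highest protocol available. That is no problem as long as you pickle and unpickle with the same Python version. Protocols above 0 are binary, so open the file in binary mode ('wb' for dumping, 'rb' for loading).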
cPickle.dump(obj, file, protocol=-1)
EDIT:
As said in the comments, load detects the protocol by itself, so you only pass the file:
cPickle.load(file)
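A minimal Python 3 sketch of this answer, with `pickle` standing in for cPickle (in Python 2.7 the equivalent calls are `cPickle.dump(obj, f, protocol=-1)` and `cPickle.load(f)`). `serialize_to_file` and `deserialize_from_file` are hypothetical rewrites of the question's helpers: files are opened in binary mode, the protocol is chosen only when dumping, and the sample data is made up.

```python
import os
import pickle
import tempfile


def serialize_to_file(path, obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to path and return the file size in bytes."""
    # Binary protocols (anything above 0) require a file opened in 'wb' mode.
    with open(path, 'wb') as fh:
        pickle.dump(obj, fh, protocol=protocol)
    return os.path.getsize(path)


def deserialize_from_file(path):
    """Unpickle path; load() reads the protocol from the stream itself."""
    with open(path, 'rb') as fh:
        return pickle.load(fh)


if __name__ == '__main__':
    # Hypothetical sample data: a dict of lists of small int tuples.
    data = {'id_%04d' % i: [(j, j * 2) for j in range(50)] for i in range(1000)}
    with tempfile.TemporaryDirectory() as tmp_dir:
        for proto in (0, -1):
            path = os.path.join(tmp_dir, 'paths_p%d.pkl' % proto)
            size = serialize_to_file(path, data, protocol=proto)
            assert deserialize_from_file(path) == data
            print('protocol %2d: %d bytes' % (proto, size))
```

With data like this the protocol -1 file should come out noticeably smaller than the protocol 0 (text) file, and binary protocols are generally faster to parse as well.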
Answer 2:
When you pickle instances of your own classes, pickle stores each instance's state as its __dict__, a full dictionary of attribute names and values, which adds a lot of overhead.
To reduce the memory consumption of the deserialized data, you should pickle only native Python types. You can do this easily by implementing two methods on your classes: object.__getstate__() and object.__setstate__(state).
See "Pickling and unpickling normal class instances" in the Python documentation.
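A sketch of that suggestion, again in Python 3 (`__getstate__`/`__setstate__` work the same way for new-style classes under cPickle in 2.7). Each class hands pickle a plain tuple instead of its `__dict__`, so attribute names are no longer written out per object. Only the attributes shown in the question are covered; CPath's elided attributes (the `...`) would have to be added to both of its methods.

```python
import pickle


class CApp(object):
    def __init__(self):
        self.m_nCount = 0
        self.m_nUpPackets = 0

    def __getstate__(self):
        # Store a plain tuple instead of the instance __dict__.
        return (self.m_nCount, self.m_nUpPackets)

    def __setstate__(self, state):
        # Called on load with whatever __getstate__ returned.
        self.m_nCount, self.m_nUpPackets = state


class CNode(object):
    def __init__(self):
        self.m_nLoc = 0
        self.m_lsApps = []

    def __getstate__(self):
        return (self.m_nLoc, self.m_lsApps)

    def __setstate__(self, state):
        self.m_nLoc, self.m_lsApps = state


class CPath(object):
    def __init__(self):
        # Hypothetical: only the attributes shown in the question.
        self.m_dAvgSpeed = 0.0
        self.m_lsNodes = []

    def __getstate__(self):
        return (self.m_dAvgSpeed, self.m_lsNodes)

    def __setstate__(self, state):
        self.m_dAvgSpeed, self.m_lsNodes = state


if __name__ == '__main__':
    app = CApp()
    app.m_nCount, app.m_nUpPackets = 3, 9
    blob = pickle.dumps(app, protocol=pickle.HIGHEST_PROTOCOL)
    print(b'm_nCount' in blob)  # False: only the tuple (3, 9) is stored
```

The loaded objects are still ordinary instances with a `__dict__`; what changes is the stream and the intermediate state the unpickler builds, a small tuple per object instead of a dictionary. Since the unpickler's memo keeps memoized objects (including these state containers) alive until `load` returns, smaller state is likely to lower peak memory too, but that is worth measuring on the real data.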
Source: https://stackoverflow.com/questions/23261598/cpickle-load-in-python-consumes-a-large-memory