Why does pickle.dump(obj) give a different size than sys.getsizeof(obj)? How to save a variable to a file?

Submitted by 魔方 西西 on 2019-12-12 03:06:30

Question


I use the random forest classifier from Python's scikit-learn library for an exercise. The result changes on each run, so I run it 1000 times and take the average result.

I save the rf object to a file with pickle.dump() so I can predict later, and each file is about 4 MB. However, sys.getsizeof(rf) reports only 36 bytes:

import pickle
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=50)
rf.fit(matX, vecY)
# pickle.dump() expects an open file object, not a filename string
with open('var.sav', 'wb') as f:
    pickle.dump(rf, f)

My questions:

  • sys.getsizeof() seems to be wrong about the size of the RandomForestClassifier object, doesn't it? Why?
  • How can I save the object in a zip file so that it takes less space?

Answer 1:


getsizeof() gives you the memory footprint of just the object, and not of any other values referenced by that object. You'd need to recurse over the object to find the total size of all attributes too, and anything those attributes hold, etc.
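A rough sketch of what such a recursion could look like (total_size is a hypothetical helper, not a standard-library function; it only handles a few common container types and objects with a __dict__):

```python
import sys

def total_size(obj, seen=None):
    """Recursively sum getsizeof() over an object and what it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # avoid double-counting shared/cyclic refs
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    if hasattr(obj, '__dict__'):  # instance attributes
        size += total_size(vars(obj), seen)
    return size

data = {'a': [1, 2, 3], 'b': 'hello'}
# the recursive total is larger than the bare dict footprint
print(total_size(data), sys.getsizeof(data))
```

Even this undercounts objects that store data in C-level buffers invisible to getsizeof(), which is exactly the case for scikit-learn estimators backed by NumPy arrays.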

Pickling is a serialization format. Serialization needs to store metadata as well as the contents of the object. Memory size and pickle size only have a rough correlation.

Pickles are byte streams; if you need a more compact byte stream, use compression.
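For example, gzip can wrap the pickle stream directly, since pickle.dump() accepts any writable file-like object (the filename 'var.sav.gz' and the sample object here are just illustrative):

```python
import gzip
import pickle

obj = {'weights': list(range(10000))}
raw = pickle.dumps(obj)  # uncompressed size, for comparison

# write a gzip-compressed pickle straight to disk
with gzip.open('var.sav.gz', 'wb') as f:
    pickle.dump(obj, f)

# load it back the same way
with gzip.open('var.sav.gz', 'rb') as f:
    restored = pickle.load(f)

assert restored == obj
```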

If you store your pickles in a ZIP file, your data is already compressed; compressing the pickle before adding it to the ZIP will not help in that case, because already-compressed data can actually grow under a second round of ZIP compression, due to metadata overhead and the lack of redundancy left in typical compressed data.
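In other words, when the ZIP container is the target, store the raw pickle bytes and let the archive do the compressing (the archive name 'models.zip' and entry name 'var.sav' are illustrative):

```python
import pickle
import zipfile

obj = list(range(5000))
payload = pickle.dumps(obj)  # raw, uncompressed pickle bytes

# ZIP_DEFLATED compresses each entry as it is written -- no need to gzip first
with zipfile.ZipFile('models.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('var.sav', payload)

# read the entry back and unpickle it
with zipfile.ZipFile('models.zip') as zf:
    restored = pickle.loads(zf.read('var.sav'))

assert restored == obj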



Source: https://stackoverflow.com/questions/19516403/why-pickle-dumpobj-has-different-size-with-sys-getsizeofobj-how-to-save-var
