Trained Machine Learning model is too big

Posted by 自古美人都是妖i on 2020-07-04 13:28:08

Question


We have trained an Extra Trees model for a regression task. Our model consists of 3 extra-trees ensembles, each with 200 trees of depth 30. On top of the 3 ensembles, we use a ridge regression. We train the model for several hours and then pickle the trained model (the entire class object) for later use. However, the saved model is far too big, about 140 GB! Is there a way to reduce the size of the saved model? Are there any settings in pickle that could help, or any alternative to pickle?


Answer 1:


If the trees are fully grown binary trees of depth 30, you end up with up to 3 * 200 * (2^30 - 1) ≈ 6.4 × 10^11 nodes, or roughly 600 GiB even if each node took only 1 byte to store. By comparison, 140 GB is a fairly reasonable size.
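For a quick sanity check of that estimate, here is a back-of-the-envelope sketch (a hypothetical calculation, assuming fully grown binary trees of depth 30 and 1 byte per node, as above):

    # Upper bound on node count: 3 forests x 200 trees, fully grown binary
    # trees of depth 30, counting 2**30 - 1 nodes per tree as in the answer.
    n_forests, n_trees, depth = 3, 200, 30
    nodes_per_tree = 2 ** depth - 1
    total_nodes = n_forests * n_trees * nodes_per_tree
    print(total_nodes)                    # ~6.4e11 nodes
    print(total_nodes / 2 ** 30, "GiB")   # ~600 GiB at 1 byte per node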




Answer 2:


You can try using joblib with the compress parameter.

   import joblib  # in older scikit-learn versions: from sklearn.externals import joblib
   joblib.dump(your_algo, 'pickle_file_name.pkl', compress=3)

compress takes a value from 0 to 9. A higher value means more compression, but also slower read and write times; a value of 3 is often a good compromise.
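Loading the file back needs no extra options; joblib.load detects the compression automatically (a minimal sketch, reusing the placeholder names from above):

   import joblib
   # Works the same whether or not the dump was compressed.
   your_algo = joblib.load('pickle_file_name.pkl')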

You can also use Python's standard compression formats (zlib, gzip, bz2, lzma and xz) by giving the output file the corresponding extension. For example:

joblib.dump(obj, 'your_filename.pkl.z')   # zlib
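The other extensions follow the same pattern (the filenames here are just placeholders):

joblib.dump(obj, 'your_filename.pkl.gz')    # gzip
joblib.dump(obj, 'your_filename.pkl.bz2')   # bz2
joblib.dump(obj, 'your_filename.pkl.xz')    # xz / lzma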

For more information, see http://gael-varoquaux.info/programming/new_low-overhead_persistence_in_joblib_for_big_data.html



Source: https://stackoverflow.com/questions/43591621/trained-machine-learning-model-is-too-big
