I need to serialise scikit-learn/statsmodels models such that all the dependencies (code + data) are packaged in an artefact and this artefact can be used to initialise the mode
I package a Gaussian process (GP) model from scikit-learn using pickle.
The primary reason is that the GP takes a long time to build and loads much faster from a pickle. So when my code initializes, it checks whether the model's data files have been updated and regenerates the model if necessary; otherwise it just deserializes the model from the pickle!
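A minimal sketch of that check, assuming the model depends on a single data file and that comparing modification times is good enough. The names `load_or_rebuild`, `data_path`, `cache_path` and `build_model` are placeholders of mine, not part of scikit-learn:

```python
import os
import pickle


def load_or_rebuild(data_path, cache_path, build_model):
    """Return the cached model unless the data file is newer than the cache.

    build_model is a hypothetical callable that takes the data path and
    returns a fitted model (for example a fitted GP regressor).
    """
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(data_path)
    )
    if cache_is_fresh:
        # Fast path: deserialize the previously built model.
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Slow path: rebuild from the data and refresh the cache.
    model = build_model(data_path)
    with open(cache_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

With several data files you would compare the cache against the newest of their modification times, or store a hash of the inputs next to the pickle.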
I would try pickle, dill, and cloudpickle, in that order.
Note that pickle.dump and pickle.dumps accept a protocol keyword argument, and some values can speed things up and reduce memory usage significantly!
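A rough illustration of the difference, using a plain dictionary as a stand-in for a model (the `payload` object is made up for this example; how much you gain depends on what you pickle):

```python
import pickle

# Stand-in for a fitted model: any picklable object works here.
payload = {"weights": list(range(10_000))}

# Protocol 0 is the original text-based format.
old = pickle.dumps(payload, protocol=0)

# HIGHEST_PROTOCOL selects the newest binary format the interpreter supports.
new = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

print(len(old), len(new))  # the binary protocol output is smaller here

# Both round-trip to the same object.
assert pickle.loads(old) == pickle.loads(new) == payload
```

Keep in mind that a pickle written with a newer protocol cannot be read by Python versions that predate that protocol.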
Finally, I wrap the pickle code with compression from the Python standard library (e.g. gzip, bz2, or lzma) if necessary.
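One way to do the wrapping, sketched with gzip (bz2.open and lzma.open can be swapped in the same way). The helper names `dump_compressed` and `load_compressed` are my own:

```python
import gzip
import pickle


def dump_compressed(obj, path):
    """Pickle obj into a gzip-compressed file at path."""
    with gzip.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_compressed(path):
    """Load an object written by dump_compressed."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Compression trades CPU time for disk space, so it is worth measuring whether it actually helps for your model before enabling it.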