Save progress between multiple instances of partial_fit in Python SGDClassifier

Submitted by 强颜欢笑 on 2020-01-07 02:54:36

Question


I've successfully followed this example for my own text classification script.

The problem is that, unlike the example, I'm not processing pieces of one huge, already-existing data set in a single loop of partial_fit calls. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime.

Ideally I'd like to do something like this:

Sometime in 2015:

    model2015 = partial_fit(dataset2015)
    save_to_file(model2015)
    shut down my Python script

Sometime in 2016:

    open my Python script again
    model2015 = load_from_file(...)
    partial_fit(dataset2016, starting from model2015)
    save_to_file(model2016)

Sometime in 2017:

    open my Python script again
    etc...

Is there any way I can do this in scikit-learn? Or in some other package (Tensorflow perhaps)?


Answer 1:


Simply pickle your model and save it to disk. The other way is to dump the .coef_ and .intercept_ fields (which are just two arrays) and pass them as initializers when you call .fit.
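Put concretely, both routes might look like the sketch below (a minimal example with made-up file names and toy data; the 2015/2016 "sessions" are simulated in one script):

```python
import pickle
import numpy as np
from sklearn.linear_model import SGDClassifier

# All classes must be declared on the first partial_fit call,
# even if later batches contain only some of them.
classes = np.array([0, 1])

# "2015": first batch of data (toy values for illustration)
X_2015 = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]])
y_2015 = np.array([0, 1, 0, 1])

model = SGDClassifier(random_state=0)
model.partial_fit(X_2015, y_2015, classes=classes)

# Save the whole fitted estimator to disk before shutting down.
with open("model_2015.pkl", "wb") as f:
    pickle.dump(model, f)

# ... script shut down, restarted in "2016" ...

with open("model_2015.pkl", "rb") as f:
    model = pickle.load(f)

X_2016 = np.array([[0.1, 0.8], [0.8, 0.2]])
y_2016 = np.array([0, 1])
model.partial_fit(X_2016, y_2016)  # classes only needed on the first call

# Alternative: keep only the two weight arrays and warm-start a
# fresh estimator with them via fit's coef_init/intercept_init.
fresh = SGDClassifier(random_state=0)
fresh.fit(X_2016, y_2016,
          coef_init=model.coef_, intercept_init=model.intercept_)
```

Pickling the whole estimator is the simpler option, since it also preserves learned attributes such as classes_; the coef_/intercept_ route stores less on disk but requires rebuilding the estimator with the same hyperparameters.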



Source: https://stackoverflow.com/questions/35662635/save-progress-between-multiple-instances-of-partial-fit-in-python-sgdclassifier
