Question
I've successfully followed this example for my own text classification script.
The problem is that I don't want to process pieces of one huge, already-existing data set in a single loop of partial_fit calls, as the example does. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime.
Ideally I'd like to do something like this:
sometime in 2015:
    model2015 = partial_fit(dataset2015)
    save_to_file(model2015)
    shut down my Python script
sometime in 2016:
    open my Python script again
    model2015 = load_from_file()
    model2016 = partial_fit(dataset2016, starting from model2015)
    save_to_file(model2016)
sometime in 2017:
    open my Python script again
    etc...
Is there any way I can do this in scikit-learn? Or in some other package (Tensorflow perhaps)?
Answer 1:
Simply pickle your model and save it to disk. The other way is to dump the .coef_ and .intercept_ fields (which are just two arrays) and pass them as initializers (coef_init, intercept_init) the next time you call .fit.
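A minimal sketch of both approaches, assuming an SGDClassifier (since that is the estimator the question is about); the file path and the toy data are purely illustrative:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier

path = os.path.join(tempfile.gettempdir(), "model2015.pkl")  # illustrative path

# --- "2015 session": train on the first batch and persist the model ---
X2015 = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y2015 = np.array([0, 1, 1, 0])

clf = SGDClassifier(random_state=0)
# classes= must list every label the model will ever see,
# because later batches may not contain all of them
clf.partial_fit(X2015, y2015, classes=np.array([0, 1]))

with open(path, "wb") as f:
    pickle.dump(clf, f)

# --- "2016 session": reload and continue training on new data ---
with open(path, "rb") as f:
    clf = pickle.load(f)

X2016 = np.array([[0.5, 0.5], [0.9, 0.1]])
y2016 = np.array([0, 1])
clf.partial_fit(X2016, y2016)  # classes are already known from 2015

# --- Alternative: persist only the learned arrays and warm-start .fit ---
coef, intercept = clf.coef_, clf.intercept_
clf2 = SGDClassifier(random_state=0)
clf2.fit(X2016, y2016, coef_init=coef, intercept_init=intercept)
```

The pickle route preserves the whole estimator (hyperparameters included), so it is the simpler option; the coef_/intercept_ route only stores the two weight arrays, which you would have to save and load yourself (e.g. with numpy.save).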
Source: https://stackoverflow.com/questions/35662635/save-progress-between-multiple-instances-of-partial-fit-in-python-sgdclassifier