Question
I've successfully followed this example for my own text classification script.
The problem is that I don't want to process pieces of one huge, already-existing data set in a single loop of partial_fit calls, as the example does. I want to be able to add data as it becomes available, even if I shut down my Python script in the meantime.
Ideally I'd like to do something like this:
sometime in 2015:
    model2015 = partial_fit(dataset2015)
    save_to_file(model2015)
    shut down my Python script
sometime in 2016:
    open my Python script again
    model2015 = load_from_file()
    model2016 = partial_fit(dataset2016, starting from model2015)
    save_to_file(model2016)
sometime in 2017:
    open my Python script again
    etc...
Is there any way I can do this in scikit-learn? Or in some other package (Tensorflow perhaps)?
Answer 1:
Simply pickle your model and save it to disk. The other way is to dump the .coef_ and .intercept_ fields (which are just two arrays) and pass them as initializers (coef_init, intercept_init) the next time you call .fit.
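A minimal sketch of both approaches, assuming an SGDClassifier (since that is the estimator the question is about); the file path and the toy data are purely illustrative:

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import SGDClassifier

path = os.path.join(tempfile.gettempdir(), "model2015.pkl")  # illustrative path

# --- "2015 session": train on the first batch and persist the model ---
X2015 = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y2015 = np.array([0, 1, 1, 0])

clf = SGDClassifier(random_state=0)
# classes= must list every label the model will ever see,
# because later batches may not contain all of them
clf.partial_fit(X2015, y2015, classes=np.array([0, 1]))

with open(path, "wb") as f:
    pickle.dump(clf, f)

# --- "2016 session": reload and continue training on new data ---
with open(path, "rb") as f:
    clf = pickle.load(f)

X2016 = np.array([[0.5, 0.5], [0.9, 0.1]])
y2016 = np.array([0, 1])
clf.partial_fit(X2016, y2016)  # classes are already known from 2015

# --- Alternative: persist only the learned arrays and warm-start .fit ---
coef, intercept = clf.coef_, clf.intercept_
clf2 = SGDClassifier(random_state=0)
clf2.fit(X2016, y2016, coef_init=coef, intercept_init=intercept)
```

The pickle route preserves the whole estimator (hyperparameters included), so it is the simpler option; the coef_/intercept_ route only stores the two weight arrays, which you would have to save and load yourself (e.g. with numpy.save).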
Source: https://stackoverflow.com/questions/35662635/save-progress-between-multiple-instances-of-partial-fit-in-python-sgdclassifier