Can we update an existing model in spark-ml/spark-mllib?

Submitted by  ̄綄美尐妖づ on 2019-11-30 06:03:47

Question


We are using spark-ml to build a model from existing data. New data arrives daily.

Is there a way to read only the new data and update the existing model, without having to read all the data and retrain every time?


Answer 1:


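It depends on the model you're using, but for some models Spark does exactly what you want. Look at StreamingKMeans, StreamingLinearRegressionWithSGD, StreamingLogisticRegressionWithSGD and, more broadly, StreamingLinearAlgorithm (all in spark-mllib). These update the model as each new batch arrives on a stream, so earlier data never has to be re-read.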
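
To make the idea concrete, here is a rough plain-Python sketch of the kind of per-batch centroid update that streaming k-means performs, as I understand the rule described in Spark's streaming k-means documentation: each center becomes a weighted average of its old position and the mean of the new points assigned to it. This is not Spark code. The function name `update_centers`, its arguments, and the way the stored weight is discounted are assumptions made for illustration only.

```python
# Plain-Python sketch of a streaming k-means style update (not the Spark API).
# `update_centers`, `weights` and `decay` are hypothetical names for illustration.

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def update_centers(centers, weights, batch, decay=1.0):
    """Return (new_centers, new_weights) after folding in one batch of points.

    centers: list of points, one per cluster
    weights: how many points each center already summarizes
    batch:   the new points only; older data is never re-read
    decay:   1.0 keeps all history, 0.0 forgets it completely
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)

    # Assign every new point to its nearest current center.
    for point in batch:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(point, centers[i]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += point[d]

    new_centers, new_weights = [], []
    for center, weight, total, count in zip(centers, weights, sums, counts):
        discounted = weight * decay
        if count == 0:
            # No new points for this cluster: keep the center, discount its weight.
            new_centers.append(list(center))
            new_weights.append(discounted)
            continue
        combined = discounted + count
        # Weighted average of the old center and the sum of the new points.
        new_centers.append([(center[d] * discounted + total[d]) / combined
                            for d in range(dim)])
        new_weights.append(combined)
    return new_centers, new_weights


# Usage: two 1-D centers, each backed by one earlier point, plus a new batch.
centers, weights = update_centers([[0.0], [10.0]], [1.0, 1.0], [[2.0], [12.0]])
print(centers)  # [[1.0], [11.0]]
print(weights)  # [2.0, 2.0]
```

With `decay=1.0` every past point counts as much as a new one. Lowering it makes the model follow recent data more closely.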




Answer 2:


To complete Florent's answer: if you are not in a streaming context, some spark-mllib models accept an initialModel as a starting point for incremental updates. See KMeans or GaussianMixture (GMM), for instance.
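
As a rough illustration of what starting from an initial model means, here is a plain-Python sketch of k-means that begins from yesterday's centers and refines them on today's data only. It is not Spark code, and `refine_centers` is a hypothetical helper. In Spark you would pass the previously fitted model through initialModel instead.

```python
# Plain-Python sketch of warm-starting k-means from earlier centers (not the Spark API).
# `refine_centers` is a hypothetical helper for illustration.

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_centers(initial_centers, points, max_iterations=10):
    """Run Lloyd iterations on `points`, starting from `initial_centers`."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iterations):
        # Group the new points by their nearest current center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(point, centers[i]))
            clusters[nearest].append(point)

        # Move each center to the mean of its members; keep it if it has none.
        updated = [
            [sum(column) / len(members) for column in zip(*members)]
            if members else center
            for center, members in zip(centers, clusters)
        ]
        if updated == centers:
            break  # converged
        centers = updated
    return centers


# Usage: centers fitted on earlier data, refined with today's points only.
yesterday_centers = [[0.0, 0.0], [10.0, 10.0]]
todays_points = [[1.0, 1.0], [3.0, 1.0], [9.0, 11.0], [11.0, 9.0]]
print(refine_centers(yesterday_centers, todays_points))
# [[2.0, 1.0], [10.0, 10.0]]
```

Note that a warm start only chooses where the optimization begins. The refined centers are fitted to the new data alone, so the old data influences the result only through that starting point.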



Source: https://stackoverflow.com/questions/41192799/whether-we-can-update-existing-model-in-spark-ml-spark-mllib
