Can we update an existing model in spark-ml/spark-mllib?
Question

We are using spark-ml to build a model from existing data. New data arrives daily. Is there a way to read only the new data and update the existing model, without having to read all the data and retrain every time?

Answer 1:

It depends on the model you're using, but for some models Spark does exactly what you want. You can look at StreamingKMeans, StreamingLinearRegressionWithSGD, StreamingLogisticRegressionWithSGD and, more broadly, StreamingLinearAlgorithm.

Answer 2:

To complete Florent's