How to overwrite Spark ML model in PySpark?

Question

from pyspark.ml.regression import RandomForestRegressor

rf = RandomForestRegressor(labelCol="label", featuresCol="features", numTrees=5, maxDepth=10, seed=42)
rf_model = rf.fit(train_df)
rf_model_path = "./hdfsData/" + "rfr_model"
rf_model.save(rf_model_path)

These lines worked the first time I saved the model. But when I tried to save the model to the same path again, I got this error:

Py4JJavaError: An error occurred while calling o1695.save. : java.io.IOException: Path ./hdfsData/rfr_model already exists. Please use write.overwrite().save(path) to overwrite it.

Then I tried:

rf_model.write.overwrite().save(rf_model_path)

It gave:

AttributeError: 'function' object has no attribute 'overwrite'

It seems that the pyspark.mllib module provides an overwrite function, but the pyspark.ml module does not. Does anyone know how to overwrite the old model with the new one? Thanks.

Answer 1:

The message you see is a Java error message, not a Python one. In PySpark, write is a method, so you have to call it (with parentheses) before chaining overwrite():

rf_model.write().overwrite().save(rf_model_path)
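
To see why the parentheses matter, here is a minimal pure-Python sketch that needs no Spark installation. ToyModel and ToyWriter are hypothetical stand-ins invented for illustration; they only mimic the shape of the call chain and are not PySpark's actual classes.

```python
class ToyWriter:
    """Hypothetical stand-in for a model writer (not a PySpark class)."""

    def __init__(self):
        self.should_overwrite = False
        self.saved_to = None

    def overwrite(self):
        # Record the overwrite request and return self so calls can be chained.
        self.should_overwrite = True
        return self

    def save(self, path):
        # A real writer would persist the model; here we only record the path.
        self.saved_to = path


class ToyModel:
    """Hypothetical stand-in for a fitted model (not a PySpark class)."""

    def write(self):
        # Like the PySpark call in the answer, write is a method that returns a writer.
        return ToyWriter()


model = ToyModel()

# Without parentheses, model.write is the method object itself,
# which has no overwrite attribute, so the chain fails.
try:
    model.write.overwrite()
    failed = False
except AttributeError:
    failed = True

# With parentheses, write() returns the writer and the chain works.
writer = model.write().overwrite()
writer.save("./hdfsData/rfr_model")
```

The same reasoning applies to the call above: rf_model.write refers to the method, while rf_model.write() returns the writer object on which overwrite() and save() are defined.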

Source: https://stackoverflow.com/questions/42303705/how-to-overwrite-spark-ml-model-in-pyspark

Tags

apache-spark

machine-learning

pyspark

apache-spark-mllib

apache-spark-ml
