How to save and load an MLlib model in Apache Spark?

Posted by 天涯浪子 on 2019-11-27 12:29:48

Question


I trained a classification model in Apache Spark (using pyspark) and stored it in an object of type LogisticRegressionModel. Now I want to make predictions on new data. I would like to store the model and read it back in a new program in order to make the predictions. Any idea how to store the model? I'm thinking of maybe using pickle, but I'm a newbie to both Python and Spark, so I'd like to hear what the community thinks.


Answer 1:


You can save your model with the save method that MLlib models provide.

# lrm is a trained LogisticRegressionModel; sc is the active SparkContext
lrm.save(sc, "lrm_model.model")

After saving it, you can load it in another application.

from pyspark.mllib.classification import LogisticRegressionModel
sameModel = LogisticRegressionModel.load(sc, "lrm_model.model")

As @zero323 stated before, another way to achieve this is to use the Predictive Model Markup Language (PMML).

PMML is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms.



Source: https://stackoverflow.com/questions/34270427/how-to-save-and-load-mllib-model-in-apache-spark
