How to save models from ML Pipeline to S3 or HDFS?


One way to save a model to HDFS is as follows:

// persist model to HDFS
sc.parallelize(Seq(model), 1).saveAsObjectFile("hdfs:///user/root/linReg.model")

The saved model can then be loaded back as:

val linRegModel = sc.objectFile[LinearRegressionModel]("hdfs:///user/root/linReg.model").first()

For more details, see (ref).
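The same trick works for S3; as a minimal sketch, assuming the S3A connector is on the classpath and AWS credentials are configured (the bucket and key below are placeholders):

// persist model to S3 via the s3a:// filesystem (bucket/path are placeholders)
sc.parallelize(Seq(model), 1).saveAsObjectFile("s3a://my-bucket/models/linReg.model")

// load it back from the same S3 path
val modelFromS3 = sc.objectFile[LinearRegressionModel]("s3a://my-bucket/models/linReg.model").first()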

Since Apache Spark 1.6, the Scala API lets you save models without any tricks, because every model in the ML library comes with a save method. You can check this on LogisticRegressionModel; it does indeed have that method. To load the model back, use the static load method on the companion object:

import org.apache.spark.ml.classification.LogisticRegressionModel

val logRegModel = LogisticRegressionModel.load("myModel.model")
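The same mechanism covers a whole fitted pipeline, which is usually what you want to persist from an ML Pipeline. A minimal sketch, assuming Spark 1.6+; the stages, paths, and the trainingDF DataFrame are placeholders, not part of the original answer:

import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// assemble an example pipeline (stages are illustrative)
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

// fit and persist the whole fitted pipeline (an HDFS or S3 path works equally well)
val fitted = pipeline.fit(trainingDF)          // trainingDF: a DataFrame you already have
fitted.save("hdfs:///user/root/myPipeline.model")

// load it back later
val restored = PipelineModel.load("hdfs:///user/root/myPipeline.model")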

FileOutputStream writes to the local filesystem (it does not go through the Hadoop libraries), so with that approach saving to a local directory is the way to go. That said, the target directory needs to exist, so make sure you create it first.
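For illustration, a minimal sketch of that local-filesystem route, assuming the model object is Java-serializable and that /tmp/models is a directory of your choosing (hypothetical):

import java.io.{File, FileOutputStream, ObjectOutputStream}

// make sure the target directory exists before writing
val dir = new File("/tmp/models")
if (!dir.exists()) dir.mkdirs()

// serialize the model to a local file (does not go through the Hadoop libraries)
val out = new ObjectOutputStream(new FileOutputStream(new File(dir, "linReg.model")))
try out.writeObject(model) finally out.close()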

Depending on your model, you may also want to look at PMML export: https://spark.apache.org/docs/latest/mllib-pmml-model-export.html
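As a rough sketch of what PMML export looks like, assuming an mllib model that mixes in PMMLExportable (for example a KMeansModel; the data and paths are placeholders):

import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// train a small mllib model just to have something exportable (data is illustrative)
val points = sc.parallelize(Seq(Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0)))
val kMeansModel = KMeans.train(points, 2, 10)

// export to PMML: to a local file, to a distributed path via the SparkContext, or as a String
kMeansModel.toPMML("/tmp/kmeans.xml")
kMeansModel.toPMML(sc, "hdfs:///user/root/kmeans.xml")
println(kMeansModel.toPMML())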
