How to do prediction with Sklearn Model inside Spark?

Posted by 牧云@^-^@ on 2019-12-23 08:59:31

Question


I have trained a model in Python using sklearn. How can I load the same model in Spark and generate predictions on a Spark RDD?


Answer 1:


I will show an example with linear regression in sklearn and how to use the fitted model to predict the elements of a Spark RDD.

First, train the model with sklearn:

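from sklearn import linear_model

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
# (diabetes_X_train holds the features, diabetes_y_train the targets)
regr.fit(diabetes_X_train, diabetes_y_train)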
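The training snippet uses `diabetes_X_train` and `diabetes_y_train` without defining them. A minimal sketch of one way to build them is below. It assumes the diabetes dataset bundled with scikit-learn, a single feature column (index 2), and a hold-out of the last 20 samples; none of these choices is stated in the answer.

```python
import numpy as np
from sklearn import datasets, linear_model

# Load the diabetes dataset that ships with scikit-learn (no download needed).
diabetes = datasets.load_diabetes()

# Assumption: keep a single feature (column 2) so that each sample is one number,
# which matches an RDD of plain scalars.
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Assumption: hold out the last 20 samples and train on the rest.
diabetes_X_train = diabetes_X[:-20]
diabetes_y_train = diabetes.target[:-20]

# Fit the model on the driver, before anything is sent to Spark.
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
```

With one feature per sample, the fitted model expects input of shape `(n_samples, 1)` at prediction time.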

At this point the model is only fitted; the remaining step is to predict a value for each element of an RDD.

In this case the RDD should contain the X values, like this:

rdd = sc.parallelize([1, 2, 3, 4])

First, broadcast the sklearn model so that every executor receives a copy of it (sc is the same SparkContext used above):

regr_bc = sc.broadcast(regr)
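Broadcasting only works if the model can be serialised, and PySpark relies on pickle for that. The sketch below is a local sanity check that runs without a cluster: it round-trips a fitted model through pickle and compares predictions. The toy data (y = 2x) is hypothetical and stands in for the real training set.

```python
import pickle

import numpy as np
from sklearn import linear_model

# Hypothetical toy data (y = 2x), standing in for the real training set.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
regr = linear_model.LinearRegression().fit(X, y)

# Round-trip through pickle, roughly what happens when the model is shipped to executors.
payload = pickle.dumps(regr)
restored = pickle.loads(payload)

# The restored copy should give the same predictions as the original.
same = np.allclose(regr.predict(X), restored.predict(X))
size_bytes = len(payload)
```

`size_bytes` gives a rough idea of how much data is sent to each executor; a linear model is tiny, while large ensembles can be much bigger.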

Then you can use it to predict your data like this:

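# predict expects a 2-D array of shape (n_samples, n_features), so wrap each scalar as [[x]]
rdd.map(lambda x: (x, float(regr_bc.value.predict([[x]])[0]))).collect()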

The first element of each tuple is your X and the second element is the predicted Y. The collect will return something of this form (the values are illustrative):

[(1, 2), (2, 4), (3, 6), ...]
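Calling `predict` once per element is simple but pays sklearn's per-call overhead for every row. A possible alternative, sketched here, is to predict once per partition with `mapPartitions`. The helper name `predict_partition` and the toy model are hypothetical, and the sketch assumes one numeric feature per element, as in the RDD above.

```python
import numpy as np
from sklearn import linear_model


def predict_partition(rows, model):
    """Predict for all rows of one partition with a single vectorised call."""
    xs = list(rows)
    if not xs:
        # Empty partitions must still return an iterator.
        return iter([])
    # Assumption: each row is a single number, so reshape to (n_samples, 1).
    features = np.asarray(xs, dtype=float).reshape(-1, 1)
    preds = model.predict(features)
    # Convert to plain Python floats so the results serialise cleanly.
    return iter([(x, float(p)) for x, p in zip(xs, preds)])


# Local check without a cluster, using hypothetical toy data (y = 2x).
toy_model = linear_model.LinearRegression().fit(
    [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
)
local_result = list(predict_partition(iter([1, 2, 3, 4]), toy_model))

# With Spark, assuming `rdd` and `regr_bc` exist as in the answer:
# rdd.mapPartitions(lambda rows: predict_partition(rows, regr_bc.value)).collect()
```

The trade-off is that each partition is materialised as a list in memory, so this suits partitions that fit comfortably on an executor.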


Source: https://stackoverflow.com/questions/42887621/how-to-do-prediction-with-sklearn-model-inside-spark
