Spark - MLlib linear regression intercept and weight NaN [duplicate]

Posted by a 夏天 on 2019-12-08 07:24:26

Question


I am trying to build a regression model on Spark using some custom data, and the intercept and weights are always NaN. This is my data:

data = [LabeledPoint(0.0, [27022.0]), LabeledPoint(1.0, [27077.0]), LabeledPoint(2.0, [27327.0]), LabeledPoint(3.0, [27127.0])]

Output:

(weights=[nan], intercept=nan)  

However, if I use this dataset (taken from the Spark examples), it returns a non-NaN weight and intercept.

data = [LabeledPoint(0.0, [0.0]), LabeledPoint(1.0, [1.0]), LabeledPoint(3.0, [2.0]),LabeledPoint(2.0, [3.0])]

Output:

(weights=[0.798729902914], intercept=0.3027117101297481) 

This is my current code:

from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
model = LinearRegressionWithSGD.train(sc.parallelize(data), intercept=True)

Am I missing something? Is it because the numbers in my data are so large? This is my first time using MLlib, so I might be missing some details.

Thanks


Answer 1:


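MLlib linear regression is SGD-based, so you need to tune the number of iterations and the step size; see https://spark.apache.org/docs/latest/mllib-optimization.html. With feature values around 27,000, a step size that is too large makes each update overshoot, and the weights can grow until they overflow to NaN.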

I tried your custom data like this and got some results (in Scala):

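import org.apache.spark.mllib.regression.LinearRegressionWithSGD
// `data` holds the same LabeledPoint values as in the question, built as a Scala Seq
val numIterations = 20
val model = LinearRegressionWithSGD.train(sc.parallelize(data), numIterations)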
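
To illustrate why the step size matters for data of this magnitude, here is a minimal plain-Python sketch of gradient descent on the question's data. It is not MLlib's implementation: the helper `fit_linear_gd` is hypothetical, it uses full-batch updates, and the `step / sqrt(t)` decay is an assumption modeled on the description in the MLlib optimization guide. Under those assumptions, the raw features diverge to NaN with a step of 1.0, while standardized features stay finite.

```python
import math
import statistics


def fit_linear_gd(xs, ys, step=1.0, iterations=100):
    """Full-batch gradient descent on squared error for y ~ w * x + b.

    Hypothetical helper for illustration only; the step shrinks as
    step / sqrt(t), which is an assumption, not MLlib's verified code path.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for t in range(1, iterations + 1):
        # Residuals of the current model on every point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Averaged gradients of 0.5 * error^2 with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        lr = step / math.sqrt(t)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The data from the question: one large-valued feature per point.
xs = [27022.0, 27077.0, 27327.0, 27127.0]
ys = [0.0, 1.0, 2.0, 3.0]

# Raw features: each update overshoots, the weight overflows, then becomes NaN.
raw_w, raw_b = fit_linear_gd(xs, ys)

# Standardized features (zero mean, unit variance): the same step size is stable.
mean_x = statistics.mean(xs)
std_x = statistics.pstdev(xs)
scaled_xs = [(x - mean_x) / std_x for x in xs]
scaled_w, scaled_b = fit_linear_gd(scaled_xs, ys)

print("raw features:   ", raw_w, raw_b)
print("scaled features:", scaled_w, scaled_b)
```

In PySpark the analogous options would be to standardize the feature before training or to pass a much smaller `step` to `LinearRegressionWithSGD.train`; which of the two works better on real data is something to check empirically.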


Source: https://stackoverflow.com/questions/29751110/spark-mllib-linear-regression-intercept-and-weight-nan
