I have been trying to build a regression model in Spark using some custom data, and the intercept and weights are always NaN.
This is my data:
from pyspark.mllib.regression import LabeledPoint
data = [LabeledPoint(0.0, [27022.0]), LabeledPoint(1.0, [27077.0]), LabeledPoint(2.0, [27327.0]), LabeledPoint(3.0, [27127.0])]
Output:
(weights=[nan], intercept=nan)
However, if I use this dataset (taken from the Spark examples), it returns non-NaN weights and an intercept.
data = [LabeledPoint(0.0, [0.0]), LabeledPoint(1.0, [1.0]), LabeledPoint(3.0, [2.0]), LabeledPoint(2.0, [3.0])]
Output:
(weights=[0.798729902914], intercept=0.3027117101297481)
This is my current code:
from pyspark.mllib.regression import LinearRegressionWithSGD
model = LinearRegressionWithSGD.train(sc.parallelize(data), intercept=True)
Am I missing something? Is it because the numbers in my data are so large? This is my first time using MLlib, so I might be missing some details.
Thanks
MLlib's linear regression is SGD-based, so you need to tune the number of iterations and the step size; see https://spark.apache.org/docs/latest/mllib-optimization.html.
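In PySpark, those knobs are the iterations and step arguments of LinearRegressionWithSGD.train. A minimal sketch, assuming an existing SparkContext sc and the data list from your question; the step value is illustrative, not a tuned recommendation:

from pyspark.mllib.regression import LinearRegressionWithSGD
# With raw feature values around 27000, the default step size of 1.0 makes the
# SGD updates blow up to inf/NaN. A much smaller step keeps them finite; the
# exact value needs experimentation for your feature scale.
model = LinearRegressionWithSGD.train(
    sc.parallelize(data),
    iterations=100,   # the default; more may be needed once the step is small
    step=1e-9,        # illustrative; tune relative to the feature magnitude
    intercept=True)
print(model.weights, model.intercept)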
I tried your custom data like this and got some results (in Scala):
val numIterations = 20
val model = LinearRegressionWithSGD.train(sc.parallelize(data), numIterations)
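If you would rather not hand-tune the step size, an alternative (not part of this answer originally, just a sketch) is to standardize the features so the default step behaves, e.g. with pyspark.mllib.feature.StandardScaler:

from pyspark.mllib.feature import StandardScaler
from pyspark.mllib.regression import LabeledPoint, LinearRegressionWithSGD
# Assumes an existing SparkContext `sc` and the `data` list from the question.
rdd = sc.parallelize(data)
features = rdd.map(lambda p: p.features)
labels = rdd.map(lambda p: p.label)
# Rescale features to zero mean and unit variance so SGD's default step works.
scaler = StandardScaler(withMean=True, withStd=True).fit(features)
scaled = labels.zip(scaler.transform(features)).map(lambda lp: LabeledPoint(lp[0], lp[1]))
model = LinearRegressionWithSGD.train(scaled, iterations=100, intercept=True)

Note that the recovered weights then refer to the scaled features, so you would need to undo the scaling if you want coefficients on the original units.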
Source: https://stackoverflow.com/questions/29751110/spark-mllib-linear-regression-intercept-and-weight-nan