I'm new to Spark and to machine learning in general. I have successfully followed some of the MLlib tutorials, but I can't get this one working:
I found the sample code here
As explained by zero323 here, setting the intercept to true will solve the problem. If it is not set to true, your regression line is forced to go through the origin, which is not appropriate in this case. (I'm not sure why this is not included in the sample code.)
So, to fix your problem, change the following line in your code (PySpark):
model = LinearRegressionWithSGD.train(parsedData, numIterations)
to
model = LinearRegressionWithSGD.train(parsedData, numIterations, intercept=True)
Although not mentioned explicitly, this is also why the code from 'selvinsource' in the above question is working. Changing the step size doesn't help much in this example.
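To see why a missing intercept hurts, here is a minimal sketch in plain Python (no Spark). It uses closed-form least squares rather than SGD, and the data and helper names (fit_through_origin, fit_with_intercept, mse) are made up for illustration; they are not part of MLlib. The points lie exactly on y = 2x + 10, so a line forced through the origin cannot fit them well.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope when the line is forced through (0, 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, slope, intercept=0.0):
    """Mean squared error of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data lying exactly on y = 2x + 10.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 10.0 for x in xs]

slope_origin = fit_through_origin(xs, ys)
slope, intercept = fit_with_intercept(xs, ys)

mse_origin = mse(xs, ys, slope_origin)
mse_intercept = mse(xs, ys, slope, intercept)

print("through origin:", slope_origin, "MSE:", mse_origin)
print("with intercept:", slope, intercept, "MSE:", mse_intercept)
```

The fit with an intercept recovers the line, while the fit through the origin has to inflate the slope to compensate and still leaves a large error. That is the same effect you get from leaving intercept at its default in LinearRegressionWithSGD.train.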