Implementation of linear regression: values of weights increase to Inf

帅比萌擦擦* submitted on 2019-12-24 11:30:03

Question


I am implementing a program that performs linear regression on the following dataset:

http://www.rossmanchance.com/iscam2/data/housing.txt

My program is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def abline(X,theta,Y):
    yValues=calcH(X,theta)
    plt.xlim(0, 5000)
    plt.ylim(0, 2000000)
    plt.xlabel("sqft")
    plt.ylabel("price")
    plt.gca().set_aspect(0.001, adjustable='box')
    plt.plot(X,Y,'.',X, yValues, '-')
    plt.show() 

def openFile(fileR):
    f=pd.read_csv(fileR,sep="\t")
    header=f.columns.values
    prediction=f["price"]
    X=f["sqft"] 
    gradientDescent(0.0005,100,prediction,X)

def calcH(X,theta):
    h=np.dot(X,theta)
    return h

def calcC(X,Y,theta):
    d=((calcH(X,theta)-Y)**2).mean()/2
    return d


def gradientDescent(learningRate,itera, Y, X):
    t0=[]
    t1=[]
    cost=[]
    theta=np.zeros(2) 
    X=np.column_stack((np.ones(len(X)),X)) 
    for i in range(itera):
        h_theta=calcH(X,theta)
        theta0=theta[0]-learningRate*(Y-h_theta).mean()
        theta1=theta[1]-learningRate*((Y-h_theta)*X[:,1]).mean()
        theta=np.array([theta0,theta1])
        j=calcC(X,Y,theta)
        t0.append(theta0)
        t1.append(theta1)
        cost.append(j)
        if (i%10==0):
             print ("iteration ",i,"cost ",j,"theta ",theta)
             abline(X,theta,Y)

The problem is that the values of theta blow up toward Inf. I tested with only 3 iterations, and some of the values are as follows:

iteration  0 cost  9.948977633931098e+21 theta  [-2.47365759e+04 -6.10382173e+07]
iteration  1 cost  7.094545903263138e+32 theta  [-6.46495395e+09 -1.62995849e+13]
iteration  2 cost  5.059070733255204e+43 theta  [-1.72638812e+15 -4.35260862e+18]

I would like to predict the price based on the variable sqft. I am basically following the formulas given by Andrew Ng in his Coursera ML course.

By differentiating the cost function, I obtained the update rule used in gradientDescent.
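
The formula images from the original post are not reproduced here. Assuming they are the standard single-variable linear regression formulas from that course (an assumption, since the images themselves are missing), the hypothesis, cost function, and update rule would read roughly as follows, with m the number of training examples, \alpha the learning rate, and x_0 = 1:

```latex
h_\theta(x) = \theta_0 + \theta_1 x

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}, \qquad j \in \{0, 1\}
```

Note that the residual in this form is h_\theta(x) - y, whereas the code above computes (Y-h_theta).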

Update: I have added a function to plot my data and, strangely, the resulting plots are not correct: my predictions seem to keep going up.

But when I plot the data itself, the relationship is clearly linear.

What am I doing wrong?

Thanks


Answer 1:


I replicated your results. Apart from some stylistic issues and the swapped order of (Y-h_theta) and (h_theta - Y) (as pointed out in one of the comments), the code itself is correct. The problem is that the numbers are massive, so each iteration easily overshoots along the gradient and the parameters oscillate between extremes, each step trying to "counteract" the previous one with an even bigger step in the opposite direction. A very low learning rate could work. In real-world applications, you could also normalize your data to address some of these issues.
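
As a rough illustration of the two points above (the residual order and the scale of the numbers), here is a minimal sketch. It does not use the real housing file: the data is synthetic, and the names gradient_descent, sqft_scaled, and the constants are made up for this example. It runs the same update with the residual written as h_theta - Y, once on raw sqft values with the original learning rate and once on a standardized feature.

```python
import numpy as np

def gradient_descent(x, y, learning_rate, iterations):
    """Batch gradient descent for y ~ theta[0] + theta[1] * x."""
    X = np.column_stack((np.ones(len(x)), x))
    theta = np.zeros(2)
    for _ in range(iterations):
        error = X @ theta - y  # h_theta - Y, not Y - h_theta
        theta = theta - learning_rate * (X.T @ error) / len(y)
    return theta

# Synthetic stand-in for the housing data (made-up numbers, NOT the real file).
rng = np.random.default_rng(0)
sqft = rng.uniform(500.0, 4500.0, size=200)
price = 50_000.0 + 200.0 * sqft + rng.normal(0.0, 20_000.0, size=200)

# Raw feature with the learning rate from the question: the steps overshoot
# and the parameters grow until they overflow.
with np.errstate(all="ignore"):
    theta_raw = gradient_descent(sqft, price, learning_rate=0.0005, iterations=200)

# Standardize the feature (zero mean, unit variance) and try again.
mu, sigma = sqft.mean(), sqft.std()
sqft_scaled = (sqft - mu) / sigma
theta_scaled = gradient_descent(sqft_scaled, price, learning_rate=0.1, iterations=500)

# Convert back to original units: price ~ intercept + slope * sqft.
slope = theta_scaled[1] / sigma
intercept = theta_scaled[0] - slope * mu

# Ordinary least squares fit for comparison (highest degree first).
ref_slope, ref_intercept = np.polyfit(sqft, price, 1)

print("raw feature, lr=0.0005:", theta_raw)
print("standardized feature:  ", slope, intercept)
print("least squares:         ", ref_slope, ref_intercept)
```

On the raw feature, convergence would need a learning rate several orders of magnitude smaller, because the average of sqft squared is in the millions. After standardization a moderate learning rate is enough.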



Source: https://stackoverflow.com/questions/57740947/implementation-of-linear-regression-values-of-weights-increases-to-inf
