How do I calculate r-squared using Python and Numpy?

春和景丽 2020-11-28 19:29

I'm using Python and Numpy to calculate a best-fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (lin

11 answers
  •  旧巷少年郎
    2020-11-28 20:18

    According to the numpy.polyfit documentation, it performs a least-squares fit, which is a linear regression (linear in the coefficients). Specifically, numpy.polyfit with degree 'd' fits a linear regression with the mean function

    E(y|x) = p_d * x**d + p_{d-1} * x **(d-1) + ... + p_1 * x + p_0
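
As a quick illustration of that mean function, numpy.polyfit returns the coefficients with the highest degree first, so the array lines up with p_d, ..., p_0. This is only a minimal sketch; the data is made up for the example and lies exactly on a known quadratic.

```python
import numpy

# Made-up data lying exactly on y = 2*x**2 - 3*x + 1
x = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 - 3 * x + 1

coeffs = numpy.polyfit(x, y, 2)   # ordered [p_2, p_1, p_0]
p = numpy.poly1d(coeffs)          # callable polynomial built from coeffs
fitted = p(x)                     # estimate of E(y|x) at each x
```

With noise-free data like this, the fitted coefficients should recover 2, -3 and 1 up to floating-point error.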

    So you just need to calculate the R-squared for that fit. The Wikipedia page on linear regression gives full details. You are interested in R^2, which you can calculate in a couple of ways, the easiest probably being

    SST = Sum(i=1..n) (y_i - y_bar)^2
    SSReg = Sum(i=1..n) (y_ihat - y_bar)^2
    Rsquared = SSReg/SST
    

    Here 'y_bar' is the mean of the y values, and 'y_ihat' is the fitted value for each point.
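
The two sums above can be sketched in NumPy roughly as follows. The helper name r_squared and the sample data are hypothetical, chosen only for illustration. The function also returns the equivalent form 1 - SSRes/SST, which agrees with SSReg/SST for a least-squares fit that includes a constant term (as numpy.polyfit does).

```python
import numpy

def r_squared(x, y, degree):
    """Return R^2 computed as SSReg/SST and as 1 - SSRes/SST."""
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    # Fitted values from the least-squares polynomial
    yhat = numpy.polyval(numpy.polyfit(x, y, degree), x)
    ybar = y.mean()
    sst = numpy.sum((y - ybar) ** 2)       # total sum of squares
    ssreg = numpy.sum((yhat - ybar) ** 2)  # regression sum of squares
    ssres = numpy.sum((y - yhat) ** 2)     # residual sum of squares
    return ssreg / sst, 1.0 - ssres / sst

# Made-up, roughly linear data
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
by_ssreg, by_ssres = r_squared(x, y, 1)
```

Both returned values should match up to rounding. If a fit were made without an intercept, the two forms could differ, so this equivalence is an assumption tied to ordinary least squares with a constant term.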

    I'm not terribly familiar with numpy (I usually work in R), so there is probably a tidier way to calculate your R-squared, but the following should be correct:

    import numpy
    
    # Polynomial Regression
    def polyfit(x, y, degree):
        results = {}
    
        coeffs = numpy.polyfit(x, y, degree)
    
        # Polynomial Coefficients
        results['polynomial'] = coeffs.tolist()
    
        # r-squared
        p = numpy.poly1d(coeffs)
        # fit values, and mean
        yhat = p(x)                         # or [p(z) for z in x]
        ybar = numpy.sum(y)/len(y)          # or sum(y)/len(y)
        ssreg = numpy.sum((yhat-ybar)**2)   # or sum([ (yihat - ybar)**2 for yihat in yhat])
        sstot = numpy.sum((y - ybar)**2)    # or sum([ (yi - ybar)**2 for yi in y])
        results['determination'] = ssreg / sstot
    
        return results
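
One way to sanity-check this kind of calculation, sketched here with made-up data, is to compare the degree-1 result with the squared Pearson correlation from numpy.corrcoef. The two should agree for a straight-line fit. The variable names below are illustrative only.

```python
import numpy

# Made-up, roughly linear data
x = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = numpy.array([1.2, 1.9, 3.2, 3.8, 5.1, 6.3])

# R^2 from the fitted line, using the same SSReg/SST ratio as above
p = numpy.poly1d(numpy.polyfit(x, y, 1))
yhat = p(x)
ybar = y.mean()
r2_fit = numpy.sum((yhat - ybar) ** 2) / numpy.sum((y - ybar) ** 2)

# Squared Pearson correlation between x and y
r2_corr = numpy.corrcoef(x, y)[0, 1] ** 2
```

This check only applies to degree 1. For higher degrees the squared correlation between x and y is no longer the R-squared of the fit.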
    
