Python Multiple Linear Regression using OLS code with specific data?

情歌与酒 2020-12-15 12:40

I am using the ols.py code downloaded from the SciPy Cookbook (the download link is in the first paragraph, on the bold "OLS"), but I need to understand rather than using ra

3 Answers
  • 2020-12-15 13:23

    The ols function takes the entire set of independent variables as its second argument. Try:

    m = ols(y, [x1, x2, x3], ...)
    

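    though I suspect you may need to wrap it in a NumPy array:

    import numpy

    # Stack the three variables into one array of shape (3, len(x1)), one row per variable
    x = numpy.array([x1, x2, x3], dtype=float)
    # If ols expects one column per variable instead, pass x.T
    m = ols(y, x, ...)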
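
To make the shape question concrete, here is a minimal NumPy-only sketch. The short lists are made-up illustration values, not the data from the question, and the sketch makes no claim about which layout ols.py actually expects; it only shows the two candidate layouts and how they relate.

```python
import numpy as np

# Hypothetical toy values, used only to illustrate array shapes
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [10.0, 20.0, 30.0, 40.0]
x3 = [0.5, 0.1, 0.7, 0.3]

rows = np.array([x1, x2, x3])         # one row per variable
cols = np.column_stack((x1, x2, x3))  # one column per variable

print(rows.shape)  # (3, 4)
print(cols.shape)  # (4, 3)

# The two layouts are transposes of each other
assert np.array_equal(rows.T, cols)
```

If a regression routine complains about mismatched dimensions, checking `x.shape` against `len(y)` and transposing is usually the first thing to try.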
    
  • 2020-12-15 13:36

    It may be easier to use http://pypi.python.org/pypi/scikits.statsmodels (now distributed as statsmodels), which also has more features:

    import numpy as np
    import statsmodels.api as sm  # formerly scikits.statsmodels.api
    
    
    y = [29.4, 29.9, 31.4, 32.8, 33.6, 34.6, 35.5, 36.3, 37.2, 37.8, 38.5, 38.8,
                38.6, 38.8, 39, 39.7, 40.6, 41.3, 42.5, 43.9, 44.9, 45.3, 45.8, 46.5,
                77.1, 48.2, 48.8, 50.5, 51, 51.3, 50.7, 50.7, 50.6, 50.7, 50.6, 50.7]
    # tuition
    x1 = [376, 407, 438, 432, 433, 479, 512, 543, 583, 635, 714, 798, 891,
                971, 1045, 1106, 1218, 1285, 1356, 1454, 1624, 1782, 1942, 2057, 2179,
                2271, 2360, 2506, 2562, 2700, 2903, 3319, 3629, 3874, 4102, 4291]
    # research and development
    x2 = [28740.00, 30952.00, 33359.00, 35671.00, 39435.00, 43338.00, 48719.00, 55379.00, 63224.00,
                72292.00, 80748.00, 89950.00, 102244.00, 114671.00, 120249.00, 126360.00, 133881.00, 141891.00,
                151993.00, 160876.00, 165350.00, 165730.00, 169207.00, 183625.00, 197346.00, 212152.00, 226402.00, 
                267298.00, 277366.00, 276022.00, 288324.00, 299201.00, 322104.00, 347048.00, 372535.00,
                397629.00]
    # one/none parents
    x3 = [11610, 12143, 12486, 13015, 13028, 13327, 14074, 14094, 14458, 14878, 15610, 15649,
                15584, 16326, 16379, 16923, 17237, 17088, 17634, 18435, 19327, 19712, 21424, 21978,
                22684, 22597, 22735, 22217, 22214, 22655, 23098, 23602, 24013, 24003, 21593, 22319]
    
    
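    x = np.column_stack((x1, x2, x3))     # stack explanatory variables into an (n, 3) array
    x = sm.add_constant(x, prepend=True)  # add a column of ones for the intercept

    res = sm.OLS(y, x).fit()  # create a model and fit it
    print(res.params)     # estimated coefficients, intercept first
    print(res.bse)        # standard errors of the coefficients
    print(res.summary())  # full regression table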
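
For readers who want to see what the coefficient estimates and their standard errors correspond to, the following is a rough NumPy-only sketch of the classical OLS formulas. It is not the statsmodels implementation, and it uses synthetic data (the names `true_beta`, `A`, and the random seed are assumptions made for illustration) rather than the data above.

```python
import numpy as np

# Synthetic data, assumed for illustration only
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
true_beta = np.array([2.0, 1.0, -0.5, 0.25])  # intercept first

# Design matrix with a leading column of ones (the role of add_constant)
A = np.column_stack((np.ones(n), X))
y = A @ true_beta + rng.normal(scale=0.1, size=n)

# Least-squares coefficient estimates
beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

# Classical (non-robust) standard errors:
# sigma^2 = RSS / (n - k), cov(beta) = sigma^2 * (A'A)^-1
resid = y - A @ beta
dof = n - A.shape[1]
sigma2 = resid @ resid / dof
cov = sigma2 * np.linalg.inv(A.T @ A)
bse = np.sqrt(np.diag(cov))

print("coefficients:", beta)
print("standard errors:", bse)
```

With the default non-robust covariance, these two arrays should match what a fitted OLS result reports as its parameters and standard errors on the same inputs.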
    
  • 2020-12-15 13:37

    You should distinguish two cases: i) you just want to solve the equation; ii) you also want statistical information about your model. You can do i) with np.linalg.lstsq; for ii), it is better to use statsmodels.

    Below is an example with both solutions:

    # The standard imports
    import numpy as np
    import pandas as pd
    
    # For the statistics
    from statsmodels.formula.api import ols
    
    def generatedata():
        ''' Generate and show the data '''
        x = np.linspace(-5,5,101)
        (X,Y) = np.meshgrid(x,x)
    
        # To get reproducible values, I provide a seed value
        np.random.seed(987654321)
    
        Z = -5 + 3*X-0.5*Y+np.random.randn(np.shape(X)[0], np.shape(X)[1])
    
        return (X.flatten(),Y.flatten(),Z.flatten())
    
    def regressionmodel(X,Y,Z):
        '''Multilinear regression model, calculating fit, P-values, confidence intervals etc.'''
    
        # Convert the data into a Pandas DataFrame
        df = pd.DataFrame({'x':X, 'y':Y, 'z':Z})
    
        # Fit the model
        model = ols("z ~ x + y", df).fit()
    
        # Print the summary
        print(model.summary())
    
        return model._results.params  # should be array([-4.99754526,  3.00250049, -0.50514907])
    
    def linearmodel(X,Y,Z):
        '''Just fit the plane'''
    
        M = np.vstack((np.ones(len(X)), X, Y)).T
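        # rcond=None selects the machine-precision cutoff and avoids a FutureWarning on older NumPy
        bestfit = np.linalg.lstsq(M, Z, rcond=None)[0]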
        print('Best fit plane:', bestfit)
    
        return bestfit
    
    if __name__ == '__main__':
        (X,Y,Z) = generatedata()    
        regressionmodel(X,Y,Z)    
        linearmodel(X,Y,Z)
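
If only the plane fit from case i) is needed but a quick goodness-of-fit number would still be useful, one option is to compute R² by hand from the lstsq coefficients. This is a small sketch under assumptions: it regenerates a similar noisy plane on a coarser grid with its own seed, so the numbers will not match the output of the example above.

```python
import numpy as np

# Noisy plane z = -5 + 3x - 0.5y on a 21 x 21 grid (assumed illustration data)
x = np.linspace(-5, 5, 21)
X, Y = np.meshgrid(x, x)
X, Y = X.ravel(), Y.ravel()
rng = np.random.default_rng(1)
Z = -5 + 3 * X - 0.5 * Y + rng.normal(size=X.size)

# Fit the plane: columns are intercept, x, y
M = np.column_stack((np.ones(X.size), X, Y))
coef, *_ = np.linalg.lstsq(M, Z, rcond=None)

# Coefficient of determination from residual and total sums of squares
Z_hat = M @ coef
ss_res = np.sum((Z - Z_hat) ** 2)
ss_tot = np.sum((Z - Z.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("coefficients:", coef)
print("R^2:", r_squared)
```

Anything beyond this (P-values, confidence intervals) is where the statsmodels route in case ii) becomes the more convenient choice.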
    