Run an OLS regression with Pandas Data Frame

温柔的废话 2020-11-30 16:48

I have a pandas DataFrame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:



        
5 Answers
  •  感动是毒
    2020-11-30 17:35

    Note: pandas.stats was removed in pandas 0.20.0, so this answer only applies to older versions.


    In those older versions, it's possible to do this with pandas.stats.ols:

    >>> import pandas as pd
    >>> from pandas.stats.api import ols
    >>> df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})
    >>> res = ols(y=df['A'], x=df[['B','C']])
    >>> res
    -------------------------Summary of Regression Analysis-------------------------
    
    Formula: Y ~ <B> + <C> + <intercept>
    
    Number of Observations:         5
    Number of Degrees of Freedom:   3
    
    R-squared:         0.5789
    Adj R-squared:     0.1577
    
    Rmse:             14.5108
    
    F-stat (2, 2):     1.3746, p-value:     0.4211
    
    Degrees of Freedom: model 2, resid 2
    
    -----------------------Summary of Estimated Coefficients------------------------
          Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
    --------------------------------------------------------------------------------
                 B     0.4012     0.6497       0.62     0.5999    -0.8723     1.6746
                 C     0.0004     0.0005       0.65     0.5826    -0.0007     0.0014
         intercept    14.9525    17.7643       0.84     0.4886   -19.8655    49.7705
    ---------------------------------End of Summary---------------------------------
    

    Note that you need the statsmodels package installed; the pandas.stats.ols function uses it internally.
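
    Because pandas.stats is no longer available in current pandas releases, the same regression can be fit with statsmodels directly. The following is a minimal sketch, assuming statsmodels is installed and using its formula interface (statsmodels.formula.api.ols). The data is the DataFrame from the example above; the names model and predicted are only illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same toy data as in the pandas.stats.ols example above.
df = pd.DataFrame({
    "A": [10, 20, 30, 40, 50],
    "B": [20, 30, 10, 40, 50],
    "C": [32, 234, 23, 23, 42523],
})

# Regress A on B and C. The formula interface adds an intercept
# term automatically (reported as "Intercept" in the results).
model = smf.ols(formula="A ~ B + C", data=df).fit()

# Estimated coefficients and goodness of fit.
print(model.params)
print(model.rsquared)

# Predicted values of A for the given B and C columns.
predicted = model.predict(df[["B", "C"]])
print(predicted)
```

    Calling model.summary() prints a regression table comparable to the one above. On this data the fit should agree with the pandas.stats.ols output (B around 0.4012, intercept around 14.9525, R-squared around 0.5789).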
