Minimal example of rpy2 regression using pandas data frame

灰色年华 2020-12-08 15:57

What is the recommended way (if any) for doing linear regression using a pandas dataframe? I can do it, but my method seems very elaborate. Am I making things unnecessarily

3 Answers
  • 2020-12-08 16:24

    After calling pandas2ri.activate(), some conversions from pandas objects to R objects happen automatically. For example, you can use

    M = R.lm('y~x', data=df)
    

    instead of

    robjects.globalenv['dataframe'] = dataframe
    M = stats.lm('y~x', data=base.as_symbol('dataframe'))
    

    A complete example:

    import pandas as pd
    from rpy2 import robjects as ro
    from rpy2.robjects import pandas2ri
    pandas2ri.activate()
    R = ro.r
    
    df = pd.DataFrame({'x': [1,2,3,4,5], 
                       'y': [2,1,3,5,4]})
    
    M = R.lm('y~x', data=df)
    print(R.summary(M).rx2('coefficients'))
    

    yields

                Estimate Std. Error  t value  Pr(>|t|)
    (Intercept)      0.6  1.1489125 0.522233 0.6376181
    x                0.8  0.3464102 2.309401 0.1040880
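
    As a sanity check on the table above, the estimates, standard errors and t values can be reproduced without R using the closed-form formulas for simple linear regression. This is only a sketch using the standard library; the helper name simple_ols is hypothetical and is not part of rpy2 or pandas. It leaves out the p-values, which need the t distribution.

```python
import math

# Same data as the pandas DataFrame in the example above.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]


def simple_ols(x, y):
    """Closed-form least squares for y = intercept + slope * x.

    Hypothetical helper for illustration; returns, per coefficient,
    a tuple (estimate, standard error, t value).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + x_mean ** 2 / sxx))
    return {
        "(Intercept)": (intercept, se_intercept, intercept / se_intercept),
        "x": (slope, se_slope, slope / se_slope),
    }


table = simple_ols(x, y)
for name, (estimate, std_error, t_value) in table.items():
    print(f"{name:<12}{estimate:10.4f}{std_error:12.7f}{t_value:10.6f}")
```

    If the printed values match the first three columns of the R summary, the DataFrame reached lm() with the columns intact.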
    
  • 2020-12-08 16:40

    The R and Python versions are not strictly identical, because you build a data frame in Python/rpy2 whereas you use plain vectors (without a data frame) in R.

    Otherwise, the conversion shipping with rpy2 appears to be working here:

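    import rpy2.robjects as robjects
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.packages import importr
    base = importr('base')
    stats = importr('stats')
    pandas2ri.activate()
    robjects.globalenv['dataframe'] = dataframe  # dataframe: the pandas DataFrame from the question
    M = stats.lm('y~x', data=base.as_symbol('dataframe'))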
    

    The result:

    >>> print(base.summary(M).rx2('coefficients'))
                Estimate Std. Error  t value  Pr(>|t|)
    (Intercept)      0.6  1.1489125 0.522233 0.6376181
    x                0.8  0.3464102 2.309401 0.1040880
    
  • 2020-12-08 16:40

    I can add to unutbu's answer by outlining how to retrieve particular elements of the coefficients table including, crucially, the p-values.

    def r_matrix_to_data_frame(r_matrix):
        """Convert an R matrix into a Pandas DataFrame"""
        import pandas as pd
        from rpy2.robjects import pandas2ri
        # ri2py is the rpy2 2.x name; rpy2 3.x renamed it to rpy2py
        array = pandas2ri.ri2py(r_matrix)
        return pd.DataFrame(array,
                            index=r_matrix.names[0],
                            columns=r_matrix.names[1])
    
    # Let's start from unutbu's line retrieving the coefficients:
    coeffs = R.summary(M).rx2('coefficients')
    df = r_matrix_to_data_frame(coeffs)
    

    This leaves us with a DataFrame which we can access in the normal way:

    In [179]: df['Pr(>|t|)']
    Out[179]:
    (Intercept)    0.637618
    x              0.104088
    Name: Pr(>|t|), dtype: float64
    
    In [181]: df.loc['x', 'Pr(>|t|)']
    Out[181]: 0.10408803866182779
    