What is the recommended way (if any) to do linear regression using a pandas DataFrame? I can do it, but my method seems very elaborate. Am I making things unnecessarily complicated?
I can add to unutbu's answer by showing how to retrieve particular elements of the coefficients table, including, crucially, the p-values.
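For context on what the columns of that coefficients table mean, here is a minimal numpy/pandas sketch that computes the same kind of table for a simple `y ~ x` fit by ordinary least squares. It is not the rpy2 approach and not how R computes it internally (R's `lm` uses a QR decomposition); the helper `coefficient_table` and the toy data are hypothetical, made up purely for illustration.

```python
import numpy as np
import pandas as pd


def coefficient_table(x, y):
    """Hypothetical helper: Estimate, Std. Error and t value for y ~ x via OLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictor.
    X = np.column_stack([np.ones_like(x), x])
    n, k = X.shape
    # Least-squares estimates of (intercept, slope).
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Unbiased residual variance with n - k degrees of freedom.
    sigma2 = residuals @ residuals / (n - k)
    # Covariance matrix of the estimates: sigma^2 (X'X)^-1.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    # Row labels chosen to match what R prints for lm(y ~ x).
    return pd.DataFrame(
        {"Estimate": beta, "Std. Error": se, "t value": beta / se},
        index=["(Intercept)", "x"],
    )


# Toy data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
table = coefficient_table(x, y)
print(table)
```

The `Pr(>|t|)` column is the two-sided tail probability of a Student t distribution with n - k degrees of freedom; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), n - k)` gives it. It is left out here to keep the sketch numpy-only.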
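import pandas as pd
from rpy2.robjects import pandas2ri

def r_matrix_to_data_frame(r_matrix):
    """Convert an R matrix into a pandas DataFrame, keeping its dimnames."""
    # ri2py is the rpy2 2.x name; rpy2 3.x renamed it to rpy2py.
    array = pandas2ri.ri2py(r_matrix)
    # For an rpy2 matrix, .names returns the dimnames: [row names, column names].
    return pd.DataFrame(array,
                        index=r_matrix.names[0],
                        columns=r_matrix.names[1])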
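The function above relies on rpy2 to hand back a correctly shaped 2-D array. To show the bookkeeping involved, here is a sketch that needs neither R nor rpy2: R stores matrix data column by column, so a flat copy of the values has to be reshaped in Fortran (column-major) order before the dimnames are attached. The helper `column_major_to_frame` is hypothetical, and the values 1 to 8 are placeholders chosen only to make the ordering visible.

```python
import numpy as np
import pandas as pd


def column_major_to_frame(values, rownames, colnames):
    """Hypothetical helper: DataFrame from a flat column-major vector plus dimnames."""
    values = np.asarray(values)
    # R fills matrices column by column, so reshape in Fortran order.
    array = values.reshape((len(rownames), len(colnames)), order="F")
    return pd.DataFrame(array, index=list(rownames), columns=list(colnames))


# Placeholder values laid out the way R stores a 2 x 4 matrix:
# 1, 2 form the first column, 3, 4 the second, and so on.
frame = column_major_to_frame(
    range(1, 9),
    ["(Intercept)", "x"],
    ["Estimate", "Std. Error", "t value", "Pr(>|t|)"],
)
print(frame)
```

Reshaping with the default C order instead would put 1 and 2 side by side in the first row, silently scrambling which number belongs to which coefficient.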
# Let's start from unutbu's line retrieving the coefficients:
coeffs = R.summary(M).rx2('coefficients')
df = r_matrix_to_data_frame(coeffs)
This leaves us with a DataFrame which we can access in the normal way:
In [179]: df['Pr(>|t|)']
Out[179]:
(Intercept) 0.637618
x 0.104088
Name: Pr(>|t|), dtype: float64
In [181]: df.loc['x', 'Pr(>|t|)']
Out[181]: 0.10408803866182779
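Once the table is a DataFrame, any of the usual pandas idioms apply. The sketch below rebuilds only the `Pr(>|t|)` column from the output above (the other columns are omitted) and shows single-cell access with `.at`, filtering terms by a p-value threshold, and finding the term with the smallest p-value. The 0.05 cutoff is an arbitrary example, not a recommendation.

```python
import pandas as pd

# Only the p-value column printed above; the other columns are omitted here.
df = pd.DataFrame(
    {"Pr(>|t|)": [0.637618, 0.104088]},
    index=["(Intercept)", "x"],
)

pvalues = df["Pr(>|t|)"]
# Scalar access: .at is the label-based accessor for a single cell.
p_x = df.at["x", "Pr(>|t|)"]
# Rows whose p-value falls below an (arbitrary) 0.05 threshold.
significant = df[pvalues < 0.05]
# Term with the smallest p-value.
best_term = pvalues.idxmin()
print(p_x, list(significant.index), best_term)
```

With these numbers neither term passes the 0.05 cutoff, so `significant` comes back as an empty DataFrame that still has the original column.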