Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can f
As this is still not included in patsy, I wrote a small function that I call whenever I need to fit a statsmodels model on all columns (optionally with exceptions):
def ols_formula(df, dependent_var, *excluded_cols):
    '''
    Generates the R style formula for statsmodels (patsy) given
    the dataframe, dependent variable and optional excluded columns
    as strings
    '''
    df_columns = list(df.columns.values)
    df_columns.remove(dependent_var)
    for col in excluded_cols:
        df_columns.remove(col)
    return dependent_var + ' ~ ' + ' + '.join(df_columns)
For example, for a dataframe called df with columns y, x1, x2 and x3, running ols_formula(df, 'y', 'x3') returns 'y ~ x1 + x2'.
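One case the function above does not cover is a column name that is not a valid Python identifier (for example one containing a space), which patsy cannot parse as a bare term. The sketch below is a hypothetical variant, not part of the original answer: `formula_from_columns` and its parameters are names made up for illustration. It takes any iterable of column names, so it can be tried without a dataframe, and wraps awkward names in patsy's `Q("...")` quoting helper. It assumes names contain no double quotes, and unlike `ols_formula` it silently ignores excluded names that are not present instead of raising `ValueError`.

```python
def formula_from_columns(columns, dependent_var, excluded_cols=()):
    """Build an R-style formula string from an iterable of column names.

    Hypothetical helper for illustration; assumes names contain no
    double-quote characters.
    """
    # Names to leave out of the right-hand side.
    skip = {dependent_var, *excluded_cols}

    def quote(name):
        # Q("...") is patsy's way to reference names that are not
        # valid Python identifiers, e.g. 'x 1'.
        name = str(name)
        return name if name.isidentifier() else 'Q("{}")'.format(name)

    predictors = [quote(c) for c in columns if c not in skip]
    return quote(dependent_var) + ' ~ ' + ' + '.join(predictors)


print(formula_from_columns(['y', 'x1', 'x2', 'x3'], 'y', ['x3']))
print(formula_from_columns(['y', 'x 1', 'x2'], 'y'))
```

With a real dataframe, the resulting string could be passed on in the usual way, e.g. `statsmodels.formula.api.ols(formula_from_columns(df.columns, 'y'), data=df).fit()`.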