statsmodels linear regression - patsy formula to include all predictors in model

粉色の甜心 2020-12-10 12:03

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can f

3 Answers
  • 2020-12-10 12:25

    I haven't found an equivalent of R's . in the patsy documentation either. What patsy lacks in conciseness, though, Python makes up for with strong string manipulation. You can build a formula involving all the predictor columns in DF using

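    # Index.drop removes the response column; subtracting a list from an Index is not a set difference in current pandas
    all_columns = "+".join(DF.columns.drop("y"))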
    

    This gives x1+x2+x3 in your case. Finally, you can build the full formula string with y on the left-hand side and pass it to any formula-based fitting procedure

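    import statsmodels.formula.api as smf

    my_formula = "y~" + all_columns
    result = smf.ols(formula=my_formula, data=DF).fit()  # e.g. OLS; other formula-based models work the same way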
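
The approach in this answer can be sketched end to end as follows. This is a minimal illustration, not a definitive recipe: the toy data, the random seed, and the coefficient values are all made up for the example, and it assumes the fitting procedure is statsmodels' `smf.ols`. It uses `Index.drop` to leave out the response column, which keeps the remaining columns in their original order.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data (an assumption for illustration): three predictors and a response y
rng = np.random.default_rng(0)
DF = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x1", "x2", "x3"])
DF["y"] = 1.0 + 2.0 * DF["x1"] - 0.5 * DF["x2"] + rng.normal(scale=0.1, size=50)

# Every column except the response becomes a predictor term
predictors = DF.columns.drop("y")
my_formula = "y ~ " + " + ".join(predictors)

# Fit an ordinary least squares model from the generated formula
result = smf.ols(formula=my_formula, data=DF).fit()

print(my_formula)
print(result.params)
```

Because the formula is an ordinary string, the same `my_formula` can be handed to other formula-based statsmodels functions as well.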
    
  • 2020-12-10 12:33

    No, unfortunately this doesn't exist in patsy yet. See this issue.

  • 2020-12-10 12:37

    As this is still not included in patsy, I wrote a small function that I call whenever I need to fit a statsmodels model on all columns (optionally with some excluded):

    def ols_formula(df, dependent_var, *excluded_cols):
        '''
        Generates the R style formula for statsmodels (patsy) given
        the dataframe, dependent variable and optional excluded columns
        as strings
        '''
        df_columns = list(df.columns.values)
        df_columns.remove(dependent_var)
        for col in excluded_cols:
            df_columns.remove(col)
        return dependent_var + ' ~ ' + ' + '.join(df_columns)
    

    For example, for a dataframe called df with columns y, x1, x2, x3, calling ols_formula(df, 'y', 'x3') returns 'y ~ x1 + x2'.
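
One caveat with joining raw column names: a name that is not a valid Python identifier (for example one containing a space) will not parse as a formula term. patsy provides `Q("...")` for quoting such names. The variant below is only a sketch under stated assumptions: `quote_term` and `quoted_formula` are hypothetical helpers, not part of patsy or statsmodels, they take a plain list of string column names rather than a dataframe, and they refuse names containing a double quote or backslash instead of trying to escape them.

```python
import keyword


def quote_term(name):
    """Return name as-is if it is a plain identifier, else wrap it in patsy's Q()."""
    if name.isidentifier() and not keyword.iskeyword(name):
        return name
    # Hypothetical safety limit: do not attempt to escape these characters
    if '"' in name or "\\" in name:
        raise ValueError(f"cannot quote column name safely: {name!r}")
    return f'Q("{name}")'


def quoted_formula(columns, dependent_var, excluded=()):
    """Build 'y ~ a + b' from column names, skipping the response and exclusions."""
    # Unlike ols_formula above, excluded names that are absent are silently ignored
    skip = {dependent_var, *excluded}
    predictors = [quote_term(c) for c in columns if c not in skip]
    return f"{quote_term(dependent_var)} ~ {' + '.join(predictors)}"


columns = ["y", "x1", "total sales", "x3"]
formula = quoted_formula(columns, "y", excluded=["x3"])
print(formula)  # y ~ x1 + Q("total sales")
```

With a dataframe, the column names could be passed as `list(df.columns)`, assuming they are all strings.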
