How to add interaction term in Python sklearn

余生分开走 2021-02-01 05:03

If I have independent variables [x1, x2, x3] and fit a linear regression in sklearn, it gives me something like this:

y = a*x1 + b*x2 + c*x3 + intercept
  •  暖寄归人
    2021-02-01 05:39

    If you fit y = a*x1 + b*x2 + c*x3 + intercept in scikit-learn with linear regression, I assume you do something like this:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # x = array with shape (n_samples, n_features)
    # y = array with shape (n_samples,)

    model = LinearRegression().fit(x, y)
    

    The independent variables x1, x2, x3 are the columns of the feature matrix x, the coefficients a, b, c are stored in model.coef_, and the intercept is stored in model.intercept_.
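    A minimal, self-contained sketch of the fit described above, assuming synthetic noise-free data generated from y = 2*x1 - 1*x2 + 0.5*x3 + 3 (the numbers and the random seed are made up for illustration). It shows that model.coef_ holds one coefficient per column of x and model.intercept_ holds the constant term:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative assumption): y = 2*x1 - 1*x2 + 0.5*x3 + 3, no noise
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))          # shape (n_samples, n_features)
y = 2.0 * x[:, 0] - 1.0 * x[:, 1] + 0.5 * x[:, 2] + 3.0

model = LinearRegression().fit(x, y)
a, b, c = model.coef_                  # one coefficient per column of x
print(model.coef_, model.intercept_)   # approximately [2. -1. 0.5] and 3.0
```

    Because the data has no noise, least squares recovers the generating coefficients up to floating-point error; with real data the estimates will of course differ.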

    If you want an interaction term, add it to the feature matrix:

    x = np.c_[x, x[:, 0] * x[:, 1]]
    

    Now the first three columns contain the variables, and the fourth column contains the interaction x1 * x2. After fitting the model, you will find that model.coef_ contains four coefficients a, b, c, d.
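    A runnable sketch of the interaction approach, i.e. fitting y = a*x1 + b*x2 + c*x3 + d*x1*x2 + intercept. The data is hypothetical and noise-free, with a true interaction coefficient d = 1.5; the name x_inter is also an assumption, used so the original x is not overwritten:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data with a true interaction: d = 1.5 on x1*x2
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 3))
y = 2.0 * x[:, 0] - 1.0 * x[:, 1] + 0.5 * x[:, 2] + 1.5 * x[:, 0] * x[:, 1] + 3.0

# Append the interaction column x1 * x2 as a fourth feature
x_inter = np.c_[x, x[:, 0] * x[:, 1]]

model = LinearRegression().fit(x_inter, y)
a, b, c, d = model.coef_               # d is the interaction coefficient
print(x_inter.shape)                   # (200, 4)
print(model.coef_, model.intercept_)   # approximately [2. -1. 0.5 1.5] and 3.0
```

    Keeping x and x_inter separate makes it easy to fit both the plain and the interaction model on the same data and compare them.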

    Note that this will always give you a model with an interaction term (although its fitted coefficient d can, in principle, be 0), regardless of the correlation between x1 and x2. Of course, you can measure the correlation beforehand and use it to decide which model to fit.
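    One way to sketch the "measure the correlation beforehand" idea, assuming synthetic data in which x2 is deliberately made correlated with x1. The Pearson correlation comes from np.corrcoef; the THRESHOLD value of 0.5 is a hypothetical cut-off chosen only for illustration, not a standard rule:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))
x[:, 1] += 0.8 * x[:, 0]               # make x2 correlated with x1 (synthetic)
y = x @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=100)

# Pearson correlation between x1 and x2
r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]

THRESHOLD = 0.5                        # hypothetical cut-off, not a standard value
if abs(r) > THRESHOLD:
    features = np.c_[x, x[:, 0] * x[:, 1]]   # model with interaction term
else:
    features = x                              # plain linear model

model = LinearRegression().fit(features, y)
print(round(r, 2), model.coef_.shape)
```

    The decision rule here only illustrates the suggestion in the answer; in practice you might instead fit both models and compare them on held-out data.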
